Language and Culture

  • Ee Lin Lee, Department of Communication Studies, Western Washington University
  • https://doi.org/10.1093/acrefore/9780190228613.013.26
  • Published online: 07 July 2016

Language is an arbitrary and conventional symbolic resource situated within a cultural system. While it marks speakers’ different assumptions and worldviews, it also creates much tension in communication. Therefore, scholars have long sought to understand the role of language in human communication. Communication researchers, as well as those from other disciplines (e.g., linguistics, anthropology, psychology, and sociology), draw on each other’s works to study language and culture. The interdisciplinary nature of the works results in the use of various research methods and theoretical frameworks. Therefore, the main goal of this essay is to sketch the history and evolution of the study of language and culture in the communication discipline in the United States.

Due to space constraints only select works, particularly those that are considered landmarks in the field, are highlighted here. The fundamentals of language and the development of the Sapir–Whorf hypothesis in leading to the formation of the language and social interaction (LSI) discipline are briefly described. The main areas of LSI study—namely language pragmatics, conversation analysis, discourse analysis, and the ethnography of communication—are summarized. Particular attention is paid to several influential theories and analytical frameworks: the speech act theory, Grice’s maxims of implicatures, politeness theory, discursive psychology, critical discourse analysis, the ethnography of speaking, speech codes theory, and cultural discourse analysis. Criticisms and debates about the trends and directions of the scholarship are also examined.

  • conversation analysis
  • discourse analysis
  • ethnography of speaking

The Fundamentals of Language

A major task of language researchers is to understand the complexities entailed in the structures of talk in order to unfold and understand sociality, including human nature, cultural values, power structure, social inequality, and so on. Researchers in language, culture, and communication study language situated in cultural nuances in order to understand language use in enhancing intergroup and intercultural dialogue. Although language enables learning and bonding, it also confuses interlocutors with contradictory yet deep and rich multi-layered meanings, such as (mis)interpretation of intentions, violation of normative conduct, and repair of conversations that have gone awry.

In a way, language not only construes our perception, but also constructs our social reality by manifesting actual social consequences. For example, the word race represents something that does not exist in physical reality, but it has real implications and consequences (e.g., discrimination, social disparity, unequal access to healthcare, etc.). Here, language allows the creation of actual and persistent perceptions (e.g., bad, inferior, non-deserving, and so on) that determine aspects of people’s lives. In fact, the role of language in influencing interlocutors’ perception and communication remains one of the most popular opening lines in empirical studies focusing on language and culture.

How Language Shapes Perception

Known as linguistic relativity, the notion that language influences our thinking about social issues derives from Edward Sapir’s works in anthropology and linguistics in the 1920s (Mandelbaum, 1963 ). Sapir studied the lexical dissections and categorization and grammatical features from the corpora obtained during his fieldwork over several decades. While studying the languages of different North American Indian tribes, including those living in Washington and Oregon in the U.S. and Vancouver in Canada, Sapir found, for example, that the Hopi language did not have lexical equivalents for the English words time, past , or the future . Therefore, he suggested that the Hopi worldview about temporal communication was different from the English worldview. In his lectures Sapir promoted the understanding of language as a system embedded in culture. Thereafter, based on Sapir’s findings, researchers studying language inferred that if there was no word for, say, you in a certain language, then speakers of that language treat you as nonexistent.

Benjamin Lee Whorf, a student of Sapir’s, later suggested that language could, to some extent, determine the nature of our thinking. Known as the Sapir–Whorf hypothesis, or linguistic determinism, the notion that language is a shaper of ideas or thought inspired further empirical testing (Whorf, 1952 ). This led some researchers to conclude that speakers of different languages (e.g., Polish, Chinese, Japanese, English, etc.) see their realities differently. The investigation of the effects of languages on human behaviors, as influenced by Sapir’s and Whorf’s works, continues to be a popular topic in various academic disciplines.

During its postwar rebuilding efforts overseas in the 1940s, the U.S. government recruited linguists and anthropologists to train its personnel at the Foreign Service Institute (FSI). While linguists researching the micro-level elements of languages successfully taught FSI officers how to speak different languages, anthropologists studying the macro-level components of culture (e.g., economy, government, religion, family practices, etc.) taught the officers how to communicate effectively with people from different cultures (Leeds-Hurwitz, 1990). The research and training collaboration between linguists, anthropologists, psychologists, and sociologists at FSI showed that the learning of a foreign culture was not merely about acquiring language skills or translating from one language to another, but required a holistic understanding of language in a wider context.

While the teaching of foreign languages to FSI officers was efficient, teaching anthropological understanding of foreign cultures was more challenging. Moreover, during the 1940s the Sapir–Whorf hypothesis and the notion that language frames people’s worldview were contested in empirical findings. About the same time, Edward Twitchell Hall, who is credited with founding the field of intercultural communication, strongly promoted his belief that effective communication between two people from different cultural backgrounds (i.e., intercultural communication) should combine verbal (i.e., speech) and nonverbal (i.e., non-linguistic) communication embedded in a cultural context (Hall, 1966 ).

Citing efficiency, researchers at the time developed language translation programs that enabled the quick learning of intercultural communication. In this approach of linguistic universalism, researchers assumed structural equivalence across languages—that word-by-word translation can foster cultural understanding (Chomsky, 1972 ). This shift of direction in academic research challenged Sapir’s proposition of the understanding of culture and communication based on common conceptual systems—the notion that meanings and values of concepts cannot be truly understood without understanding the cultural system.

Regardless of the competing viewpoints, research on how speakers of different languages operate under different language and communication systems continues to date. Researchers have also widened the scope of the language and culture program to include the study of language use and functions (i.e., communicative purposes) in and across different cultural systems. Although the translation of the linguistic corpora into the English language is commonly featured in research publications, analyzing discourse data in the native languages is preferred. Language is therefore treated as intact with the cultural system. This line of study, despite differences in methodological and theoretical frameworks, forms the basis for a specific discipline within the communication field called language and social interaction (LSI).

The LSI discipline focuses on the study of human discourse and human interaction in situatedness. Scholars pursuing this line of research seek to understand the development of speech and language processes in various settings, from small group to interpersonal, including face-to-face and those mediated by technology (see International Communication Association [ICA] and National Communication Association websites, respectively). The scholarship employs qualitative and quantitative methods and includes verbal (i.e., speech) and nonverbal communication (i.e., nonlinguistic cues) (see the ICA website ). The various methodological and theoretical frameworks used include social psychology, ethnography of speaking, discourse analysis, conversation analysis, and narrative analysis. Although well-established and housed in the communication field, works in LSI are interdisciplinary.

While LSI studies also include nonverbal communication as a language system, studies of speech (whether naturally occurring, elicited, mediated, or written) outnumber those focusing on nonverbal communication. The paucity of nonverbal scholarship in the LSI discipline underscores the challenges of recording nonverbal communication for data analysis (Fitch & Sanders, 2005). Although studies pertaining to how social life is lived in situated conversation and how language is used in various interactional settings dominate LSI research discourse, the study of nonverbal communication as language deserves its own coverage as a (sub)discipline. Consequently, this essay focuses on the scholarship on speech in LSI. The following sections review a selection of the LSI subdisciplines organized by research methods, or more commonly conceptualized as analytical frameworks and procedures: language pragmatics, conversation analysis, discourse analysis, and the ethnography of communication. The review highlights a few major theories or theoretical frameworks in each subdiscipline, namely the speech act theory, Grice’s maxims of implicatures, politeness theory, discursive psychology, critical discourse analysis, the ethnography of speaking, speech codes theory, and cultural discourse analysis.

Language Pragmatics

Pragmatics is the study of language usage or talk in interaction. Researchers who study language pragmatics investigate the meanings of utterances in relation to speech situations in the specific contexts of use. Two theoretical frameworks that are commonly cited in language pragmatics are the speech act theory and Grice’s maxims of conversational implicatures, from which the influential politeness theory derives. These theoretical frameworks emerged from the examination of language independently from context, including situational factors that influence the cultural assumptions of the speaker and hearer.

Speech Act Theory

In an attempt to understand utterances in interaction, Austin (1962) explained speech acts as communicative acts in which speakers perform actions via utterances in specific contexts. Called performatives, these are illocutionary acts in which the speaker asserts a demand through utterances. Illocutionary acts contain force; that is, they allow the speaker to perform an act without necessarily naming the act (e.g., apology, question, offer, refuse, thank, etc.). Austin illustrated three types of force: (a) locution, the words in the utterances; (b) illocution, the intention of the speaker; and (c) perlocution, the consequential effects of the utterance upon the thoughts, feelings, or actions of the hearer.

The speaker’s illocutionary act is said to be happy when the hearer understands the locution and illocutionary forces. In order for the speaker’s illocutionary act to be happy, the utterance has to fulfill felicity conditions. Felicitous illocutionary acts are those that meet social and cultural criteria and bring about effects on the hearer that the speaker intended (Searle, 1969 ). Thus, illocutionary acts are conventionalized messages, because their performance is an engagement in rule-governed behavior (also see Goffman, 1967 ).

Searle extended Austin’s concept of speech acts and elaborated on the speech act theory by identifying the conditions necessary for the realization of speech acts. For example, to promise, the speaker needs sincerity and intentionality; to declare the marital union of two partners, a priest or a judge has to be present. Hence the successful performance of a speech act depends on whether the constituent conditions of a particular speech act are fulfilled, or a particular speech act is realized in a contextually appropriate manner (i.e., in relation to sociocultural factors).

Searle developed a typology to categorize speech acts: (a) representatives, in which the speaker says how something is, such as asserting; (b) directives, in which the speaker tries to get the hearer to perform some future action, such as requesting and warning; (c) commissives, in which the speaker commits to some future course of action, such as pledging and promising; (d) expressives, in which the speaker articulates his or her psychological state of mind about some prior action, such as apologizing and thanking; and (e) declaratives, performatives that require nonlinguistic institutions, such as christening or sentencing. These conditions must be fulfilled for the speaker to effect the specific act.

The speech act theory can be used to describe utterance sequences, for example, to predict antecedents and consequents in a conversation. Thus, when a violation of the typology occurred, speech act theory successfully predicted repairs and other signs of trouble in the conversational moves. However, Searle’s taxonomy was criticized for several reasons. First, while Searle treated illocutionary acts as consisting of complete sentences in grammatical form, such acts can be very short utterances that do not follow the complete subject-verb-object structure (e.g., “Forge on!”). Conversely, the speaker may need to utter several sentences to bring about effects on the hearer (e.g., advising). Second, Searle assumed that the felicity conditions for successful performances are universal, but later studies found that the conditions are in fact specific to the culture.

Furthermore, Searle subscribed to a linear, speaker-to-hearer view of transaction that dismissed the interactional aspect of language. The hearer’s role was minimized; specifically, the hearer’s influence on the speaker’s construction of utterances was ignored. Searle also neglected perlocutionary acts, which concern the effects of an utterance on the hearer. Instead, he focused solely on the linguistic goal of deliberate expression of an intentional state while overlooking extralinguistic cues. In short, the speech act theory could not account for intentionality and variability in discourse.

Grice’s Maxims of Implicatures

By moving beyond the linear (i.e., speaker-to-hearer) view of transaction, Grice proposed the cooperative principle ( 1989 ). He observed that interlocutors engage in collaborative efforts in social interaction in order to attain a common goal. In Grice’s view, collaborative efforts do not mean agreement; they mean that the speaker and the hearer work together in the conversation. According to the principle, participants follow four conversational maxims: quantity (be informative), quality (be truthful), relation (be relevant), and manner (be clear, be brief). Since these four maxims vary by culture, the interlocutors need to have culturally nuanced knowledge to fulfill these maxims.

According to Grice, meaning is produced in a direct way when participants adhere to the maxims. When the speaker’s intentions are conveyed clearly, the hearer should not have to interpret the speaker’s intentions. This occurs with conventional implicatures where standard word meanings are used in the interaction. However, in actual social interaction, most meanings are implied through conversational implicatures in which one or more of the conversational maxims are violated. Due to normative constraints, a speaker who says p implicates q , and the hearer would then need to infer the implied meanings; for example, what is being said and what is beyond words in a recommendation letter.

In short, Grice’s maxims of conversational implicatures are used to explain why people engage in different interpretations rather than rely on the literal meanings of utterances. The maxims attend to implied meanings that constitute a huge part of conversation and also the role of the hearer. Nonetheless, the cooperative principle was criticized for privileging the conversational conventions of middle-class English speakers. Additionally, Grice did not scrutinize strategic non-cooperation, which remains a primary source of inference in conversation (Hadi, 2013 ).

Politeness Theory

Influenced by Grice’s maxims, Brown and Levinson (1987) proposed the politeness theory to explain the interlocutor’s observation of conversational implicatures in order to maintain the expressive order of interaction. Brown and Levinson observed politeness strategies that consistently occurred in their field data across several languages: Tzeltal, a Mayan language of Mexico; Tamil, spoken in South India; and the British and American forms of English. Despite the distinctive cultures and languages, they observed outstanding parallelism in interlocutors’ use of polite language to accomplish conversational goals. Politeness is the activity performed to enhance, maintain, or protect face or the self-image of the interlocutors.

To illustrate language universality in politeness, Brown and Levinson proposed a socialized interlocutor—nicknamed a model person (MP)—as a face-bearing human with rationality and intentionality when communicating. To avoid breaching social equilibrium, the MP, whom Brown and Levinson identified as the speaker, conforms to social norms to be polite. In performing a speech act, the MP cultivates a desirable image (i.e., positive social worth), pays attention to the hearer’s responses, and ensures that nobody loses face in social interactions (e.g., feels embarrassed, humiliated, awkward, etc.).

Since face is emotionally invested (e.g., actors get upset) and sanctioned by social norms, actors are said to engage in rule-governed behavior to pay homage to their face. Due to the emotional investment, face threats are likely to occur when actors perform facework. Brown and Levinson described two basic face wants: positive face , the desire for one’s actions to be accepted by others, such as approval from others; and negative face , the desire for one’s actions to be unimpeded by others. A threat to positive face decreases approval from the hearer (e.g., acknowledging one’s vulnerability), whereas a threat to negative face restricts one’s freedom to act (e.g., requesting a favor).

According to the politeness theory, the speaker can choose whether or not to perform face-threatening acts (FTAs). When performing FTAs, the speaker will go on or off record. In going off record, the speaker uses hints or utterances that have more than one attributable intention, so that he or she does not appear to have performed a speech act. For example, the speaker who utters “Oops, I don’t have any cash on me” to the hearer after they have dined together in a restaurant is using an off-record strategy to suggest that the hearer foot the bill. In contrast, going on record means that the speaker performs the FTA (i.e., baldly without saving face) with or without redress. With redress, the speaker indicates that he or she does not intend to violate social equilibrium by performing the FTA (see further discussion below). Without redress, the speaker directly expresses his or her desire; for instance, the speaker commands the hearer to pay for lunch by saying, “You should pay this time.”

The speaker can use either positive or negative politeness strategies when performing FTAs with redress. Positive politeness strategies are used to attend to the hearer’s positive face. For example, in the restaurant scenario, the speaker can choose to compliment the hearer in order to establish solidarity by saying, “You have always been so generous …” On the other hand, negative politeness strategies are used to avoid imposing on the hearer’s negative face. For example, by seeking permission, “Would you consider paying for lunch? I will return the favor in the future,” the speaker acknowledges that the hearer is not obligated to perform the action of footing the bill.

According to the politeness theory, the speaker wants to use the least amount of effort to maximize ends by considering the weight of performing the FTA. Brown and Levinson postulated a formula: Wx = D(S, H) + P(H, S) + Rx, where Wx stands for the weight of the FTA x; D the social distance between speaker (S) and hearer (H), which is symmetrical (e.g., if H speaks another dialect); P the relative power of H over S, which is asymmetrical (e.g., if H is an authority); and Rx the ranking of imposition of the FTA in a particular culture. They suggested that P and D were universal with some emic correlates. Thus, in calculating Wx, S will consider the payoffs of each strategy. For example, in using positive politeness strategies, S may appear to be friendly, whereas in using an off-record strategy, S may appear manipulative by imposing on H, who gets S’s hints and then performs a future act. In using an on-record strategy, S may choose to be efficient, such as in an emergency (e.g., “Ambush!”).
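
The additive weight formula above can be illustrated with a small worked example. The following is only a sketch: the 1-to-5 rating scale, the class name FaceThreat, and the two sets of ratings for the restaurant scenario are hypothetical values chosen for illustration, not measurements or scales taken from Brown and Levinson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceThreat:
    """Inputs to the weight formula; the 1-5 scale is an illustrative assumption."""
    distance: int    # D: social distance between speaker and hearer
    power: int       # P: relative power of the hearer over the speaker
    imposition: int  # R: culturally ranked imposition of the act

    def weight(self) -> int:
        # Weight of the FTA = distance + power + ranking of imposition
        return self.distance + self.power + self.imposition


# Hypothetical ratings: asking someone to pay for lunch.
ask_close_friend = FaceThreat(distance=1, power=1, imposition=3)
ask_new_manager = FaceThreat(distance=4, power=5, imposition=3)

print(ask_close_friend.weight())  # 5
print(ask_new_manager.weight())   # 12
```

Under these assumed ratings the same request weighs more when addressed to a distant, more powerful hearer, which is the situation in which the theory expects the speaker to invest in more redress or to go off record.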

After three decades, politeness theory remains one of the most tested theories. However, amongst its criticisms, the theory is said to account for intentional politeness, but not intentional impoliteness. The significant attention paid to the speaker’s utterances, albeit with a consideration for the hearer’s face, reveals the assumption of conversations as monologic. In some respects the theory followed the trajectory of Searle’s and Grice’s works in that the performance of utterances is conceptualized as a rational cognitive activity of the speakers. In particular, speakers are assumed to generate meanings and action, whereas hearers are treated as receivers who interpret the speech performance. Therefore, the politeness theory is unable to fully explain interactional organization in talk exchanges.

Conversation Analysis

During the 1960s, empirical science centered on the prediction of the effects of abstract ideas on communication and social life. Common predictors tested include personality types, cognition, biological sex, income level, and political stance. Social scientists who studied language commonly adhered to the quantitative paradigm; they conducted experiments, used elicited conversations, and analyzed responses containing rehearsals of recollected conversations. The study of mundane rituals, however, was not of academic concern.

Erving Goffman, a sociologist, later made a radical theoretical move that differed significantly from the mainstream empirical studies. Goffman stated that orderliness was empirically observable from everyday conversation. He argued that since socialization shapes the social actor’s competencies, conversation maintains moral codes and institutional order. In other words, sequential ordering of actions in social interaction reflects the macro social institution (e.g., politics, business, legal systems, etc.).

Goffman’s works were viewed as a paradigm shift in the social sciences. He called attention to the orderliness that is observable in ordinary conversation—an area of investigation that other scientists neglected. Furthermore, unlike the early works in language study, Goffman’s theoretical framework no longer focused solely on the performance of speakers in conversations. Instead, meaning making—that is, the examination of the participants’ understanding of one another’s conduct—took precedence. Goffman did not test his ideas, nor did he develop any set of empirical methods that allowed the testing of his ideas.

In search of an empirical analysis of conversation, Harold Garfinkel, another sociologist, expanded on Goffman’s ideas. Garfinkel ( 1967 ) proposed that ethno-methods (i.e., the study of people’s practices or methods) inform the production of culturally meaningful symbols and actions. He noted that social actors use multiple tacit methods (e.g., presuppositions, assumptions, and methods of inference) to make shared sense of their interaction. Thus, conversation is a place where participants engage in mundane reason analysis, and conversational sequential structure—the organization of social interaction—reveals membership categorization.

The subdiscipline of conversation analysis (CA) was further expanded when Harvey Sacks and Emanuel Schegloff, who were later joined by Gail Jefferson, studied suicide calls made to the Center for the Scientific Study of Suicide, Los Angeles (Sacks, 1984 ). They investigated how sequential structure is managed in institutional talk. Conversation analysts study conversation sequence organization, turn design, turn taking, lexical choices, the repair of difficulties in speech, and the overall conversational structure. They analyze linguistic mechanisms (e.g., grammar and syntax, lexis, intonation, prosody, etc.) in naturally occurring conversations.

Later CA studies of institutional talk focused on interactions that have fewer formal constraints as institutional practices (e.g., phone calls, doctor–patient interaction, and classroom instruction), rather than those that have rigid structures within formalized rituals (e.g., a religious wedding ceremony, a sermon, etc.). Institutional CA studies have accelerated in the past few decades, allowing the identification of macro-level societal shifts through the management of social interaction in talk (Gee & Handford, 2012).

In general, CA theory postulates that talk is conducted in context. Participants’ talk and actions evoke context, and context is invoked and constructed by participants. Sequencing position in conversations reflects the participants’ understanding of the immediate preceding talk. As such, sequential structure reveals socially shared and structured procedures (Garfinkel, 1967 ). Thus, CA is the study of action, meaning, context management, and intersubjectivity.

CA is qualitative in methodology, even though later scholarship involved statistical analysis. The method is criticized for several weaknesses, among them: (a) the analysis and presentation of select segments of conversation lack rationale; (b) most CA studies are restricted to studying conversations in North America and Europe; (c) since multiple identities are at play in conversations, those that are consequential for social interaction remain ambiguous and debatable in analyses; and (d) the boundaries between pleasantries (e.g., small talk) and institutional talk are at times fuzzy in institutional CA (Have, 1990 ). Nevertheless, with a range of sub-areas quite well developed, CA is said to form its own discipline.

Discourse Analysis

Discourse Analysis (DA) is a broad term for different analytical approaches used to examine text and talk. Discourse is considered language use in general, and language is viewed as a form of action. The distinctions between the different approaches used in DA are based on the influences of the early works or traditions in conversation analysis and ethnomethodology, discursive psychology, critical discourse analysis and critical linguistics, Bakhtinian research, Foucauldian research, and even interactional sociolinguistics (Gee & Handford, 2012 ). However, the very different approaches and practices in DA have sparked disagreements among researchers about their applications and distinctions.

Data used in DA range from written to spoken, such as recorded spontaneous conversation, news articles, historical documents, transcripts from counseling sessions, clinical talk, interviews, blogs, and the like. Socio-historical contexts are often included in DA. As a tool for analyzing text and talk, DA has significantly influenced the study of language and culture. Two of the most popular DA approaches used in communication studies are Discursive Psychology (DP) and Critical Discourse Analysis (CDA).

Discursive Psychology

DP evolved in the early 1990s from Derek Edwards and Jonathan Potter’s works, in which they expressed dissatisfaction with the ways psychologists treated discourse. In psychology, utterances are treated as a reflection of the speaker’s mental state. Hence, talk is considered reflective (Edwards & Potter, 2005 ). However, in DP talk is considered constructive; language use is thus viewed as a social action or function. This means that people use language to make sense of what they do in a socially meaningful world. Therefore, language is treated as a tool to get things done.

In DP, researchers study the details of what people say (e.g., descriptions, terms, lexicons, or grammar). Researchers are concerned with how these features have particular effects or bear functions, such as shifting blame, denying responsibility, and providing counterarguments. DP researchers seek to understand the interests, attitudes, and motives of the speakers, particularly, why people use language the way they do and how they manage and construct identities.

Language use in news media coverage provides a good example for DP analysis. For instance, the August 2015 news coverage about corruption in Malaysian government offices supplies rich vocabularies for analyzing the speakers’ motives. Under the leadership of Bersih (an organization whose name literally translates to clean in the Malay language), an estimated half a million street demonstrators peacefully gathered in Kuala Lumpur, the country’s capital, for a public demonstration that lasted two days. The demonstrators demanded transparency in the country’s governance, including fair elections. They urged the Prime Minister, Najib Razak, to resign following a critical exposé published in The Wall Street Journal. According to the report, nearly US$700 million linked to a government development firm had been deposited into the Prime Minister’s personal bank accounts (Wright & Clark, 2015). Prior to the Prime Minister’s counterattack, the press labeled the demonstrators rally goers. However, the Prime Minister and his acolytes in government in turn used descriptors such as criminals, crazy, unpatriotic, and shallow-minded culprits to label the demonstrators traitors to their country.

The description above shows the way the speakers used language to construct their reality and their relationship to that reality. In this case, DP researchers would analyze and illustrate how the Prime Minister and his government officials co-construct shared meanings in interaction, such as particular realities, beliefs, identities, or subjectivities. For instance, the government can be seen as attempting to exercise control over the public demonstrators (through discourse) in order to defend governmental power. Thus, by labeling the demonstrators culprits , the government asserted its identity as the authority— the elite power that runs the country and decides what goes.

DP researchers assume that each speaker has multiple identities, and the identities can only be performed successfully with the consent of the listeners (Antaki & Widdicombe, 1998). The researchers also assert that the productive examination of discourse must be considered within the context of language use, such as the institutional setting and local sequential organization of talk. For example, a proper analysis of the Malaysian public demonstration above must include an understanding of the context of the public demonstrators’ dissatisfaction with governmental corruption and citizens’ demand for transparency in governance, a longstanding issue since the country’s independence from Britain. Thus, indexicality (the understanding that the meaning of a word is dependent on the context of use) is essential in DP analysis (Potter, 1996).

Perhaps one of the strongest criticisms of DP is the researchers’ reluctance to interpret macro-social concerns. DP researchers insist that the analysis of text and talk should depend on the context exactly as construed by the language used. This means that extratextual information should not be inserted in the analysis. Therefore, DP cannot be utilized to interrogate broader social concerns, such as politics, ideology, and power (Parker, 2015). As such, context is limited to and constituted by the interactional setting and functions of utterances.

DP is also criticized for casting speakers as conscious and agentic, that is, as autonomous subjects who manipulate language to do things. Speakers’ intentionality in attribution is thus considered fixed in their minds. Such an assumption in fact closely resembles that of traditional psychology, the very idea that DP researchers attempted to shift away from (Parker, 2015). Moreover, the analyst’s interpretation is crucial in unfolding an understanding of the discourse. The analyst’s knowledge and status thus influence his or her interpretation of the language used by speakers, which can be a weakness if the analyst conforms to an ideology that affects data interpretation.

Critical Discourse Analysis

Of all the approaches to DA, CDA is the one that takes a macrosocietal and political standpoint (Van Dijk, 1993). Critical discourse analysts examine how societal power relations and dominance are enforced, legitimated, and maintained through the use of language. The sociohistorical context of the text is emphasized. The examination of social problems requires the analyst to be well versed in multiple disciplines. Commonly, the analysts are motivated by particular political agendas or ideologies, and they seek to challenge certain ideologies (Fairclough, 2005). Therefore, based on, say, the motivation to fight social inequality and oppression, an analyst may seek out selected texts or talks for study. It is in CDA studies that the abuse, dominance, and unequal distribution of social goods are called into question.

Social theorists whose works are commonly cited in CDA include Pierre Bourdieu, Antonio Gramsci, Louis Althusser, Karl Marx, Jürgen Habermas, and Michel Foucault. Typical vocabulary in CDA studies includes power, dominance, hegemony, class, gender, race, discrimination, institution, reproduction, and ideology. Topics examined include gender inequality, media discourse, political discourse, racism, ethnocentrism, nationalism, and anti-Semitism. Critical discourse analysts seek to answer questions such as: How do elite groups control public discourse? How does such discourse control the less powerful group (in terms of mind and action)? What are the social consequences of such discourse control? (Van Dijk, 1993). The dominant social groups in politics, media, academia, and corporations are scrutinized in terms of the way they produce and maintain the dominant ideology.

Critical discourse analysts explore three contextual levels of discourse: the macro, meso, and micro (Van Dijk, 1993). At the macro level, analysts focus on understanding the relationship between the text and broader social concerns and ideologies. At the meso level, analysts examine the contexts of production and reception of the text, and the ideologies portrayed. The analysts ask questions such as: Where did the text originate? Who is (are) the author(s) and the intended audience of the text? What perspectives are being promoted? At the micro level, analysts scrutinize the forms and contents of the text through linguistic features and devices in order to reveal the speaker’s perspective or ideology. Linguistic features and components studied include direct and indirect quotations, terms used to refer to individuals or groups, sentence structure and grammar (e.g., active and passive voice), and premodifiers (e.g., non-Muslim citizens or Muslim-Chinese citizens).

While analysts frequently favor institutional texts (e.g., a journalistic report) in their analyses, everyday conversation is also included. In fact, everyday conversation is considered social group discourse that can be used to reveal societal norms and shared beliefs. In his studies of racism in everyday conversation, van Dijk found that speakers’ utterances such as “I am not racist, but …” and “We are not a racist society, but …” are in fact a reproduction of institutional talk. He called this specific type of talk a double strategy of positive self-representation and negative other-denigration.

While the multidisciplinary nature of CDA seems beneficial, it is also one of the main sources of criticism. In particular, critical discourse analysts are often accused of not productively using a combination of multiple approaches. Indeed, the more linguistically oriented studies of text and talk have overlooked theories in sociology and political science that focus on social and power inequality issues. On the other hand, studies that focus on sociology and political science have not rigorously engaged in DA. Moreover, the relationship between discourse and action coupled with cognition remains inconclusive (Van Dijk, 1998).

The Ethnography of Communication

The ethnography of communication originated from ethnology in the 1800s and found a home in anthropology. Bronislaw Malinowski, a Polish anthropologist, pioneered the ethnographic methods. He intensively recorded the methods he used in his fieldwork when studying the Trobriand Islanders of Papua New Guinea in 1914, including intricate details about the people, their language, and their daily life (Murdock, 1943). Franz Boas, a German anthropologist who lived among the Inuit in the late 1800s, further expounded on the necessity for language training among ethnographers who wished to decode the emic (i.e., native) perspective (Muller-Wille, Gieseking, & Barr, 2011).

Ethnographers study social norms, meanings, and patterns of life by examining symbolic activities ranging from speech to social artifacts. By writing about culture and recording people and their natural history, ethnographers describe, analyze, and compare people from different communities. The painstaking work involved in ethnography provides rich data that are highly nuanced. Ethnographic works are said to be the portraits of social life. Oftentimes, interviews are used concurrently with other methods (e.g., textual analysis) to obtain community members’ interpretation and explanation of the communicative activities. Data analyses are conducted along with (i.e., not after) data recording in the field.

While an ethnographer may generate questions for investigation before entering the field, he or she must remain flexible and receptive to other important questions that may emerge on site. The focus of investigation might shift because theoretical sensitivity—the review of literature prior to fieldwork—may not sufficiently orient the ethnographer to actual interactions. This is because the behaviors and activities that the ethnographer purports to study may have changed due to cultural shift. The use of such an inductive method allows the study of language and culture without theoretical constraints.

Ethnographers may compare behaviors cross-culturally when a sufficient number of studies of the cultures of interest become available. Since the voices of community members are given precedence, ethnographic reports rely heavily on and present people’s utterances, as well as fine details of observations. In fact, early ethnographic works in anthropology tend to exhaustively cover many aspects of a community’s life, though the search for nuances and painstaking details, coupled with the ethnographer’s prolonged engagement in the community, poses constraints of time and resources. However, in the 1960s, ethnography took a new turn with the greater emphasis on the study of language use.

The Ethnography of Speaking

The prominence of ethnographic studies focusing on speech in language and culture began in the 1960s with Dell Hymes’s study of language use. Hymes, who was trained in anthropology and linguistics, sought to understand speech patterns, functions, and speaking in situatedness. He departed from microlinguistics (which focuses on semantics, turn-taking, prosody, and conversational structure) to pursue a more holistic account of interaction in context. Hymes emphasized the examination of nonverbal cues, tone of conversation, evaluation of the interlocutors’ conduct, the setting of the interaction, and so forth.

Speaking is considered fundamental in understanding social reality. Hymes’s ethnography of speaking (later called ethnography of communication) is a method for analyzing communication in different cultural settings. Hymes’s (1972) SPEAKING mnemonic or schema, developed as an etic framework for the emic understanding of social interaction, provides an inductive tool for examining social and cultural elements through the means and ways of speaking. Each letter in the SPEAKING mnemonic represents a different element of a speech act: S represents the setting or scene; P, the participants and participant identities; E, the ends; A, the act sequence and act topic; K, the key or tone; I, the instrumentalities; N, the norms of interaction and interpretation; and G, the genre.
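
The eight components can also be treated as a simple note-taking template. The following Python sketch is only an illustration and is not part of Hymes’s work: the class name SpeakingNotes, its field names, and the sample notes about a neighborhood meeting are all hypothetical. It shows one way an analyst might record a description for each component and check which ones remain undescribed.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class SpeakingNotes:
    """Hypothetical field-note template: one attribute per SPEAKING component."""

    setting: str = ""            # S: setting or scene
    participants: str = ""       # P: participants and participant identities
    ends: str = ""               # E: ends (goals and outcomes)
    act_sequence: str = ""       # A: act sequence and act topic
    key: str = ""                # K: key or tone
    instrumentalities: str = ""  # I: channels and forms of speech
    norms: str = ""              # N: norms of interaction and interpretation
    genre: str = ""              # G: genre

    def acronym(self) -> str:
        """Build the mnemonic from the first letter of each field name."""
        return "".join(f.name[0].upper() for f in fields(self))

    def missing(self) -> list[str]:
        """Return the components that have no description yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Invented example notes, for illustration only.
notes = SpeakingNotes(
    setting="weekly neighborhood meeting in a community hall",
    participants="residents and one elected chairperson",
    key="formal, occasionally joking",
)

print(notes.acronym())
print(notes.missing())
```

A template of this kind only organizes observations; deciding what counts as the key or the norms of an event remains an interpretive task for the ethnographer.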

The SPEAKING mnemonic is one of the most widely used theoretical and analytic frameworks in ethnographic studies. Although Hymes developed it to study spontaneous conversation, recent communication studies have broadened the scope of the data to include textual analysis and computer-mediated communication. Such pluralities are, in fact, inherent in people’s ways of speaking, and despite some criticisms (e.g., Hymes proposed using his methods to study muted groups, but researchers who wish to listen to minority voices must also learn to listen to the dominant ones), the ethnography of speaking’s theoretical framework has withstood the test of time. It was the inspiration for Gerry Philipsen’s (1992) speech codes theory, another important heuristic theory in the ethnographic study of language and culture.

Speech Codes Theory

In addition to Hymes’s ethnography of speaking, Philipsen drew from Bernstein’s (1971) coding principle to postulate his speech codes theory. Bernstein argued that different social groups manifest different communicative practices and linguistic features. These differences are influenced by and, in turn, reinforce the groups’ coding principles, that is, the rules that govern what to say and how to say it in the right context.

According to Philipsen, people’s ways of speaking are woven with speech codes—the system of symbols, meanings, premises, and rules about communication conduct that are historically situated and socially constructed. Therefore, examining a community’s discourse can tease out people’s understanding of the self, society, and strategic action. Philipsen posited five propositions for studying the relationship between communication and culture:

People in different speech communities exhibit different ways of speaking, with different rules for communicative conduct informed by their socially constructed symbols and meanings.

Each code gives practical knowledge about the ways of being in a speech community.

People attach different cultural meanings to speech practices.

Metacommunication (i.e., talk about talk) reveals important worldviews, norms, and values of the people.

The common speech code reveals the morality of communication conduct. For example, community members’ discourse about should nots reveals the shoulds that they value.

Using the five propositions, Philipsen argued that the speech codes theory can reveal the ways of speaking and reinforce a group’s speech codes. Indeed, the theory has informed the vibrant scholarship on ways of speaking and meaning-making across different global cultural communities. For example, Lee and Hall’s (2012) study of Chinese Malaysian discourse of dissatisfaction and complaint-making with and without a formal goal of resolution (called, respectively, thou soo and aih auan) unearthed previously unexplored cultural values of the speech community. Lee (2014) developed the study further to understand the assumptions of personhood among Chinese Malaysians.

Cultural Discourse Analysis

The speech codes theory also served as the foundation for the development of Donal Carbaugh’s cultural discourse analysis theory. Carbaugh, a former student of Philipsen’s, proposed the cultural discourse theory (CDT) as a way to understand culturally shaped communication practices. According to CDT, cultural discourses are constituted by cultural communication and codes. Culture is an integral part of, but also a product of, communication practices that are highly nuanced and deeply meaningful and intelligible to cultural participants (Carbaugh, 1996). Cultural participants draw on diverse communication practices and thus create diversity within and across cultural communities.

Cultural discourse analysts study key cultural terms that are deeply meaningful to the participants; for example, oplakvane, which is a distinctive way of speaking to assert Bulgarian personhood (Carbaugh, Lie, Locmele, & Sotirova, 2012). Such cultural terms are an ongoing metacultural commentary that reveals implicit cultural knowledge, the taken-for-granted knowledge, such as beliefs, values, and assumptions about the self.

Three types of questions typically guide cultural discourse analysis (CuDA): (a) functional accomplishment (What is getting done when people communicate in this specific way?); (b) structure (How is this communicative practice conducted? What key cultural terms are used to give meaning to the participants? What deep meanings do the terms create?); and (c) sequencing or form (What is the act sequence of this communicative practice, in terms of interactional accomplishments, structural features, and sequential organization?).

The analyst approaches a CuDA project with a particular stance or mode of inquiry. Carbaugh identified five modes of inquiry that enable analysts to tease out important cultural ingredients in a topic of investigation: the theoretical, descriptive, interpretive, comparative, and critical. For example, the theoretical mode enables analysts to understand the basic communication phenomena in the speech codes of a community and therefore to refine what and how to listen for culture in their discourse before venturing into the field. The five modes chart a rough linear design; the analyst must accomplish the preceding mode before embarking on the subsequent mode. The first three modes (i.e., theoretical, descriptive, and interpretive) are mandatory in any CuDA project; however, the last two (i.e., comparative and critical) may or may not be accomplished in a single study (e.g., in an exploratory study).

Cultural discourse analysts typically use Hymes’s SPEAKING framework and Philipsen’s speech codes theory as guidelines for their subsequent analyses in the descriptive and interpretive stages. The analysis of implicit cultural meanings in CuDA can be structured using five semantic radiants or hubs: being, acting, relating, feeling, and dwelling. Using CuDA, analysts can tease out people’s understanding of who they are (being); what they are doing together (acting); how they are linked to one another (relating); their feelings about people, actions, and things (feeling); and their relationship to the world around them (dwelling). The cultural discourse analyst’s task, then, is to advance cultural propositions (i.e., statements containing the taken-for-granted knowledge) and premises (i.e., values or beliefs). These are statements that shed light on the importance of a particular communicative practice among members of a speech community (e.g., beliefs about what exists, what is proper, or what is valued).
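
As a rough illustration of how the five hubs can organize coded data, the Python sketch below tallies excerpts that an analyst has already assigned to a hub. It is an assumption-laden toy and not a procedure described by Carbaugh: the function name tally_hubs, the (excerpt, hub) pair format, and the sample excerpts are all hypothetical, and the interpretive work of assigning an excerpt to a hub is not automated here.

```python
# The five semantic hubs named in the text above.
HUBS = ("being", "acting", "relating", "feeling", "dwelling")


def tally_hubs(coded_excerpts):
    """Count how many manually coded excerpts fall under each hub.

    coded_excerpts is an iterable of (excerpt_text, hub_name) pairs.
    """
    counts = {hub: 0 for hub in HUBS}
    for _excerpt, hub in coded_excerpts:
        if hub not in counts:
            raise ValueError(f"unknown hub: {hub!r}")
        counts[hub] += 1
    return counts


# Invented excerpts and codes, for illustration only.
sample = [
    ("We are people who speak our minds.", "being"),
    ("We always sit down and talk it over.", "acting"),
    ("You do not contradict an elder in public.", "relating"),
    ("This is who we are around here.", "being"),
]

print(tally_hubs(sample))
```

A tally like this can show which hubs dominate a data set, but the cultural propositions and premises still have to be formulated by the analyst from the excerpts themselves.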

While the theories in the ethnography of communication have gained a lot of prominence in the LSI discipline, they have also enriched it. For example, Hymes’s SPEAKING framework, Philipsen’s speech codes theory, and Carbaugh’s CDT have all added depth and rigor to LSI data analysis. Evidently, to navigate through the language and social interactions of a community in which the researcher is not an insider, he or she needs to gain communicative competence (Hymes, 1962). Specifically, the researcher needs to know how to communicate like the insiders in order to articulate and explain the behaviors and communicative phenomena to other outsiders. The researcher also needs to gain competence particularly in the multidisciplinary methods of LSI.

However, neither reliance on English as a lingua franca for LSI research nor the practice of hiring translators is sufficient for undertaking this line of inquiry successfully. Therefore, many LSI studies recruit international scholars to participate in their research projects. While this is a common practice, especially in CuDA, the researchers’ cultural interpretations and the subsequent translation of the data into English for publication need to be done with utmost care in order to maintain the integrity of cultural nuances. Moreover, while the scholarship has strived to give voice to muted, non-dominant groups internationally, the dearth of cross-comparative studies (a goal and a tradition of ethnography) is a great concern. In that sense the study of intercultural interaction using the ethnography of communication has not yet come of age in this increasingly globalized and complex world.

This essay outlines the history and evolution of the study of language and culture by the main areas of study in the LSI discipline. The four main areas summarized are language pragmatics, conversation analysis, discourse analysis, and the ethnography of communication. Influential methodological and theoretical frameworks reviewed cover the Sapir–Whorf hypothesis, speech act theory, Grice’s maxims of implicatures, politeness theory, discursive psychology, critical discourse analysis, the ethnography of speaking, speech codes theory, and cultural discourse analysis. Finally, the essay examines major criticisms of the theories and applications, as well as possible future directions of scholarship, when and where appropriate in the discussion.

Further Reading

  • Edwards, D., & Potter, J. (1992). Discursive psychology. London: SAGE.
  • Gee, J. P. (2014). An introduction to discourse analysis: Theory and method (4th ed.). New York: Routledge.
  • Erving Goffman Archives in the Intercyberlibrary of the University of Nevada.
  • Goffman, E. (1971). Relations in public: Micro studies of the public order. New York: Basic Books.
  • Hall, E. T. (1976). Beyond culture. New York: Doubleday.
  • Hymes, D. (1962). The ethnography of speaking. In T. Gladwin & W. C. Sturtevant (Eds.), Anthropology and human behavior (pp. 13–35). Washington, DC: Anthropological Society of Washington.
  • Martin, J. N., Nakayama, T. K., & Carbaugh, D. (2012). The history and development of the study of intercultural communication and applied linguistics. In J. Jackson (Ed.), The Routledge handbook of language and intercultural communication (pp. 17–36). Oxon, England: Routledge.
  • Philipsen, G. (1975). Speaking “like a man” in Teamsterville: Culture patterns of role enactment in an urban neighborhood. Quarterly Journal of Speech, 61, 13–22.
  • Wodak, R., & Chilton, P. (2005). A new agenda in (critical) discourse analysis: Theory, methodology and interdisciplinarity. Philadelphia: John Benjamins.
  • Wooffitt, R. (2005). Conversation analysis and discourse analysis: A comparative and critical introduction. London: SAGE.

References

  • Antaki, C., & Widdicombe, S. (1998). Identity as an achievement and as a tool. In C. Antaki & S. Widdicombe (Eds.), Identities in talk. London: SAGE.
  • Austin, J. L. (1962). How to do things with words. Oxford: Oxford University Press.
  • Bernstein, B. (1971). Class, codes and control: Vol. 1. Theoretical studies towards a sociology of language. London: Routledge and Kegan Paul.
  • Brown, P., & Levinson, S. (1987). Politeness. Cambridge, U.K.: Cambridge University Press.
  • Carbaugh, D. A. (1996). Situating selves: The communication of social identities in American scenes. Albany: State University of New York Press.
  • Carbaugh, D., Lie, S., Locmele, L., & Sotirova, N. (2012). Ethnographic studies of intergroup communication. In H. Giles & C. Gallois (Eds.), The handbook of intergroup communication (pp. 44–57). New York: Routledge.
  • Chomsky, N. (1972). Language and mind. New York: Harcourt Brace.
  • Edwards, D., & Potter, J. (2005). Discursive psychology, mental states and descriptions. In H. T. Molder & J. Potter (Eds.), Conversation and cognition (pp. 241–259). Cambridge, U.K.: Cambridge University Press.
  • Fairclough, N. (2005). Peripheral vision: Discourse analysis in organization studies: The case for critical realism. Organization Studies, 26, 915–939.
  • Fitch, K. L., & Sanders, R. E. (Eds.). (2005). Handbook of language and social interaction. Mahwah, NJ: LEA.
  • Garfinkel, H. (1967). Studies in ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.
  • Gee, J. P., & Handford, M. (2012). The Routledge handbook of discourse analysis. New York: Taylor & Francis.
  • Goffman, E. (1967). Interaction ritual: Essays in face-to-face interaction. Chicago: Aldine.
  • Grice, H. P. (1989). Studies in the way of words. Cambridge, MA: Harvard University Press.
  • Hadi, A. (2013). A critical appraisal of Grice’s cooperative principle. Open Journal of Modern Linguistics, 3, 69–72.
  • Have, P. t. (1990). Doing conversation analysis: A practical guide. London: SAGE.
  • Hall, E. T. (1966). The hidden dimension. New York: Anchor Books.
  • Hymes, D. (1972). Models of the interaction of language and social life. In J. J. Gumperz & D. Hymes (Eds.), Directions in sociolinguistics: The ethnography of communication (pp. 35–71). New York: Holt, Rinehart and Winston, Inc.
  • Lee, E. L. (2014). Assumptions of personhood in the discourse about Chinese identity in Malaysia. In M. B. Hinner (Ed.), Chinese culture in a cross-cultural comparison (pp. 77–110). Frankfurt: Peter Lang.
  • Lee, E. L., & Hall, B. “J” (2012). Cultural ideals in Chinese Malaysians’ discourse of dissatisfaction. In M. B. Hinner (Ed.), The interface of business and culture (pp. 365–390). Frankfurt: Peter Lang.
  • Leeds-Hurwitz, W. (1990). Notes in the history of intercultural communication: The Foreign Service Institute and the mandate for intercultural training. Quarterly Journal of Speech, 76, 262–281.
  • Mandelbaum, D. G. (Ed.). (1963). Selected writings of Edward Sapir in language, culture, and personality. Berkeley and Los Angeles, CA: University of California Press.
  • Muller-Wille, L., Gieseking, B. (Eds.), & Barr, W. (Trans.). (2011). Inuit and Whalers on Baffin Island through German eyes: Wilhelm Weike’s Arctic journal and letters (1883–1884). Montréal, Canada: Baraka Books.
  • Murdock, G. P. (1943). Bronislaw Malinowski. American Anthropologist, 45, 441–451.
  • Parker, I. (2015). Critical discursive psychology (2nd ed.). Basingstoke, U.K.: Palgrave Macmillan.
  • Philipsen, G. (1992). Speaking culturally: Explorations in social communication. Albany: State University of New York Press.
  • Philipsen, G. (1997). Toward a theory of speech codes. In G. Philipsen & T. Albrecht (Eds.), Developing communication theories (pp. 119–156). Albany: State University of New York Press.
  • Potter, J. (1996). Representing reality: Discourse, rhetoric, and social constructions. Thousand Oaks, CA: SAGE.
  • Sacks, H. (1984). Notes on methodology. In J. M. Atkinson & J. Heritage (Eds.), Structures of social action (pp. 21–27). Cambridge, U.K.: Cambridge University Press.
  • Searle, J. R. (1969). Speech acts: An essay in the philosophy of language. London: Cambridge University Press.
  • Ten Have, P. (1999). Doing conversation analysis: A practical guide. London: SAGE.
  • Van Dijk, T. (1993). Principles of critical discourse analysis. Discourse and Society, 4, 249–285.
  • Van Dijk, T. A. (1998). Ideology: A multidisciplinary approach. London: SAGE.
  • Whorf, B. L. (1952). Collected papers on metalinguistics. Washington, DC: Foreign Service Institute.
  • Wright, T., & Clark, S. (2015, July 2). Investigators believe money flowed to Malaysian leader Najib’s accounts amid 1MDB probe. Wall Street Journal.

Printed from Oxford Research Encyclopedias, Communication. Under the terms of the licence agreement, an individual user may print out a single article for personal use (for details see Privacy Policy and Legal Notice).

date: 04 April 2024

Introduction to Special Issue on Language and Culture

  • Published: 01 August 2020
  • Volume 49, pages 509–510 (2020)

  • Rafael Art. Javier,
  • Aubrey Faber,
  • Marko Lamela &
  • Yosef Amrami

Language and culture have been found to be intimately and intricately interconnected (Wang 2017). It is through language that the expression of thoughts and perceptions is made known. With the explosion of new ways of communication, from instant messaging and texting, to social media (Hinduja and Patchin 2010), and now to memes, we are faced with new challenges to examine fundamental features of communication, as well as their impact on the nature and quality of the intended meaning and cognitive and emotional content. The unique nature of cultural influences on language is anchored in different components of language functions (Javier 2007; Javier and Lamela 2020). Thus, it is not surprising that linguistic and paralinguistic aspects of language inherent in different modes of communication become particularly sensitive to specific cultural environments where linguistic codes are developed and utilized. We are particularly interested in these types of investigations because of the cross-cultural and cross-linguistic contexts that are now made possible through the internet. Online, these different modes of communication may be at play such that the possibility for misinterpretation of the intended meaning may have dire consequences in the sociopolitical and socioeconomic arena, not to mention in interpersonal relationships.

In the series of manuscripts included in this special issue, we highlight current research exploring some of these issues to encourage more investigations of critical linguistic components in relation to augmenting or obscuring the intended meaning of the communication. The possibility for distortion of the intended meaning becomes particularly apparent when more than one linguistic code is used in the communication, as each code is likely to become implicated in disrupting the intended meaning, leading to misunderstanding (Wang 2017). The world is getting smaller: with online communication and international travel, cultures are meeting all the time. People can translate a message in one language online into another with just the click of a button, and that translation may cloud someone’s understanding. The study of language-specific syntactic, semantic, and grammatical rules; the use of specific sentence constructions to reflect complex genitive-possession; the modulation of emotional affects and vocabulary concept accessibility (attrition rate); linguistic mechanisms that are engaged in meaning construction; the impact of different vocalization (consonant pronunciation) on accuracy of intralingual translation of meaning; different language processing in communication requiring processing of tactile sensations; specific individual characteristics likely to mediate or modulate cross-language communication and content accessibility; and research on idioms and metaphors all concern important and consequential aspects of language production that may bear on meaning construction. We decided to include in this special issue a series of manuscripts which, although submitted independently, explore some of these issues toward the goal of encouraging future submissions of manuscripts addressing issues of language and culture.

Hinduja, S., & Patchin, J. W. (2010). Bullying, cyberbullying, and suicide. Archives of Suicide Research, 14(3), 206–221.

Javier, R. A. (2007). The bilingual mind: Feeling and speaking in two languages. New York, London: Springer.

Javier, R. A., & Lamela, M. (2020). Cultural and linguistic issues in assessing trauma in forensic context. In R. A. Javier, E. A. Owen, & J. A. Maddux (Eds.), Assessing trauma in forensic contexts (pp. 151–176). New York, London: Springer.

Wang, X. (2017). Incommensurability and cross-language communication. New York, London: Routledge.

Author information

Authors and affiliations

St. John’s University, New York, USA

Rafael Art. Javier, Aubrey Faber, Marko Lamela & Yosef Amrami

Corresponding author

Correspondence to Rafael Art. Javier.

About this article

Javier, R.A., Faber, A., Lamela, M. et al. Introduction to Special Issue on Language and Culture. J Psycholinguist Res 49, 509–510 (2020). https://doi.org/10.1007/s10936-020-09724-5

Issue Date: August 2020

DOI: https://doi.org/10.1007/s10936-020-09724-5

International Journal of Language and Culture

The aim of the International Journal of Language and Culture (IJoLC) is to disseminate cutting-edge research that explores the interrelationship between language and culture. The journal is multidisciplinary in scope and seeks to provide a forum for researchers interested in the interaction between language and culture across several disciplines, including linguistics, anthropology, applied linguistics, psychology and cognitive science. The journal publishes high-quality, original and state-of-the-art articles that may be theoretical or empirical in orientation and that advance our understanding of the intricate relationship between language and culture.

Topics of interest to IJoLC include, but are not limited to the following: Culture and the structure of language; Language, culture, and conceptualisation; Language, culture, and politeness; Language, culture, and emotion; Culture and language development; Language, culture, and communication.

Submissions on these and other topics within the aim and scope of the journal are invited. More information can be found below under "Submission".

IJoLC is a peer-reviewed journal published twice a year. IJoLC publishes its articles Online First.

2 April 2024

  • Metaphtonymy and semio-cognitive de-legitimation of Donald Trump in the meme discourse of The Daily Show with Trevor Noah (January 2016–December 2019). Nashwa Elyamany & Maha SalahEldien Mohamed Hamed
  • The heart became hot: A conceptualization of anger in Dagbani and Dangme. Jonathan Tanihu & Samuel Alhassan Issah

7 March 2024

  • Women have no honour of their own: Conceptualizations of honor in Indian English and Pakistani English. Ansa Mahmood & Kim Ebensgaard Jensen

8 February 2024

  • Bold colors, sweeping melodies, offensive smells: A corpus-based analysis of the figurative representations of visual, auditory, and olfactory stimuli in English and Hungarian. Ádám Galac

15 January 2024

  • Choice of language in the construction of cultural identity by Tamil speakers in India. Elizabeth Eldho & Rajesh Kumar | IJOLC 10:1 (2023) pp. 54–86

12 December 2023

  • Conceptualizing health: A corpus-based Cultural Linguistic study. Penelope Scott | IJOLC 10:1 (2023) pp. 1–32

7 November 2023

  • Conceptualization of Sar (Head) in Persian figurative expressions. Nahid Ahangari | IJOLC 10:1 (2023) pp. 33–53

25 September 2023

  • Mabia languages and cultures expressed through personal names. Hasiyatu Abubakari, Samuel Alhassan Issah, Samuel Owoahene Acheampong, Moses Dramani Luri & John Naporo Napari | IJOLC 10:1 (2023) p. 87

8 September 2023

  • Request strategies: A socio-pragmatic study of the Javanese community in Indonesia. Edy Jauhari & Dwi Handayani | IJOLC 10:1 (2023) pp. 115–144

27 June 2023

  • The structure of the concept of kærlighed ‘love’ in Danish. Aleksander Kacprzak | IJOLC 9:2 (2022) pp. 258–291

15 June 2023

  • Emotional self-disclosure and stance-taking within affective narratives on YouTube: A qualitative case study of four Spanish YouTubers. Sanna Pelttari | IJOLC 9:2 (2022) pp. 292–321

6 June 2023

  • Jieun Kiaer. 2021. Delicious Words: East Asian Food Words in English. Reviewed by Hugo Wing-Yu Tam | IJOLC 9:2 (2022) pp. 322–324

4 November 2022

  • Metaphorical conceptualizations of cancer treatment in English and Chinese languages. Mei-Yung Vanliza Chow & Jeannette Littlemore | IJOLC 9:2 (2022) pp. 194–232
  • Animal names applied to a person in Maasai society. Eliakimu Sane | IJOLC 9:2 (2022) pp. 177–193

21 October 2022

  • A spatial model of conceptualization of time : With special reference to English and Armenian fairy tales Yelena Mkhitaryan &  Lusine Madatyan | IJOLC 9:2 (2022) pp. 233–257

Volume 10 (2023)

Volume 9 (2022), volume 8 (2021), volume 7 (2020), volume 6 (2019), volume 5 (2018), volume 4 (2017), volume 3 (2016), volume 2 (2015), volume 1 (2014).

General information about our electronic journals .

Subscription rates

All prices for print + online include postage/handling.

65.00 per volume. Private subscriptions are for personal use only, and must be pre-paid and ordered directly from the publisher.

Available back-volumes

International Journal of Language and Culture invites submissions relevant to the aim & scope of the journal.

Manuscripts should be submitted through the journal’s online submission and manuscript tracking portal .

Manuscripts submitted per email will not be processed.

Please consult the guidelines and the Short Guide to EM for Authors before you submit your paper.

John Benjamins journals are committed to maintaining the highest standards of publication ethics and to supporting ethical research practices. Please read this Ethics Statement .

Rights and Permissions

Authors must ensure that they have permission to use any third-party material in their contribution; the permission should include perpetual (not time-limited) world-wide distribution in print and electronic format.

For information on authors' rights, please consult the rights information page .

Open Access

Articles accepted for this journal can be made Open Access through payment of an Article Publication Charge (APC) of EUR 1800 (excl. tax); more information can be found on the publisher's Open Access Policy page . There is no fee if the article is not to be made Open Access and thus available only for subscribers.

Corresponding authors from institutions with which John Benjamins has a Read & Publish arrangement can publish Open Access without paying a fee; information on the institutions and which articles qualify, can be found on this page .

For information about permission to post a version of your article online or in an institutional repository ('green' open access or self-archiving), please consult the rights information page .

John Benjamins Publishing Company has an agreement in place with Portico for the archiving of all its online journals and e-books.

For the benefit of production efficiency, the publisher and the editor ask you to follow the following submission guidelines strictly. Papers that do not follow these guidelines will be returned to the author.

Contributions should be consistent in their use of language and spelling. If you are not a native speaker of the language in which you have written your contribution, it is advised to have your text checked by a native speaker.

When submitting the final manuscript to the journal, please include: a one-paragraph abstract, approximately five keywords, a short professional biography of the author, and a current mailing address.   

Electronic files

Files. Contributions should not exceed 10,000 words. They should be in English following the American Psychological Association (APA) style. Authors who are not competent users of academic English are advised to have their paper checked by a native speaker before submission.

Please take care that you supply all the files, text as well as graphic files, used in the creation of the manuscript, and be sure to submit the final version of the manuscript. And please delete any personal comments so that these will not mistakenly be typeset and check that all files are readable.

File naming conventions. When naming your file please use the following convention:  use the first three characters of the first author’s  last name; if that name is Johnson, the file should be named JOH.DOC, JOH.WP5, etc. Do not use the three character extension for things other than the identification of the file type ( not JOH.ART, JOH.REV). Figures can be named as follows JOH1.EPS, JOH2.TIF, JOH3.XLS, etc.

Software. Word (PC/Mac) is preferred. If you intend to use other word processing software, please contact the editors first.

Graphic files : Please supply figures as Encapsulated Postscript (EPS) or Tagged Image File Format (TIFF) conversion in addition to the original creation files.  For graphics that are not available in digital format, such as photographs, spectrographs, etc., please provide sharp and clear prints ( not photocopies) in black & white.

In order to facilitate smooth production it is important that you follow the journal’s style for consistency. In this respect we advise you to make use of our electronic styles in addition to these guidelines. Do not add running heads, implement full justification or hyphenation, or the exact margin settings as used by Benjamins in printing. It is sufficient to characterize elements such as examples, quotations, tables, headings etc. in the formatting in a clear and consistent way, so that they can be identified and formatted in the style of the journal. Formatting that should be supplied by you is the formatting of references (see below) and font enhancements (such as italics, bold, caps, small caps, etc.) in the text. Whatever formatting or style conventions are employed, please be consistent.

Tables and figures. All tables, trees and figures must fit within the following page size (if necessary, after – limited – reduction) and should still be legible at this size: 11.5 cm (4.52”) x 19 cm (7.48”). Suggested font setting for tables: Times Roman 10 pts (absolute minimum: 8 pts). Tables and figures should be numbered consecutively, provided with appropriate captions and should be referred to in the main text in this manner, e.g., “in table 2”, but never like this: “in the following table:”. Please indicate the preferred position of the table or figure in the text.

Running heads.  Please do not include running heads with your article. However, in case of a long title please suggest a short one for the running head (max. 55 characters) on the cover sheet of your contribution. 

Emphasis and foreign words. Use italics for foreign language, highlighting and emphasis. Bold should be used only for highlighting within italics and for headings. Please refrain from the use of FULL CAPS (except for focal stress and abbreviations) and underlining (except for highlighting within examples, as an alternative for boldface), unless this is a strict convention in your field of research. For terms or expressions (e.g., ‘context of situation’) please use single quotes. For glosses of citation forms, use double quotes.

Transliteration. Please transliterate into English any examples from languages that use a non-Latin script, using the appropriate transliteration system (ISO or LOC).

Symbols and special characters. In case you have no access to certain characters, we advise you to use a clear convention to mark these characters. You can use our font table (Appendix A) or any other regular table to list the correspondences between your symbols and the required ones. If you use any phonetic characters, please mark these by the use of a character style if possible. This will enable us to retrieve those characters in your document.

Chapters and headings. Chapters or articles should be reasonably divided into sections and, if necessary, into sub-sections. If you cannot use the electronic styles, please mark the headings as follows:

  • Level 1 = bold italics, 1 line space before, section number flush left. Text immediately below.
  • Level 2 = italics, 1 line space before, section number flush left. Text immediately below.
  • Level 3ff = italics, 1 line space before, section number flush left. Heading ends with a full stop, with the text following on the same line.

Numbering should be in arabic numerals; no italics; no dot after the last number, except for level 1 headings.

Quotations : In the main text quotations should be given in double quotation marks. Quotations longer than 3 lines should be indented left and right, without quotations marks and with the appropriate reference to the source. They should be set off from the main text by a line of space above and below.

Listings : Should not be indented. If numbered, please number as follows:

1. ..................... or a. .......................
2. ..................... or b. .......................

Listings that run on with the main text can be numbered in parentheses: (1).............., (2)............., etc.

Examples and glosses. Examples should be numbered with Arabic numerals (1,2,3, etc.) in parentheses. Examples in languages other than the language in which your contribution is written should be in italics with an approximate translation. Between the original and the translation, glosses can be added. This interlinear gloss gets no punctuation and no highlighting. For the abbreviations in the interlinear gloss, CAPS or small caps can be used, which will be converted to small caps by our typesetters in final formatting. Please note that lines 1 and 2 are lined up through the use of spaces: it is essential that the number of elements in lines 1 and 2 match. If two words in the example correspond to one word in the gloss use a full stop to glue the two together (2a). Morphemes are separated by hyphens (1, 2b). Every next level in the example gets one indent/tab.

    (1)  Kare wa  besutoseraa o   takusan kaite-iru.
         he   TOP best-seller ACC many    write-PERF
         “He has written many best-sellers.”

    (2)  a.  Jan houdt.van Marie.
             Jan loves     Marie
             “Jan loves Marie.”
         b.  Ed en  Floor gaan samen-wonen.
             Ed and Floor go   together-live.INF
             “Ed and Floor are going to live together.”

Notes should be kept to a minimum and should be submitted as numbered endnotes. ***Note: footnote indicators in the text should appear at the end of sentences and follow punctuation marks.

It is essential that the references are formatted to the specifications given in these guidelines, as these cannot be formatted automatically. Please use the reference style as described in The APA Publication Manual (6th ed.).

References in the text: These should be as precise as possible, giving page references where necessary; for example (Fillmore 1990; Clahsen 1991: 252-253) or, as in Brown et al. (1991: 252). All references in the text should appear in the references section.

References section: References should be listed first alphabetically and then chronologically. The section should include all (and only!) references that are actually mentioned in the text.

Examples Book: Görlach, M. (2003). English words abroad . Amsterdam: John Benjamins. Spear,  N. E., & Miller, R. R. (Eds.). (1981). Information processing in animals: Memory  mechanisms . Hillsdale, NJ: Lawrence Erlbaum.

Article (in book): Adams, C. A., & Dickinson, A. (1981). Actions and habits: Variation in associative representation during instrumental learning. In N. E. Spear & R. R. Miller (Eds.),  Information processing in animals: Memory mechanisms (pp. 143-186). Hillsdale, NJ: Erlbaum.

Article (in journal): Claes, J., & Ortiz López, L. A. (2011). Restricciones pragmáticas y sociales en la expresión de futuridad en el español de Puerto Rico [Pragmatic and social restrictions in the expression of the future in Puerto Rican Spanish]. Spanish in Context, 8 , 50–72.

Rayson, P., Leech, G. N., & Hodges, M. (1997). Social differentiation in the use of English vocabulary: Some analyses of the conversational component of the British National Corpus. International Journal of Corpus Linguistics, 2 (1), 120–132. 

Additional Style Guidance

Please use in-text citations, numbered endnotes, and works cited.

1.  Please do not justify the right margin of your manuscript or the electronic version on disk.  Leave a ragged right margin.

2.  Please double space everything, including quotations and footnotes.

3.  Please use American spellings and punctuation, including

  • spellings in -ize, -or, etc.
  • punctuation that includes a comma before and or or in a series of 3 items (e.g. lexis, morphology, and syntax)
  • commas to set off any preceding dependent clause of a complex sentence or to divide a compound sentence
  • double quotes to enclose a quotation and single quotes to indicate a quote within a quote;
  • end quotes after punctuation (i.e., “to be done.”)
  • comma after i.e. and e.g.
  • do not punctuate lists

4.  Section headers, if used, should simply be phrases with no numbers. Please restrict headers to three or four per essay.  They may be italicized.

5.  Miscellaneous

  • indicate a new paragraph with a single tab
  • set off any introductory phrase of five words or more with a comma, e.g. “Toward the end of World War II,...”
  • dates should be of the form “15 December 1998”
  • decades should be of the form “the 1980s”
  • spell out centuries, e.g., “eighteenth century”
  • use “and” in place of “&”, and “see” in place of “cf.”
  • use minimal capitalization, e.g., “translation studies”, “the Roman Catholic church”;
  • use minimal hyphenization, e.g., “postcolonial”
  • possessives of names ending in “s” should take the form of “Yeats's”
  • please avoid inappropriately gendered language, finding locutions as well that avoid awkward forms like “his/her” whenever possible
  • represent dashes as two hyphens, no spaces, e.g., “despite the difficulty--however great.”

Appendixes should follow the References section.

Author’s Submission Checklist

When submitting the revised version of your accepted manuscript, in addition to following the guidelines above, please be sure that you also include:

  • a one-paragraph abstract of your article
  • a list of approximately five keywords to aid in searching and indexing
  • a short (2-3 sentence) professional profile, including key publications
  • a mailing address

Proofing procedure

The first author of a contribution will receive a PDF of first proofs of the article for correction via email and will be requested to return the corrections to the journal editor within 7 days of receipt. Acrobat Reader can be downloaded for free from www.adobe.com which will enable you to read and print the file. Please limit corrections to the essential. It is at the publisher’s discretion not to implement substantial textual changes or to charge the author. If it is absolutely necessary to change larger chunks of text (i.e. more than just a few words), it is best to submit the changes on disk (with identical hard copy).

Please contact the journal editor if you cannot handle proofs for your article in electronic format (i.e., receive the proofs as a PDF-attachment at your email address).

HYPOTHESIS AND THEORY article

Beyond English: considering language and culture in psychological text analysis.

Dalibor Kučera*

  • 1 Department of Psychology, Faculty of Education, University of South Bohemia in České Budějovice, České Budějovice, Czechia
  • 2 Department of Psychology, College of Science, University of Arizona, Tucson, AZ, United States

The paper discusses the role of language and culture in the context of quantitative text analysis in psychological research. It reviews current automatic text analysis methods and approaches from the perspective of the unique challenges that can arise when going beyond the default English language. Special attention is paid to closed-vocabulary approaches and related methods (and Linguistic Inquiry and Word Count in particular), both from the perspective of cross-cultural research where the analytic process inherently consists of comparing phenomena across cultures and languages and the perspective of generalizability beyond the language and the cultural focus of the original investigation. We highlight the need for a more universal and flexible theoretical and methodological grounding of current research, which includes the linguistic, cultural, and situational specifics of communication, and we provide suggestions for procedures that can be implemented in future studies and facilitate psychological text analysis across languages and cultures.

Introduction

The use of computerized text analysis as a method for obtaining information about psychological processes is usually dated to the 1960s, when the General Inquirer program was introduced ( Stone et al., 1962 ). Since then, this field has advanced and flourished in ways that were difficult to foresee at the time. The original (word-count) approaches have been enhanced and optimized in terms of the scope and complexity of their dictionaries and methods ( Eichstaedt et al., 2020 ), and computers can now process very large amounts of data in very little time. At the same time, extensive digital documentation and sharing, related to the growth of the information society ( Duff, 2000 ; Fuller, 2005 ), have provided almost unlimited input for text analysis.

Over the last decade, Natural Language Processing (NLP) methods have effectively become an established and attractive go-to method for psychological science ( Althoff et al., 2016 ; Pradhan et al., 2020 ). At present, they are developed mainly as automated systems that can understand and process texts in natural language, e.g., for conversational agents, sentiment analysis, or machine translation ( Amini et al., 2019 ). The new techniques, employing methods of artificial intelligence, classical machine learning (ML), and deep learning methods ( Magnini et al., 2020 ) are gradually displacing original approaches, with their eventual dominance in the field being a safe prediction ( Johannßen and Biemann, 2018 ; Eichstaedt et al., 2020 ; Goldberg et al., 2020 ).

By implication, the field can currently be thought of as being in a transitional phase—although most cited studies in psychology are based on foundations laid with conventional computational techniques (e.g., word counting), their share is gradually decreasing in favor of more complex techniques (e.g., ML processing). This phase is crucial in many ways, not only for the (re)evaluation of existing research backgrounds and evidence but also for the development and optimization of next-generation psychological text analysis methods.

The goal of this article is to provide a critical review of the approaches, methodology, and interpretation of traditional closed-vocabulary text analysis from the specific perspective of multicultural and multilingual research. Attention is paid to three fundamental challenges: (1) the specifics of language and culture, (2) the levels of language analysis in question and the terminology used, and (3) the context of the use of specific tools and methods. The article ends with a discussion of possible adjustments and extensions to methods and outlines further perspectives and desiderata for conducting cross-language research in psychology.

Challenges in Cross-Language Psychological Text Analysis

Over the last two decades, research on psychological aspects of natural word use ( Pennebaker et al., 2003 ; Ramírez-Esparza et al., 2008 ; Harley, 2013 ) has provided an impressive bedrock of scientific findings. Most of this research has been carried out using closed-vocabulary approaches, methods based on assigning words within a target text document to categories of a predefined word dictionary ( Eichstaedt et al., 2020 ). Semantic and grammatical features of word use have been identified as psychological markers of personal speaker characteristics, for example, gender and age ( Biber, 1991 ; Mehl and Pennebaker, 2003 ; Newman et al., 2008 ), personality characteristics ( Tausczik and Pennebaker, 2010 ; Yarkoni, 2010 ; Gill and Oberlander, 2019 ), social characteristics ( Berry et al., 1997 ; Avolio and Gardner, 2005 ; Dino et al., 2009 ; Kacewicz et al., 2014 ), emotions ( Brewer and Gardner, 1996 ; Pennebaker and Lay, 2002 ; Newman et al., 2008 ), and health ( Ramírez-Esparza et al., 2008 ; Demjén, 2014 ). The research has so far mostly been conducted within an explanation framework, but is now also increasingly used for prediction purposes ( Yarkoni and Westfall, 2017 ; Johannßen and Biemann, 2018 ).

The large number of existing studies speaks to the high relevance of this research, both in terms of establishing consensus between studies and in revealing relationships with other variables as support for concurrent validity with the results of established measures. However, recent studies have also raised important questions about the generalizability of existing findings beyond the original context of investigation, which has highlighted potential constraints on their validity in different languages and cultures ( Garimella et al., 2016 ; Basnight-Brown and Altarriba, 2018 ; Jackson et al., 2019 ; Sánchez-Rada and Iglesias, 2019 ; Chen et al., 2020 ; Thompson et al., 2020 ; Dudău and Sava, 2021 ). The results of the studies also indicate that the comparison and psychological interpretation of linguistic phenomena between different cultures and languages is subject to several fundamental challenges.

Language and Culture in Question

The first challenge concerns the choice of the language and culture in which the texts are analyzed and interpreted. Currently, the vast majority of psychological language research is based on English, which dominates contemporary science as a lingua franca ( Meneghini and Packer, 2007 ; Seidlhofer, 2011 ). The preference for research in English is understandable: English is a global language (e.g., the most used language of international communication, information technology, and the Internet) ( Internet Users by Language, 2021 ), and it is the consensual language of academic discourse, which gives it a broad research base ( Johnson, 2009 ). Nevertheless, the number of native English speakers (approx. 360–400 million) ( König and van der Auwera, 2002 ) is a small fraction of the world’s population. There are approximately 6,900 languages spoken today, of which 347 have more than 1 million speakers ( Bender, 2011 ).

Although it may seem that languages are rather similar to each other, in many cases they exhibit substantial phonological, morphosyntactic, and semantic structural differences. In other words, they operate with different linguistic building blocks, structures, and relations to communicate equivalent ideas ( Haspelmath, 2020 ). As an example, we can describe the variance that exists in even such a basic classification as content (lexical) vs. function (grammatical) words ( Corver and van Riemsdijk, 2001 ). Although most languages allow a relatively clear distinction between these two types, this is not the default for all languages ( Asher and van de Cruys, 2018 ). For example, in indigenous North American languages, the words “sit,” “stand,” and “lie,” considered content words in English, appear as both content and function words ( Hieber, 2020 ). Moreover, many word classes (parts of speech) are not present in some languages (e.g., adjectives are not present in the Galela language) ( Rijkhoff, 2011 ). Such differences exist at all levels of language (i.e., language domains, parts of grammar), and further examples will be given below.

In addition to differences between individual languages, differences between cultures using the same language should also be mentioned. As an example, we can use English, which is currently the official language in at least 58 countries ( List of Countries Where English Is an Official Language – GLOBED, 2019 ). Not surprisingly, the use of English shows a number of variations across these cultures. The variations are most often manifested at the level of pragmatics (e.g., accentuated manifestations of egalitarianism in western Anglophone cultures compared to more pronounced patterns of respect in Asian and Polynesian Anglophone cultures) ( Thomas and Thomas, 1994 ), but also at the level of semantics—in understanding the meaning of words (e.g., the word “old” is usually more semantically related to “age” in Australian English and to the “past” in American English) ( Garimella et al., 2016 ). Other aspects also contribute to language variation, such as dialects or the specific use of English by non-native speakers ( Wolfram and Friday, 1997 ; Yano, 2006 ). Considering that languages show such variability at both intra-lingual and inter-lingual levels, and function differently in many aspects, this may raise the question of the adequacy of single-language results (or single-culture results) that are often implicitly assumed to be broadly applicable ( Wierzbicka, 2013 ).

Definition of Levels and Variables of Language Analysis

The second challenge consists of the definition of the level of language (language domain, area of linguistic analysis) we focus on, the terminology used, and the variables in question. In research on the psychology of word use, terminology is often not set in accordance with traditional taxonomy in linguistics and does not adequately reflect interlingual differences. Instead of distinguishing language levels (domains) in dimensions which are more universal and established among linguists, e.g., morphology, syntax, semantics, lexicology, etc. ( Hickey, n.d. ; Mereu, 1999 ; Kornfilt, 2020 ), the focus of the analysis is often described in eclectic ways, based on the specifics of the language in question. For example, English is a language that has a relatively poor morphology compared to other Indo-European languages ( Vannest et al., 2002 ; Milizia, 2020 ), and the level of morphology is therefore often integrated into a group of diverse variables or is replaced by other concepts. A common example is the sorting of language features into fuzzy categories such as “Linguistic Dimensions” (covering word classes and morphology), “Other Grammar” (covering word classes and both morphology and syntax), and “Psychological Processes” (covering semantics, morphology, syntax, and pragmatics together) in the LIWC2015 program ( Pennebaker et al., 2015 ) (note: this method is described in more detail below). In fact, each of these categories includes strictly linguistic dimensions (variables), only in different configurations.

Another example is the differentiation between ‘language content’ (content of communication, that is, what is communicated/told, that usually covers lexical and semantic level) and ‘language style’ (the way the content is conveyed, that is, how the author is communicating, theoretically covering all levels of analysis, including morphology) ( Ireland and Pennebaker, 2010 ). The assumption that language content and style can be unambiguously distinguished at the level of individual variables is questionable, since the definition of words as “content” (e.g., nouns, verbs) or “stylistic” (e.g., pronouns and prepositions) varies considerably between languages ( Corver and van Riemsdijk, 2001 ; Asher and van de Cruys, 2018 ; Hieber, 2020 ). Even the most general distinction between function words and content words in one language captures rather a continuum, where prototypical function words and content words appear at opposite ends ( Osborne and Gerdes, 2019 ). In summary, although these conceptual or effectively metaphorical distinctions have proven theoretically generative and practically useful, they can significantly limit the possibilities of cross-language comparison.

The unclear taxonomy and exclusive, domain-specific terminological definition bring with them complications both at the level of interdisciplinary cooperation (e.g., among psychologists and linguists) and at the level of intercultural research ( Sonneveld and Loening, 1993 ). For languages that are relatively close in their structure, the discrepancy in classification may not be pronounced, but when distant languages are studied and compared, substantial differences can arise. The taxonomy of words and their functions is non-trivially language-specific, with different languages providing different classifications of language content and style ( Nivre et al., 2016 ; Kirov et al., 2020 ). In some languages, the same grammatical relationship is expressed morphologically, in others through function words, while some languages do not mark this information at all (e.g., in grammatical tense or definiteness) ( Osborne and Gerdes, 2019 ; Universal Dependencies: Syntax, 2021 ).

For example, many locatives are marked by prepositions in English (e.g., “in,” “by,” “to,” “from”), while in Finnish they appear as morphological case-inflections (e.g., “-ssA,” “-llA,” “-lle,” “-stA,” “-ltA”). Furthermore, possessives and adverbials can be marked morphologically in Finnish (e.g., “-ni” for “my,” “-si” for “your”), but in English they appear as separate words; thus, a word form like “auto-i/ssa/ni/kin” (“also in my cars”), with a stem and four subsequent suffixes, would need four separate words in English ( Vannest et al., 2002 ). The Czech language provides another example of the interconnection between language content and style. It also works with a wide range of grammatical suffixes that change paradigmatic and grammatical classification, e.g., the word “uč” (“teach!”) with the suffixes “-it” (“to teach”), “-el” (“teacher”), “-ova/á” (“of teacher”), “-ní” (“teaching”), and “-čko” (“little teaching”), where each of the suffixes can change the inflection and/or semantic nature of the word ( Rusínová, 2020 ). Therefore, a text analysis approach that counts and processes such linguistic units as stand-alone words ( Pennebaker and King, 1999 ; Pennebaker et al., 2014 ) is inherently limited and potentially biased.
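The Finnish example above can be made concrete with a minimal Python sketch. It shows only that whitespace-based word counting sees one unit where the English gloss has four. The function name `whitespace_tokens` and the hand-written `segmentation` list are illustrative assumptions. The segmentation simply follows the “auto-i/ssa/ni/kin” split given in the text and is not the output of any morphological analyzer.

```python
# Minimal sketch: why counting whitespace-delimited words is not
# comparable across languages. All names here are illustrative.

def whitespace_tokens(text: str) -> list[str]:
    """Split a text on whitespace, as a simple word-count tool might."""
    return text.split()

# Finnish word form and English gloss taken from the example in the text.
finnish_form = "autoissanikin"
english_gloss = "also in my cars"

# Hand-written segmentation (assumption): stem plus four suffixes,
# following the "auto-i/ssa/ni/kin" split quoted in the text.
segmentation = ["auto", "i", "ssa", "ni", "kin"]

finnish_count = len(whitespace_tokens(finnish_form))
english_count = len(whitespace_tokens(english_gloss))

print(f"Finnish tokens: {finnish_count}")
print(f"English tokens: {english_count}")
print(f"Finnish morphemes (hand-segmented): {len(segmentation)}")
```

Under these assumptions, the same meaning yields one countable unit in Finnish and four in English, so any per-word frequency would differ before a dictionary is even applied.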

Approaches and Methods in Question

The third challenge concerns specifics around the commonly employed text analytic approaches and methods. Many methods were primarily designed for the processing of a specific language, or even a specific type of communication (i.e., genre or register), and their use in cross-language research can therefore result in methodological and interpretive difficulties. In this regard, the current approaches to quantitative text analysis, based on lexical and semantic levels of analysis (treating words/tokens as lexical units within a certain semantic field) ( Cruse et al., 1986 ), can be divided into two main groups: closed-vocabulary approaches and open-vocabulary approaches ( Schwartz et al., 2013b ). Closed-vocabulary approaches operate “top-down” and assign words from a target text to psychological categories within a specific and fixed dictionary (e.g., a dictionary of emotion words that covers the categories of positive and negative emotion). This procedure is also referred to as the word-count approach ( Schwartz et al., 2013a ; Iliev et al., 2015 ; Kennedy et al., 2021 ). The result of the analysis is usually the (normalized) frequency with which references to these categories occur in a given text ( Eichstaedt et al., 2020 ).
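
The word-count procedure can be illustrated with a minimal sketch. The two-category dictionary, the category labels, and the simple tokenizer below are hypothetical and serve only to show how a normalized category frequency is obtained; they do not reproduce any of the published tools.

```python
import re
from collections import Counter

# Hypothetical toy dictionary; real tools ship far larger, validated word lists.
TOY_DICTIONARY = {
    "posemo": {"happy", "cheerful", "love"},
    "negemo": {"sad", "angry", "hate"},
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def category_frequencies(text, dictionary=TOY_DICTIONARY):
    """Return each category's share of all tokens in the text (0.0 to 1.0)."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    return {cat: (counts[cat] / total if total else 0.0) for cat in dictionary}

freqs = category_frequencies("I love this happy day but I hate the sad ending")
print(freqs)
```

Here two of the eleven tokens fall into each category, so both normalized frequencies equal 2/11. Everything beyond the fixed dictionary lookup (disambiguation, lemmatization, handling of negation) is deliberately left out.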

In contrast, open-vocabulary approaches operate “bottom-up” (data-driven), that is, based on language (text) as such. Algorithms identify related clusters of units (lexical units or elements, for example, punctuation) that naturally occur (and co-occur) within a large set of texts and find lexical and semantic patterns that appear (and appear together) in the data ( Park et al., 2015 ; McAuliffe et al., 2020 ). Both approaches have their pros and cons; as stated by Eichstaedt et al., “Closed-vocabulary approaches can be rigid, while open-vocabulary approaches can be sensitive to idiosyncrasies of the dataset and the modeler’s choices about parameters. Closed-vocabulary approaches are more reproducible but inflexible, where open approaches are more flexible but can vary across datasets” (p. 77) ( Eichstaedt et al., 2020 ). Given the historical dominance of word-count approaches, the following section focuses in detail on closed-vocabulary analysis.
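
As a rough illustration of the data-driven logic, the sketch below counts how often pairs of words co-occur within the same document. This is a much simpler stand-in for the clustering or topic-modeling algorithms used in practice, and the three example documents are invented for the purpose.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count, for each unordered word pair, the number of documents containing both."""
    counts = Counter()
    for doc in documents:
        vocabulary = sorted(set(doc.lower().split()))
        # Sorting makes every pair appear in one canonical order.
        counts.update(combinations(vocabulary, 2))
    return counts

# Invented mini-corpus, for illustration only.
docs = ["rain and wind", "rain and sun", "sun and sand"]
pairs = cooccurrence_counts(docs)
print(pairs.most_common(3))
```

No categories are defined in advance: which pairs stand out depends entirely on the texts supplied, which is also why such results can vary across datasets.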

Closed-Vocabulary Approaches in Cross-Cultural Research

In terms of the number of published studies, closed-vocabulary approaches still dominate the field of psychology of word use by far. There are many reasons for this preference: their implementation makes few technical demands (training of the AI, development of algorithms, etc.), they allow relatively uncomplicated interpretation of the results, and they do not require large datasets to perform the analysis ( Eichstaedt et al., 2020 ; Sharir et al., 2020 ). Over the last six decades, a number of tools have been developed, e.g., General Inquirer ( Stone et al., 1962 ), DICTION ( Hart, 2001 ), Linguistic Inquiry and Word Count (LIWC) ( Pennebaker et al., 2015 ), Affective Norms for English Words (ANEW) ( Bradley and Lang, 1999 ), SentiStrength ( Thelwall et al., 2010 ), SentiWordNet ( Baccianella et al., 2010 ), OpinionFinder ( Wilson et al., 2005 ), Regressive Imagery Dictionary ( Martindale, 1973 ), TAS/C ( Mergenthaler and Bucci, 1999 ), Gottschalk-Gleser Scales ( Gottschalk et al., 1969 ), or Psychiatric Content Analysis and Diagnosis (PCAD) ( Gottschalk, 2000 ).

Most of these methods are primarily focused on the level of lexical semantics, that is, on searching for words with specific semantic loading. The analyzed text is usually compared with a predefined dictionary that contains words that represent a concept (e.g., religion words) or a psychological state (e.g., positive emotion words). For example, the concept of ‘satisfaction’ in DICTION is represented by words such as “cheerful,” “smile,” or “celebrating” ( Hart and Carroll, 2011 ). Leaving aside the question of the validity of the semantic categories in the dictionary itself (cf. Garten et al., 2018 ), there are several issues that closed-vocabulary analysis has to deal with. A common problem is the interpretation of lexical ambiguity and the meaning of words in different contexts ( Hogenraad, 2018 ). Typical examples in English are contronyms or polysemous words such as “fine” (signifying both pleasant and a penalty), “mean” (signifying both bad and average), and “crazy” (signifying both excitement and mental illness). The risk of misinterpretation (misclassification) can be reduced by, e.g., removing ambiguous words from the dictionary or replacing them ( Schwartz et al., 2013a ). However, such a procedure almost necessarily also reduces the sensitivity of the semantic category, and thus the precision of the analysis.
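
The trade-off can be made concrete with a small sketch. The word lists and the two example sentences are hypothetical; the point is only that removing an ambiguous entry such as “fine” eliminates a false hit in one context while losing a genuine hit in another.

```python
# Hypothetical positive-emotion category and a list of entries judged ambiguous.
POSITIVE = {"fine", "happy", "pleasant"}
AMBIGUOUS = {"fine"}
STRICT_POSITIVE = POSITIVE - AMBIGUOUS  # dictionary after removing ambiguous words

def count_hits(tokens, words):
    """Number of tokens that match an entry of the category."""
    return sum(1 for token in tokens if token in words)

penalty = "she paid a fine".split()   # "fine" as a penalty: not a positive emotion
wellbeing = "i feel fine".split()     # "fine" as pleasant: a positive emotion

print(count_hits(penalty, POSITIVE), count_hits(penalty, STRICT_POSITIVE))
print(count_hits(wellbeing, POSITIVE), count_hits(wellbeing, STRICT_POSITIVE))
```

With the full list, the penalty sentence is wrongly scored as containing a positive emotion word; with the reduced list, the well-being sentence is no longer detected at all.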

Level of Lexical Semantics in Cross-Language Adaptation

If we focus on cross-language adaptation of closed-vocabulary methods, it should be emphasized that these tools are naturally based on the specifics of the source (original) language for which they were developed, most often English [see Mehl (2006) ]. Therefore, adapting such dictionaries to other languages is often a complicated and time-consuming process that faces a series of additional challenges ( Bjekić et al., 2014 ; Dudãu and Sava, 2020 ; Boot, 2021 ). First, the methods are most often based on the original cultural and linguistic structure rather than the target culture or language, that is, on the imposed-etic approach ( Berry et al., 1997 ). This strategy can lead, among other things, to the risk of reductionism or misinterpretation of results, for example, when constructs (variables/categories) do not exist, are not equivalent, or function differently in the original and target language ( Church and Katigbak, 1989 ). Languages often have unique words that are difficult to express in other languages (e.g., words like “toska” in Russian, “jamani” in Swahili or “saudade” in Portuguese). Furthermore, even for words that seem easy to translate, their meaning may shift, e.g., in English, the word “anger” is mainly related to wrath, irateness or rage, while in the Nakh-Daghestanian languages, it is closer to envy and in the Austronesian languages more closely associated with pride ( Jackson et al., 2019 ).

Let us add that semantic changes are not a matter of cross-language comparison only, but they can also occur naturally within one language, such as in different historical stages of a language ( Vanhove, 2008 ; Riemer, 2016 ; Garten et al., 2018 ).

Second, estimating the possible shortcomings of dictionary adaptation can be problematic, since the degree of equivalence varies not only across language features (some words are more cross-linguistically and cross-culturally comparable than others) ( Biber, 2014 ), but also across different communication contexts ( Daems et al., 2013 ; Biber and Conrad, 2019 ; Dudãu and Sava, 2020 ). For example, the meaning and use of the English word “hump” vary both between English-speaking cultures and between situational contexts (e.g., in British English it can refer to an emotional state, in American English it can refer to a vigorous effort, and depending on the context it can be perceived as vulgar). In some languages, the influence of the context is crucial for word interpretation and classification, such as in Czech, where the sociolinguistic situation (inter-lingual variation) borders on diglossia ( Bermel, 2014 ). Thus, we can assume that dictionaries validated only in a certain communication context (e.g., academic essays) will not be sufficiently effective in another context (e.g., informal conversations).

The topic of comparability of language variables (words, units, features) across languages is discussed in a number of studies. Although many of them have revealed a high degree of similarity in the results of cross-language analysis ( Ramírez-Esparza et al., 2008 ; Windsor et al., 2019 ; Vivas et al., 2020 ), there is increasing evidence pointing to significant differences in lexical and semantic functioning across more distant languages. In the study by Thompson et al. (2020) , published in Nature Human Behaviour , the authors analyzed semantic alignment (neighborhood) for 1,010 meanings in 41 languages using distributed semantic vectors derived from multilingual natural language corpora. While some words within semantic domains with a high internal structure were more closely aligned across languages, especially quantity, time, and kinship (e.g., “four,” “day,” and “son”), words denoting basic actions, motion, emotions and values (e.g., “blow,” “move,” and “praise”) aligned much less closely. In terms of semantic alignment by parts of speech (word classes), the highest alignment was found in numerals, while other parts of speech were much less aligned (e.g., prepositions were the least aligned). Thus, this study critically questions the idea of widely comparable word meanings across languages, at least from a cross-cultural universalist perspective ( Kim et al., 2000 ).

Another study, published in Science , examined nearly 2,500 languages to determine the degree of similarity in linguistic networks of 24 emotion terms ( Jackson et al., 2019 ). The study also revealed a large variability in the meaning of emotion words across cultures. For example, some Austronesian languages colexify the concepts of “pity” and “love,” which may index a more positive conceptualization of “pity” compared to other languages. Another example concerns the connotation of “fear,” which is more associated with “grief” and “regret” in Tai-Kadai compared to other languages. As the authors show, the similarity of emotion terms could be predicted based on the geographic proximity of the languages, their hedonic valence, and the physiological arousal they evoke. Given the central role of emotion words, and more broadly sentiment analysis, in the field of language analysis, this study has clear implications for cross-language analysis, particularly when comparing distant cultures and languages.

Finally, cultural differences in language use were also documented in a study that focused only on English. Garimella et al. (2016) described the differences between Australia and the United States based on the words they used frequently in their online writings. The results indicated that there are significant differences in the way these words are used in the two cultures, reflecting cultural idiosyncrasies in word use. For example, the adjective “human” is more related to human rights in the Australian context, but more to life and love in the United States context ( Garimella et al., 2016 ). From our point of view, these studies provide important insights: although languages are similar in many ways and they certainly share universal bases, the degree of similarity varies depending on cultural and geographical specifics.

The Linguistic Inquiry and Word Count Program as an Example

So far, we have focused on analysis at the level of lexical semantics only, a level that is common to all closed-vocabulary approaches mentioned above. However, one of the methods, the LIWC program, is exceptional in this respect: besides traditional semantic categories (social words, emotion words, etc.), it provides an additional analysis of morphological and syntactic features. LIWC therefore serves well to illustrate the potentials and pitfalls of cross-linguistic adaptation of the closed-vocabulary method in the context of multiple language levels (domains).

Linguistic Inquiry and Word Count ( Pennebaker et al., 2015 ) is currently the most widely used text analysis method in the social sciences. At the time of writing this article, 781 records were available on the Web of Science that contained “LIWC” or “Linguistic Inquiry and Word Count” as the topic, and more than twenty thousand records are listed on Google Scholar. In its current version, LIWC2015, the program offers an intuitive user interface and provides a simple and clear output of the results ( Pennebaker et al., 2015 ), including a range of comparison possibilities ( Chen et al., 2020 ). LIWC dictionaries have been translated and adapted into multiple languages, including Spanish ( Ramírez-Esparza et al., 2007 ), French ( Piolat et al., 2011 ), German ( Wolf et al., 2008 ; Meier et al., 2019 ), Dutch ( Boot et al., 2017 ; Van Wissen and Boot, 2017 ), Brazilian-Portuguese ( Balage Filho et al., 2013 ; Carvalho et al., 2019 ), Chinese ( Huang et al., 2012 ), Serbian ( Bjekić et al., 2014 ), Italian ( Agosti and Rellini, 2007 ), Russian ( Kailer and Chung, 2007 ), Arabic ( Hayeri, 2014 ), Japanese ( Shibata et al., 2016 ), and Romanian ( Dudãu and Sava, 2020 ).

English LIWC2015 works with approximately 90 features grouped into 4 domains: “Summary Language Variables” (general text descriptors and lexical variables, including one syntactic variable “words per sentence”), “Linguistic Dimensions” (containing summary variables, word classes variables, and morphological variables, e.g., “total function words,” “articles,” “1st person singular,” and “negations”), “Other Grammar” (containing word classes variables, and both morphological and syntactic variables, e.g., “numbers,” “comparisons,” and “interrogatives”), and “Psychological Processes” (containing semantic variables and other variables, e.g., “sadness,” “non-fluencies,” and “causation”) ( Pennebaker et al., 2015 ). In terms of the analytic procedure, LIWC operates on relatively simple principles: it uses its own dictionary to identify and label the corresponding words in the analyzed text via word-count. Pre-processing in LIWC includes only basic segmentation and requires additional manual tagging (for specific ambiguous filler words, e.g., “well” or “like,” or non-fluencies, e.g., “you know”). More advanced NLP procedures, on the other hand, use pre-trained models and perform a sequence of “cleaning” processes in such tasks (e.g., Rayson, 2009 ; Manning et al., 2014 ), e.g., part of speech disambiguation and tagging, lemmatization, or parsing ( Straka and Straková, 2017 ).

Several strategies have been used to adapt the LIWC dictionary to other languages ( Boot, 2021 ). These include the supervised translation of the English dictionary word by word ( Bjekić et al., 2014 ; Dudãu and Sava, 2020 ), the use of the existing word corpora and their assignment to corresponding LIWC categories ( Andrei, 2014 ) or as an enrichment of LIWC categories ( Gao et al., 2013 ; Meier et al., 2019 ), the use of dictionaries in closely related languages ( Massó et al., 2013 ), the modification of the older version of the dictionary ( Zijlstra et al., 2004 ), or adapting the original dictionary via machine translation ( Van Wissen and Boot, 2017 ). The various LIWC languages differ significantly in the number of words contained in the dictionary. For example, the Romanian LIWC dictionary (Ro-LIWC2015) contains 47,825 entries compared to the English LIWC2015 dictionary with 6,549 entries (including words, word stems, and emoticons; cf. LIWC2007 contains 4,500 words, and LIWC2001 contains 2,300 words). The average proportion of words identified (labeled) by LIWC also varies considerably across the different LIWC language dictionaries, for example 87% in English (LIWC2015; cf. 82% in LIWC2007), 88% in German (DE-LIWC2015; cf. 70% in LIWC2001), 70% in Dutch, 54% in French, 66% in Spanish, 70% in Serbian, and 67% in Romanian ( Bjekić et al., 2014 ; Dudãu and Sava, 2020 ), speaking to the fact that the LIWC approach likely yields differential sensitivity across different languages.
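
The proportion of words identified by a dictionary can be computed as in the following sketch. The entries are hypothetical, and treating a trailing asterisk as a stem (prefix) marker is an assumption made here for illustration of how word-stem entries might be matched; it is not taken from any specific LIWC release.

```python
import re

def matches(token, entry):
    """Match a token against a dictionary entry.

    Assumption for this sketch: an entry ending in '*' is a stem and
    matches every token that starts with it; other entries match exactly.
    """
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry

def coverage(text, entries):
    """Share of tokens in the text that are labeled by at least one entry."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    labeled = sum(1 for token in tokens if any(matches(token, e) for e in entries))
    return labeled / len(tokens)

# Hypothetical mini-dictionary containing three words and one stem.
entries = ["the", "i", "happy", "teach*"]
share = coverage("I am happy the teacher teaches", entries)
print(round(share, 2))
```

In this toy case five of six tokens are labeled. In a highly inflected language, the same number of entries would typically cover fewer surface forms, which is one plausible reason why dictionary size and coverage do not move together across adaptations.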

Analysis of Non-semantic Levels of Language

The translation and adaptation process faces most of the issues described above. Here, however, the analysis also deals with additional challenges connected to the morphology and syntax of the target languages. These include the pronoun-drop phenomenon (in some languages, users very frequently omit pronouns, particularly in subject position; e.g., “tengo hambre” in Spanish drops the first-person singular pronoun “yo”) ( Świątek, 2012 ); grammatical classification (e.g., pronominal adverbs in Dutch, which combine pronouns/adverbs with prepositions, as in “we doken erin,” which replaces “we doken in het,” i.e., “we dived into it”); grammatical restrictions (some linguistic features are restricted to particular languages, see below); case sensitivity problems (LIWC is not case-sensitive, which makes it difficult to process certain words, e.g., the German word “Sie,” which, if capitalized, serves as a formal second person singular or plural pronoun and, when not capitalized, serves as a third person plural pronoun); and the above-mentioned ambiguity (including cases where the capitalized word appears at the beginning of a sentence) ( Boot, 2021 ).
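
The case-sensitivity problem with German “Sie” can be sketched as follows. The function and its category labels are hypothetical and follow only the distinctions described above; they are not part of LIWC.

```python
def label_pronoun(token, sentence_initial, case_sensitive=True):
    """Label German 'Sie'/'sie' using hypothetical category names.

    Returns None for any other token.
    """
    if token.lower() != "sie":
        return None
    if not case_sensitive:
        # A case-insensitive tool cannot separate the two readings at all.
        return "ambiguous"
    if token == "Sie":
        # Sentence-initial capitalization hides the distinction again.
        return "ambiguous" if sentence_initial else "formal_second_person"
    return "third_person_plural"

print(label_pronoun("Sie", sentence_initial=False))
print(label_pronoun("sie", sentence_initial=False))
print(label_pronoun("Sie", sentence_initial=True))
print(label_pronoun("sie", sentence_initial=False, case_sensitive=False))
```

Even a case-sensitive procedure leaves the sentence-initial form unresolved, so additional context (for example, verb agreement) would be needed to classify it.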

Although some shortcomings of the dictionary translation approach can be partially overcome (e.g., by removing words from the dictionary, adding new words and phrases, or with data pre-processing), they still carry the risk of reduced sensitivity and validity of the adapted method, especially regarding its reliability and comparability to the original. As already mentioned, this applies particularly to languages with a grammatical structure more distant from English. For example, due to the grammatical structure of Serbian (a Slavic language), the category of verbs had to be substantially modified, and the category of articles had to be removed completely ( Bjekić et al., 2014 ). Many adjustments were also made in the Romanian adaptation, for example in verb tense, grammatical gender, or diacritics processing ( Dudãu and Sava, 2020 ). To sum up, every translation of the LIWC dictionary involves many decisions about which entries (words or categories) should be kept, dropped, or added, and each decision is necessarily a trade-off between computational feasibility and linguistic accuracy ( Dudãu and Sava, 2021 ).

Cross-Language Evaluation of Linguistic Inquiry and Word Count

The extent to which language specifics and LIWC adjustments affect the quality of adaptation is difficult to evaluate, as the studies differ in many aspects. Some studies do not report psychometric validation information for their dictionaries (e.g., Arabic, Turkish, or Russian), while others provide only indirect evidence ( Balage Filho et al., 2013 ). In several studies, equivalence estimates are presented as a general indicator of the quality of adaptation. Equivalence is usually estimated via correlation coefficients between the adapted version of LIWC and the English original. If we focus on four major studies, the authors report an average correlation of adapted LIWC and English LIWC as r = 0.67 for German based on N = 5,544/6,463 texts in German/English (Europarl corpora and transcriptions of TED Talks), r = 0.65 for Spanish ( N = 83 texts in Spanish/English; various Internet sources), r = 0.65 for Serbian ( N = 141 texts in Serbian/English; scientific abstracts, newspapers and movie subtitles), and r = 0.52 for Romanian ( N = 35 books of contemporary literature in Romanian/English) ( Ramírez-Esparza et al., 2007 ; Bjekić et al., 2014 ; Meier et al., 2019 ; Dudãu and Sava, 2020 ).
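
A minimal sketch of such an equivalence estimate is given below. It assumes that each category has been scored on the same set of parallel texts by the adapted and by the English dictionary; the category names and scores are invented, and the cited studies may differ in how they pair texts and aggregate coefficients.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(ss_x * ss_y)

def equivalence(adapted, original):
    """Per-category correlations and their unweighted mean."""
    per_category = {
        cat: pearson_r(adapted[cat], original[cat])
        for cat in original if cat in adapted
    }
    return per_category, sum(per_category.values()) / len(per_category)

# Invented per-text category scores for four parallel texts.
adapted = {"pronoun": [1.0, 2.0, 3.0, 4.0], "negate": [1.0, 2.0, 3.0, 4.0]}
original = {"pronoun": [2.0, 4.0, 6.0, 8.0], "negate": [1.0, 3.0, 2.0, 4.0]}
per_category, average_r = equivalence(adapted, original)
print(per_category, average_r)
```

An average of this kind can hide large differences between categories (here 1.0 and 0.8), which is why per-category coefficients should be inspected alongside it.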

Although the average values of the correlations can be considered satisfactory, upon closer inspection, they vary widely between categories and levels of analysis, especially in morphology and semantics. For example, in the Romanian LIWC, most correlations of non-semantic categories are non-significant (11 of 18 categories). Significant results were found in the category “Pronouns” in the first person (singular 0.93, plural 0.92) and in the third person singular (0.66, plural non-significant), in the category “Other Function Words” in conjunctions (0.37) and negations (0.53) and in the category “Other Grammar” in interrogatives (0.58) and quantifiers (0.66) ( Dudãu and Sava, 2020 ). Considering these results and the average proportion of total words identified in the Romanian LIWC (only 67% of words were labeled), we must conclude that the Romanian LIWC does not appear effective enough for the comparable analysis of non-semantic (grammatical) categories, even though its dictionary is seven times bigger than the English original (Romanian: 47,825 entries; English: 6,549 entries; Dudãu and Sava, 2020 ).

Another issue concerns the specificity of text samples on which validity and equivalence tests were performed. In this sense, the communication context (text type, genre, register) is an important factor that can produce substantial variation both in the frequency of language features and in the associations with other variables, especially psychological ones ( Pennebaker et al., 2007 ; Daems et al., 2013 ; Haider and Palmer, 2017 ; Biber and Conrad, 2019 ; Kučera et al., 2020 ; Dudãu and Sava, 2021 ). Differences in the sensitivity of LIWC for detecting psychological markers in different types of text (English only) were shown in the meta-analysis of Chen et al. (2020) , in which, for example, the strength of the relationship between extraversion and positive emotion words varied significantly and substantially across communication contexts (e.g., asynchronous/synchronous and public/private communication). Thus, if only one type of communication is used (e.g., only written language), it is difficult to estimate to what extent the translated dictionary has comparable validity for other types, such as spoken communication. Moreover, it is possible to assume that the language variation is related to multiple factors, not only to the type of text, but also to, for example, sociodemographic characteristics of speakers ( Stuart-Smith and Timmins, 2010 ), as well as to discourse domain and language itself ( Biber, 2014 ).

The above-mentioned challenges have implications not only for the adaptation of closed-vocabulary methods to other languages, but for the field of psychology of word use more broadly. Due to the predominant interest of research in the English language, psychological language markers are often implicitly presented in studies as relatively universal, generalizable at least to English-speaking cultures ( Chung and Pennebaker, 2018 ). In many classical studies, for example, frequent use of first-person singular pronouns has emerged as a marker of negative emotionality ( Pennebaker and King, 1999 ; Pennebaker et al., 2003 ; Oberlander and Gill, 2006 ; Gill et al., 2009 ; Yarkoni, 2010 ; Qiu et al., 2012 ). However, subsequent research in other languages and on other samples relativizes this relationship ( Mehl et al., 2012 ; Bjekić et al., 2014 ; Holtzman et al., 2019 ; Kučera et al., 2020 , 2021 ). Given the lack of cross-language and cross-cultural studies, the original assumption of generalizability is understandable. However, considering recent studies, the previous conjectures need to be qualified with regard to the culture, language, communication contexts, and samples in which the relationships emerged. If the different functioning of words in other languages and cultures is not sufficiently described, many generalizations may be biased or misrepresented as a result.

Dealing With Closed-Vocabulary Cross-Language Analysis

Although the issues raised above may give rise to pessimism regarding the possibilities of closed-vocabulary approaches in cross-language research, we believe that most challenges can (and need to) be overcome, at least to some extent. Closed-vocabulary approaches offer, in contrast to open-vocabulary approaches, several advantages that are important for psychological research. The categories they work with can be intuitively labeled and facilitate interpretation, explanation, testing, and accumulation and transfer of results (e.g., into other languages and contexts) ( Kennedy et al., 2021 ). Even if traditional methods are replaced by new technologies (e.g., AI), the demand for interpretations of phenomena based on intuitive categories (e.g., representing variables using established psychological concepts) is bound to survive. In the rest of the article, we therefore focus on suggestions that support the effective use of closed-vocabulary approaches in multilingual and multicultural settings.

Dealing With Language and Culture

The first challenge we discussed was the language and culture on which the analysis is based and the degree of its similarity to other languages and cultures. To build on the previous arguments, the results of text analysis methods likely diverge more the further apart the studied languages and cultures are, not only because of the methodological differences in analysis, but also because of the specifics of the languages and cultures themselves. As a parallel, we reference the issues concerning the use of Big Five personality questionnaires across cultures (the most widely used method for assessing personality characteristics), which outside of western, educated, industrialized, rich and democratic (WEIRD) populations show serious limitations and low validity for measuring the domain of basic personality traits ( Laajaj et al., 2019 ). In the same way, striving for better explanations of cross-linguistic variation requires employing the power of cross-cultural comparisons to describe the variation and similarity ( Barrett, 2020 ): the methodology must be linked to more principled sampling, both at the level of speakers (e.g., a representative sample of speakers in a given culture or at least a sample corrected for imbalances) and texts (e.g., acquiring texts with regard to how well they represent the selected communication contexts).

Since a cross-language comparison based on texts from the entire communication spectrum would be difficult to implement, it is necessary to choose specific types of communication (i.e., registers and genres) to be analyzed. Leaving aside their ease of availability to the researcher, the focus should be on types of text that show a certain degree of cross-language universality. Existing cross-linguistic studies on register variation can provide important information in this regard. For example, Biber’s research finds two language dimensions (i.e., constellations of linguistic features that typically co-occur in texts) that could be considered relatively (although not absolutely) universal: (1) “clausal/oral” discourse vs. “phrasal/literate” discourse, and (2) “narrative” vs. “non-narrative” discourse ( Biber, 2014 ). The first dimension linguistically comprises typical grammatical features (e.g., verb and pronoun classes) and is based functionally on a distinction between a personal/involved focus and informational focus (e.g., private speech vs. academic writing as prototypic genres). The second, narrative dimension consists of different sets of features (e.g., human nouns and past tense verbs), and typically appears in fictional stories, personal narratives, or folk tales. These general patterns have emerged from different studies of languages other than English, for example, Spanish, Brazilian Portuguese, Nukulaelae Tuvaluan, Korean, Somali, Taiwanese, Czech, and Dagbani ( Biber, 2014 ).

From the point of view of cross-language comparison, it is therefore recommended to choose text types that are at least somewhat comparable on these two dimensions to ensure maximum (in the sense of as much as reasonably possible) comparability. If the selection of texts cannot be made by dimensions defined ex ante (e.g., if the texts have already been collected), it is also possible to subject the texts to ex post dimensional analysis via multi-dimensional analysis (MDA), an approach that identifies co-occurrence patterns of linguistic features based on factor analysis ( Biber, 1991 ). Through MDA, it is possible to describe different texts in terms of their similarity in dimensional structure. However, MDA is currently only available for a limited number of languages (in addition to the languages listed above, for Scottish Gaelic and written Chinese) ( Sardinha and Pinto, 2019 ).

Dealing With Levels of Analysis and Language Variables

The second challenge concerns the terminology and language level (domain) that is the subject of the analysis. Since the definition of language variables based on the specifics of one language only is problematic, it is necessary to work with variables that have common characteristics and to categorize them in a more clearly defined system. The issue of universal classification has been addressed in a number of studies, both theoretically and practically ( Hasselgård, 2013 ). If we are to build on newer approaches, two of the available linguistic frameworks can serve as an example to follow, the Universal Dependencies (UD) and the Universal Morphology (UniMorph) projects ( Nivre et al., 2016 ; McCarthy et al., 2020 ). Both frameworks focus on the annotation of human language and connect many fields of contemporary linguistics ( Osborne and Gerdes, 2019 ; de Marneffe et al., 2021 ). In both frameworks, morphology (including part of speech) and syntax are considered the principal non-semantic levels of language analysis in the taxonomies.

Universal Dependencies 1 is a framework for annotation of grammar across different human languages, currently available for 122 languages with 33 more in preparation ( Universal Dependencies, 2021 ). Morphological variables of UD include, for example, the categories of part of speech and lexical and inflectional features (e.g., pronominal type and degree of comparison), and syntactic variables cover dependency relations between words (relations between a syntactic head and a subordinate element, e.g., multiple determiners attached to the head noun).

The UniMorph project 2 has goals similar to those of UD and provides normalized morphological paradigms for diverse world languages, especially low-resource languages with inflectional morphology. The schema of UniMorph comprises 23 dimensions of meaning (e.g., person, number, tense, and case) and over 212 features (for the dimension of case, e.g., ablative, absolutive, accusative, etc.) ( Sylak-Glassman, 2016 ; McCarthy et al., 2020 ).

If we consider the Universal Dependencies and Universal Morphology frameworks from the perspective of cross-language research, i.e., when comparing analyses of multiple languages, a comment needs to be added on the number and applicability of linguistic variables. Since the set of linguistic features (categories, dimensions) we can work with is entirely dependent on the properties of the languages in question, it is necessary to identify features that are shared between these languages, i.e., identically labeled in UD or UniMorph. For example, if we compare the results of UD text analysis in English and Spanish, we can only work with the 13 English features that are shared with Spanish (e.g., degree, gender, person, polarity; see English ParTUT and Spanish AnCora treebanks; Universal Dependencies, 2021 ). However, UD in Spanish offers more linguistic features (23 features in total), and we can use these “non-English” variables, e.g., in a further comparison with another language.
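
Identifying the comparable variables amounts to intersecting the feature inventories of the languages involved, as in the sketch below. The inventories are deliberately abbreviated and partly made up (the two “Extra” labels are placeholders); the actual inventories would have to be read from the respective treebank documentation.

```python
# Abbreviated, partly hypothetical feature inventories, for illustration only.
inventories = {
    "English": {"Degree", "Gender", "Person", "Polarity"},
    "Spanish": {"Degree", "Gender", "Person", "Polarity",
                "ExtraFeatureA", "ExtraFeatureB"},
}

def shared_features(inventories, languages):
    """Features that carry the same label in every listed language."""
    return set.intersection(*(inventories[lang] for lang in languages))

shared = shared_features(inventories, ["English", "Spanish"])
spanish_only = inventories["Spanish"] - shared

print(sorted(shared))
print(sorted(spanish_only))
```

Only the shared set supports a direct English–Spanish comparison; the remaining Spanish features stay available for comparisons with other languages that also annotate them.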

To sum up, the frameworks provide useful tools, and they can serve as a starting point for better classification and (re-)definition of language variables for the purposes of cross-language psychological analyses. In addition, Universal Dependencies Tools are open-source software, so they are available for free.

Dealing With Cross-Language Adaptation of Methods

The third challenge is related to current approaches to text analysis and their methods. In terms of cross-language use of semantically based closed-vocabulary approaches, research should focus primarily on identifying and covering the semantic specifics and functioning of words in different languages, not just on translating the text into the language of analysis. Studies that describe the semantic alignment of words across different languages and contexts could help here ( Garimella et al., 2016 ; Jackson et al., 2019 ; Thompson et al., 2020 ). For both semantic and morphological analysis, several procedures can be used to increase the comparability of the analyses. For example, it is possible to use statistical adjustments proposed by Dudãu and Sava—to employ multilevel analysis with language as the level 2 covariate (especially when text input is available in relatively different languages) or to perform within-language standardization to attenuate the language particularities that could affect the investigation in the multilingual setting.

For example, Brazilian Portuguese probably has linguistic particularities in the use of the third-person singular (e.g., in personal pronouns and possessives with a higher degree of inflection), which can cause inconsistencies in cross-language comparisons ( Carvalho et al., 2019 ). To avoid the lack of equivalence between results of analyses in different languages, it is possible to perform within-language standardization, i.e., to use the mean and standard deviation of the third-person singular variable within each language as the reference parameters for rescaling the values. As Dudău and Sava (2021) state, when comparing the four LIWC language adaptations (English, Dutch, Brazilian Portuguese, and Romanian), the unadjusted calculations show little sign of cross-language equivalence compared to the situation where language specificities are considered, that is, via within-language standardization.
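Within-language standardization can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the record layout, the variable name `third_person_sg`, and all numeric values are made up for the example, and the use of the population standard deviation (rather than the sample one) is our choice, not a detail taken from the cited study.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_language(rows, variable):
    """Rescale `variable` to z-scores using each language's own mean and SD."""
    by_language = defaultdict(list)
    for r in rows:
        by_language[r["language"]].append(r[variable])

    # Reference parameters are computed separately for every language.
    params = {lang: (mean(vals), pstdev(vals)) for lang, vals in by_language.items()}

    result = []
    for r in rows:
        m, sd = params[r["language"]]
        z = (r[variable] - m) / sd if sd > 0 else 0.0  # guard against zero variance
        result.append({**r, variable + "_z": z})
    return result


# Hypothetical per-text rates (percent of words); not real LIWC output.
texts = [
    {"language": "en", "third_person_sg": 2.0},
    {"language": "en", "third_person_sg": 4.0},
    {"language": "en", "third_person_sg": 6.0},
    {"language": "pt", "third_person_sg": 10.0},
    {"language": "pt", "third_person_sg": 20.0},
    {"language": "pt", "third_person_sg": 30.0},
]

standardized = standardize_within_language(texts, "third_person_sg")
for r in standardized:
    print(r["language"], round(r["third_person_sg_z"], 3))
```

In this toy input the raw rates differ by a factor of five between the two languages, whereas the standardized values express each text's position relative to the other texts in the same language, which is the quantity that is then compared across languages.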

Another way to reduce the difficulties of adapting closed-vocabulary methods and subsequent cross-language comparison is to use machine translation. Two basic approaches are the “translated dictionary” approach and the “translated text” approach. The first one consists of automatic translation of entries (usually word by word) from the original dictionary (e.g., English) into the target language. This creates a new dictionary in the target language, which is used to perform analyses in this particular language (e.g., the Danish version of LIWC) ( Boot et al., 2017 ; Van Wissen and Boot, 2017 ). The second approach consists of translating the analyzed text into the language in which the original method works (e.g., English) and then performing the analysis with the original method. This approach seems to be effective and straightforward in many ways: it makes the analysis tool accessible to languages for which it has not yet been adapted, and it reduces errors associated with the translation process and adaptation of the dictionary into another language. The efficiency of MT systems (e.g., Google Translate) is proving to be very high, also in terms of syntax and stylistics, and recent studies show that this “translated text” approach outperforms the traditional word-by-word “translated dictionary” approach ( Windsor et al., 2019 ; Araújo et al., 2020 ; Boot, 2021 ), for example, in measures of equivalence of Dutch, German, and Spanish language analyses ( Boot, 2021 ).
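The difference between the two approaches can be made concrete with a toy Python sketch. Everything in it is a hypothetical stand-in: the two-entry dictionary and its category labels are not LIWC content, the word map replaces a real dictionary translation step, and `toy_translate` replaces a real MT system with a one-sentence lookup.

```python
import re
from collections import Counter


def category_rates(text, dictionary):
    """Closed-vocabulary analysis: share of tokens falling into each category."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for tok in tokens:
        for category in dictionary.get(tok, ()):
            counts[category] += 1
    return {cat: n / len(tokens) for cat, n in counts.items()} if tokens else {}


def translate_dictionary(dictionary, word_map):
    """'Translated dictionary': map each entry word by word into the target language."""
    translated = {}
    for word, categories in dictionary.items():
        target = word_map.get(word)
        if target is not None:  # entries without a translation are lost
            translated.setdefault(target, set()).update(categories)
    return translated


def analyze_translated_text(text, translate, dictionary):
    """'Translated text': translate the text, then apply the original dictionary."""
    return category_rates(translate(text), dictionary)


# Hypothetical English dictionary and Spanish word map (illustration only).
dict_en = {"happy": {"positive_emotion"}, "sad": {"negative_emotion"}}
dict_es = translate_dictionary(dict_en, {"happy": "feliz", "sad": "triste"})

# Stand-in for an MT system: a lookup table covering one sentence.
toy_mt = {"estoy feliz": "I am happy"}


def toy_translate(text):
    return toy_mt[text.lower()]


rates_dictionary = category_rates("Estoy feliz", dict_es)
rates_text = analyze_translated_text("Estoy feliz", toy_translate, dict_en)
print(rates_dictionary)
print(rates_text)
```

Both pipelines find the same emotion word here, but the relative frequencies differ because the Spanish sentence and its English translation have different token counts. This is a property of the toy input rather than an empirical result, yet it shows why the equivalence of the two approaches has to be checked empirically, as in the studies cited above.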

Dealing With Methods Based on Machine Learning

Finally, acknowledging where the field is heading, we would like to comment on questions around new technologies in psychological text analysis more generally. The use of artificial intelligence (AI), machine learning (ML), and machine translation (MT) is already closely related to many aspects of text analysis, for example, within open-vocabulary approaches ( Eichstaedt et al., 2020 ). Undoubtedly, modern technologies offer enormous potential based on the performance and sophistication of up-to-date computational systems, but also raise fundamental questions about methods of data processing, their supervision, and interpretation of results ( Mønsted et al., 2018 ; Stachl et al., 2020 ).

The ML and MT methods allow us to expand the spectrum of observed variables and at the same time effectively predict their relationships. However, from the perspective of our paper, their disadvantage is the problematic interpretation of the analytical process itself, i.e., the so-called black box problem ( Castelvecchi, 2016 ). For example, it is possible to train AI on a large number of texts to effectively recognize the specific characteristics of speakers (and then, e.g., allow the AI to predict them), but it is difficult to get clearer information on what procedures and variables (features) are involved in the process ( Zednik, 2019 ). AI is thus more of a promising method for predicting relationships than a method that provides their explanation and deeper insight ( Yarkoni and Westfall, 2017 ).

It is not within the scope of this article to discuss all aspects of ML/MT utilization; however, we would like to focus on one issue that we consider particularly important in relation to cross-language research and the use of closed-vocabulary analysis in psychology: the quality and complexity of the training data, especially in the context of different languages and different types of communication.

Successful use of ML depends to a large extent on the data on which the system is trained, both in terms of quantity and quality ( Ehrlinger et al., 2019 ). Regarding the number of training texts, a general rule of thumb is that more data usually means higher effectiveness of the system ( Baeza-Yates and Liaghat, 2017 ). In terms of data quality, the situation is much less clear. In addition to routine data quality controls (e.g., cleaning the dataset of irrelevant texts), the nature of the texts should also be considered, especially the type of communication that is the subject of the ML training ( Smith et al., 2013 ; Modaresi et al., 2016 ; Medvedeva et al., 2017 ; Ott et al., 2018 ). For example, several studies have shown that current electronic communication is dominated by the so-called “electronic/internet discourse” (e-discourse), which takes the form of semi-speech (between speaking and writing) ( Abusa’aleek, 2015 ). This e-discourse has its own features, such as unconventional spelling and combinations of visual and textual elements ( Lyddy et al., 2014 ; Pam, 2020 ).

Following this concept, we can assume that if ML is, say, trained primarily on parallel corpora of formal written communication (e.g., press releases or parliament transcripts in two or more languages), its effectiveness for processing (translating) e-discourse or other more specific communication might be noticeably reduced, and vice versa ( Koehn and Knowles, 2017 ; Søgaard et al., 2018 ). Increased error rates for certain types of text (styles, genres, registers) have been described even for systems as complex as Google Translate ( Putri and Havid, 2015 ; Afshin and Alaeddini, 2016 ; Prates et al., 2018 ). These are mainly lexical/discourse errors and style errors (note: lexical errors occur when MT translates words wrongly or does not translate them, discourse errors occur when MT does not recognize the meaning of the word in its context, and style errors occur when the word is inappropriate in a given context). In the 2016 study, error rates (based on comparison with human translation) were quantified at 5.9% for lexical/discourse errors and 8% for style errors ( Afshin and Alaeddini, 2016 ). Higher sensitivity to errors was found in the translation of function words, especially adjectives and adverbs ( Putri and Havid, 2015 ). In addition to these errors, problems referred to as “machine bias” can arise. A classic example is the case of gender preference in Google Translate, that is, when Google MT exhibited a strong tendency toward male defaults ( Prates et al., 2018 ). Although the issue was quickly handled by Google through (forced) equal representation of gender categories in translation, the underlying problem itself is not resolved that easily, since MT was probably trained on (historical) data in which the male gender is more common, which resulted in its being preferred in translation. In these situations, it is therefore necessary to apply methods such as “post-editing,” i.e., the process of making corrections or amendments to automatically generated text (machine translation output) ( Temizöz, 2016 ; Gutiérrez-Artacho et al., 2019 ).

The quality of MT is constantly changing with the ever-increasing training data and the participation of new technologies (e.g., automatic transcription of oral communication). At the same time, the accumulation of data facilitates the representation of more diverse types of communication and language varieties (dialects, sociolects, etc.), which contributes to solving a number of problems of traditional closed-vocabulary approaches (MT is based on authentic varieties of language, not on a priori assumptions about their functioning). However, the increase in the amount of training data is not proportional between languages: languages that are used more often in electronic communication (especially English) provide automated systems with much more data than the so-called “low-resource/resource-poor languages” ( Thuy et al., 2018 ). Although it is possible to apply procedures that link datasets of resource-sufficient and resource-poor languages ( Impana and Kallimani, 2017 ), the issue of reduced comparability cannot be overlooked ( Seki, 2021 ). The described situation parallels the previously mentioned problem of disproportionate representation of certain types of communication in the ML dataset. In the application of MT in psychological research, it is therefore necessary to emphasize the need for control and documentation of the ML training process, especially when working with languages that generate fewer texts compared to the world’s most used languages, and when working with types of text that are more distant from the original training data.

At the beginning of our article, we stated that we are currently in a “transitional phase of research” within the field of text analysis. After more than 60 years of research on psychological aspects of word use, new technologies and methods are entering this discipline at a rapid rate. Original programs based on simple word counting are being challenged by automated machine learning systems and large-scale “big data” analyses ( Gandomi and Haider, 2015 ) that allow for extensive cross-cultural comparisons. New technologies offer great potential, but the question is when (or whether) they will completely replace traditional techniques. It will also be important to consider to what extent the original methods can support more advanced analyses in terms of their focus, interpretation, and explanation of linguistic phenomena. In this regard, current research raises a number of questions related to the relevance of older studies, considering different language structures in different cultures and contexts of human communication ( Kim et al., 2000 ; Jackson et al., 2019 ; Thompson et al., 2020 ).

In our critical analysis here, we focused on closed-vocabulary approaches, a relatively old method of text analysis. Nevertheless, even today, its contribution needs to be appreciated and its strengths highlighted. We would like to celebrate the ground-breaking research and many quality papers that have been published in this field over the last two decades (for all, see, e.g., Pennebaker et al., 2003 ). Research in Anglophone cultures has provided many excellent tools for text analysis in English, but it has also amplified universalist tendencies to adapt target languages to default methods, instead of adapting these methods to target languages and their specifics (e.g., Bjekić et al., 2014 ; Dudău and Sava, 2020 ). Given the richness and variety among different languages, many relationships between language and psychological variables are undoubtedly reduced this way ( Kim et al., 2000 ; Wierzbicka, 2013 ; Kučera, 2020 ).

In summary, we can state three basic considerations: (1) To further the science of the psychology of word use, it is necessary to promote close interdisciplinary cooperation, especially with the fields of linguistics, computer science, and cultural psychology. Within that, linguistics can provide a clear taxonomy of language, a background in cross-linguistic research, and useful analytic tools (e.g., MDA for dimensional text description or UD for their morpho-syntactic annotation) ( Biber, 2014 ; de Marneffe et al., 2021 ). (2) If we are looking for relationships between mind, behavior, and language use, it is not possible to overlook the specifics of different languages and cultures. Although studies conducted in English are usually more accessible to both researchers and the public (e.g., given the tools available and the amount of data), it is critical to compare the results with studies in other languages and cultures in order to evaluate the generalizability of relationships and to understand their meaning more deeply ( Kim et al., 2000 ; Wierzbicka, 2013 ). (3) In cross-language psychological research, all present-day methods can be used. However, it is necessary to consider their functionality in different contexts (e.g., define more universal variables and comprehend situational/cultural aspects of communication) ( Biber and Conrad, 2019 ; Cvrček et al., 2020 ), and critically assess their development and use. This consideration also applies to current machine learning systems, in which the possibility of methodological supervision is usually limited (in terms of control of the analysis process) and in which the fundamental condition for their effectiveness is the quality of training data ( Koehn and Knowles, 2017 ; Ott et al., 2018 ). These three points can be related to both new studies and studies already conducted, for which a review of their results could be expected.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.

Author Contributions

DK: conceptualization, investigation, writing—original draft, and writing—review and editing. MM: conceptualization, supervision, writing—original draft, and writing—review and editing. Both authors contributed to the article and approved the submitted version.

This study was supported by the Fulbright Scholarship Program, Fulbright-Masaryk Scholarship no. 2020-28-11.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

  • universaldependencies.org
  • unimorph.github.io

Abusa’aleek, A. (2015). Internet linguistics: a linguistic analysis of electronic discourse as a new variety of language. Int. J. Engl. Linguist. 5. doi: 10.5539/ijel.v5n1p135

Afshin, H., and Alaeddini, M. (2016). A Contrastive Analysis of Machine Translation (Google Translate) and Human Translation: efficacy in Translating Verb Tense from English to Persian. Mediterr. J. Soc. Sci. 7:40. doi: 10.5901/mjss.2016.v7n4S2p40

Agosti, A., and Rellini, A. (2007). The Italian LIWC Dictionary: Technical Report. Austin: LIWC.Net.

Althoff, T., Clark, K., and Leskovec, J. (2016). Large-scale Analysis of Counseling Conversations: an Application of Natural Language Processing to Mental Health. Trans. Assoc. Comput. Linguist. 4, 463–476. doi: 10.1162/tacl_a_00111

Amini, H., Farahnak, F., and Kosseim, L. (2019). “Natural Language Processing: An Overview,” in Frontiers in Pattern Recognition and Artificial Intelligence , eds M. Blom, N. Nobile, and C. Y. Suen (Singapore: World Scientific), 35–55. doi: 10.1142/9789811203527_0003

Andrei, A. L. (2014). Development and evaluation of Tagalog linguistic inquiry and word count (LIWC) dictionaries for negative and positive emotion. McLean: MITRE Corp.

Araújo, M., Pereira, A., and Benevenuto, F. (2020). A comparative study of machine translation for multilingual sentence-level sentiment analysis. Inf. Sci. 512, 1078–1102.

Asher, N., and van de Cruys, T. (2018). Content vs. function words: the view from distributional semantics. Proc. Sinn Und Bedeutung 22, 1–21.

Avolio, B. J., and Gardner, W. L. (2005). Authentic leadership development: getting to the root of positive forms of leadership. Leadersh. Q. 16, 315–338. doi: 10.1016/j.leaqua.2005.03.001

Baccianella, S., Esuli, A., and Sebastiani, F. (2010). “Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining,” Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10) , (France: European Language Resources Association (ELRA)), 2200–2204.

Baeza-Yates, R., and Liaghat, Z. (2017). “Quality-efficiency trade-offs in machine learning for text processing,” in 2017 IEEE International Conference on Big Data (Big Data) , (Boston: IEEE), 897–904.

Balage Filho, P., Pardo, T. A. S., and Aluísio, S. (2013). “An evaluation of the Brazilian Portuguese LIWC dictionary for sentiment analysis,” in Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology , (Porto Alegre: SBC).

Barrett, H. C. (2020). Towards a Cognitive Science of the Human: cross-Cultural Approaches and Their Urgency. Trends Cogn. Sci. 24, 620–638. doi: 10.1016/j.tics.2020.05.007

Basnight-Brown, D. M., and Altarriba, J. (2018). “The influence of emotion and culture on language representation and processing,” in Advances in culturally-aware intelligent systems and in cross-cultural psychological studies , ed. C. Faucher (Berlin: Springer), 415–432.

Bender, E. M. (2011). On achieving and evaluating language-independence in NLP. Linguist. Issues Lang. Technol. 6, 1–26.

Bermel, N. (2014). “Czech diglossia: Dismantling or dissolution?,” in Divided Languages? , eds J. Arokay, J. Gvozdanovic, and D. Miyajima (Berlin: Springer), 21–37.

Berry, D. S., Pennebaker, J. W., Mueller, J. S., and Hiller, W. S. (1997). Linguistic bases of social perception. Pers. Soc. Psychol. Bull. 23, 526–537.

Biber, D. (1991). Variation Across Speech and Writing. Cambridge: Cambridge University Press.

Biber, D. (2014). Using multi-dimensional analysis to explore cross-linguistic universals of register variation. Lang. Contrast 14, 7–34.

Biber, D., and Conrad, S. (2019). Register, Genre, and Style. Cambridge: Cambridge University Press.

Bjekić, J., Lazarević, L. B., Živanović, M., and Knežević, G. (2014). Psychometric evaluation of the Serbian dictionary for automatic text analysis—LIWCser. Psihologija 47, 5–32. doi: 10.2298/psi1401005b

Boot, P. (2021). Machine-translated texts as an alternative to translated dictionaries for LIWC. Open Science Framework [Preprint]. doi: 10.31219/osf.io/tsc36

Boot, P., Zijlstra, H., and Geenen, R. (2017). The Dutch translation of the Linguistic Inquiry and Word Count (LIWC) 2007 dictionary. Dutch J. Appl. Linguist. 6, 65–76. doi: 10.1075/dujal.6.1.04boo

Bradley, M. M., and Lang, P. J. (1999). Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical report C-1. Gainesville: University of Florida, Center for research in psychophysiology.

Brewer, M. B., and Gardner, W. (1996). Who is this “We”? Levels of collective identity and self representations. J. Pers. Soc. Psychol. 71:83. doi: 10.1037/0022-3514.71.1.83

Carvalho, F., Rodrigues, R. G., Santos, G., Cruz, P., Ferrari, L., and Guedes, G. P. (2019). “Evaluating the Brazilian Portuguese version of the 2015 LIWC Lexicon with sentiment analysis in social networks,” in Anais Do VIII Brazilian Workshop on Social Network Analysis and Mining , (Porto Alegre: SBC), 24–34.

Castelvecchi, D. (2016). Can we open the black box of AI? Nat. News 538:20. doi: 10.1038/538020a

Chen, J., Qiu, L., and Ho, M.-H. R. (2020). A meta-analysis of linguistic markers of extraversion: Positive emotion and social process words. J. Res. Pers. 89:104035. doi: 10.1016/j.jrp.2020.104035

Chung, C. K., and Pennebaker, J. W. (2018). “What do we know when we LIWC a person? Text analysis as an assessment tool for traits, personal concerns and life stories,” in The SAGE Handbook of Personality and Individual Differences: The Science of Personality and Individual Differences , eds V. Zeigler-Hill and T. K. Shackelford (Thousand Oaks: Sage), 341–360.

Church, A. T., and Katigbak, M. S. (1989). Internal, external, and self-report structure of personality in a non-western culture: an investigation of cross-language and cross-cultural generalizability. J. Pers. Soc. Psychol. 57:857.

Corver, N., and van Riemsdijk, H. (2001). Semi-lexical categories: The function of content words and the content of function words. Berlin: Walter de Gruyter.

Cruse, D. A. (1986). Lexical Semantics. Cambridge: Cambridge University Press.

Cvrček, V., Laubeová, Z., Lukeš, D., Poukarová, P., Řehořková, A., and Zasina, A. J. (2020). Author and register as sources of variation: a corpus-based study using elicited texts. Int. J. Corpus Linguist. 25, 461–488.

Daems, J., Speelman, D., and Ruette, T. (2013). Register analysis in blogs: correlation between professional sector and functional dimensions. Leuven Work. Papers Linguist. 2, 1–27.

de Marneffe, M.-C., Manning, C. D., Nivre, J., and Zeman, D. (2021). Universal dependencies. Comput. Linguist. 47, 255–308.

Demjén, Z. (2014). Drowning in negativism, self-hate, doubt, madness: linguistic insights into Sylvia Plath’s experience of depression. Commun. Med. 11, 41–54. doi: 10.1558/cam.v11i1.18478

Dino, A., Reysen, S., and Branscombe, N. R. (2009). Online Interactions Between Group Members Who Differ in Status. J. Lang. Soc. Psychol. 28, 85–93. doi: 10.1177/0261927X08325916

Dudău, D. P., and Sava, F. A. (2020). The development and validation of the Romanian version of Linguistic Inquiry and Word Count 2015 (Ro-LIWC2015). Curr. Psychol. doi: 10.1007/s12144-020-00872-4

Dudău, D. P., and Sava, F. A. (2021). Performing multilingual analysis with Linguistic Inquiry and Word Count 2015 (LIWC2015). An equivalence study of four languages. Front. Psychol. 12:570568. doi: 10.3389/fpsyg.2021.570568

Duff, A. S. (2000). Information Society Studies (Vol. 3). East Sussex: Psychology Press.

Ehrlinger, L., Haunschmid, V., Palazzini, D., and Lettner, C. (2019). “A DaQL to monitor data quality in machine learning applications,” in International Conference on Database and Expert Systems Applications , eds S. Hartmann, J. Küng, S. Chakravarthy, G. Anderst-Kotsis, A. Tjoa, and I. Khalil (Cham: Springer), 227–237.

Eichstaedt, J. C., Kern, M. L., Yaden, D. B., Schwartz, H. A., Giorgi, S., Park, G., et al. (2020). Closed and open vocabulary approaches to text analysis: a review, quantitative comparison, and recommendations. PsyArXiv [Preprint]. doi: 10.31234/osf.io/t52c6

Fuller, S. (2005). Another sense of the information age. Inf. Commun. Soc. 8, 459–463. doi: 10.1080/13691180500418246

Gandomi, A., and Haider, M. (2015). Beyond the hype: big data concepts, methods, and analytics. Int. J. Inf. Manag. 35, 137–144. doi: 10.1016/j.ijinfomgt.2014.10.007

Gao, R., Hao, B., Li, H., Gao, Y., and Zhu, T. (2013). “Developing simplified Chinese psychological linguistic analysis dictionary for microblog,” in International Conference on Brain and Health Informatics , (Berlin: Springer International Publishing), 359–368. doi: 10.1007/978-3-319-02753-1_36

Garimella, A., Mihalcea, R., and Pennebaker, J. (2016). “Identifying Cross-Cultural Differences in Word Usage,” in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers , (Japan: The COLING 2016 Organizing Committee), 674–683. https://www.aclweb.org/anthology/C16-1065

Garten, J., Hoover, J., Johnson, K. M., Boghrati, R., Iskiwitch, C., and Dehghani, M. (2018). Dictionaries and distributions: combining expert knowledge and large scale textual data content analysis. Behav. Res. Methods 50, 344–361. doi: 10.3758/s13428-017-0875-9

Gill, A. J., Nowson, S., and Oberlander, J. (2009). “What are they blogging about? Personality, topic and motivation in blogs,” in Third International AAAI Conference on Weblogs and Social Media , eds E. Adar, M. Hurst, T. Finin, N. S. Glance, N. Nicolov, and B. L. Tseng (California: The AAAI Press), 18–25.

Gill, A. J., and Oberlander, J. (2019). “Taking care of the linguistic features of extraversion,” in Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society , eds W. D. Gray and C. D. Schunn (Mahwah: Lawrence Erlbaum Associates), 363–368. doi: 10.4324/9781315782379-99

Goldberg, S. B., Flemotomos, N., Martinez, V. R., Tanana, M. J., Kuo, P. B., Pace, B. T., et al. (2020). Machine learning and natural language processing in psychotherapy research: alliance as example use case. J. Couns. Psychol. 67, 438–448. doi: 10.1037/cou0000382

Gottschalk, L. A. (2000). The Application of Computerized Content Analysis of Natural Language in Psychotherapy Research Now and in the Future. Am. J. Psychother. 54, 305–311. doi: 10.1176/appi.psychotherapy.2000.54.3.305

Gottschalk, L. A., Winget, C. N., and Gleser, G. C. (1969). Manual of Instructions for Using the Gottschalk-Gleser Content Analysis Scales: Anxiety, Hostility, and Social Alienation–personal Disorganization. California: University of California Press.

Gutiérrez-Artacho, J., Olvera-Lobo, M.-D., and Rivera-Trigueros, I. (2019). “Hybrid machine translation oriented to cross-language information retrieval: English-Spanish error analysis,” in World Conference on Information Systems and Technologies , eds Á Rocha, H. Adeli, L. Reis, and S. Costanzo (Cham: Springer), 185–194.

Haider, T., and Palmer, A. (2017). “Modeling communicative purpose with functional style: Corpus and features for German genre and register analysis,” in Proceedings of the Workshop on Stylistic Variation , (Stroudsburg: Association for Computational Linguistics), 74–84.

Harley, T. A. (2013). The Psychology of Language: From Data to Theory. East Sussex: Psychology Press.

Hart, R. P. (2001). “Redeveloping DICTION: Theoretical considerations,” in Progress in Communication Sciences , ed. M. West (New York: Springer), 43–60.

Hart, R. P., and Carroll, C. (2011). DICTION: The text-analysis program. Thousand Oaks: Sage.

Haspelmath, M. (2020). The structural uniqueness of languages and the value of comparison for language description. Asian Lang. Linguist. 1, 346–366.

Hasselgård, H. (2013). “Crosslinguistic Differences in Grammar,” in The Encyclopedia of Applied Linguistics , ed. C. A. Chapelle (Hoboken: Blackwell Publishing Ltd). doi: 10.1002/9781405198431.wbeal0290

Hayeri, N. (2014). Does gender affect translation?: Analysis of English talks translated to Arabic. Ph.D. thesis. Austin: The University of Texas.

Hickey, R. (n.d.). English Linguistics. In English Linguistics in Essen. Duisburg: University of Duisburg and Essen. https://www.uni-due.de/ELE/

Hieber, D. W. (2020). “The languages and linguistics of indigenous North America: Word Classes,” in The languages and linguistics of indigenous North America: A comprehensive guide (The World of Linguistics 13) , eds C. Jany, K. Rice, and M. Mithun (Berlin: Mouton de Gruyter).

Hogenraad, R. (2018). Smoke and mirrors: Tracing ambiguity in texts. Digit. Scholarsh. Humanit. 33, 297–315. doi: 10.1093/llc/fqx044

Holtzman, N. S., Tackman, A. M., Carey, A. L., Brucks, M. S., Küfner, A. C., Deters, F. G., et al. (2019). Linguistic markers of grandiose narcissism: a LIWC analysis of 15 samples. J. Lang. Soc. Psychol. 38, 773–786.

Huang, C.-L., Chung, C. K., Hui, N., Lin, Y.-C., Seih, Y.-T., Lam, B. C., et al. (2012). The development of the Chinese linguistic inquiry and word count dictionary. Chin. J. Psychol. 54, 185–201.

Iliev, R., Dehghani, M., and Sagi, E. (2015). Automated text analysis in psychology: methods, applications, and future developments. Lang. Cogn. 7, 265–290.

Impana, P., and Kallimani, J. S. (2017). “Cross-lingual sentiment analysis for Indian regional languages,” in 2017 International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT) , (New Jersey: IEEE), 1–6.

Internet Users by Language (2021). Internet World Stats. Available online at: https://www.internetworldstats.com/stats7.htm (accessed September 24, 2021).

Ireland, M. E., and Pennebaker, J. W. (2010). Language style matching in writing: synchrony in essays, correspondence, and poetry. J. Pers. Soc. Psychol. 99:549. doi: 10.1037/a0020386

Jackson, J. C., Watts, J., Henry, T. R., List, J.-M., Forkel, R., Mucha, P. J., et al. (2019). Emotion semantics show both cultural variation and universal structure. Science 366, 1517–1522. doi: 10.1126/science.aaw8160

Johannßen, D., and Biemann, C. (2018). “Between the Lines: Machine Learning for Prediction of Psychological Traits - A Survey,” in Machine Learning and Knowledge Extraction , eds A. Holzinger, P. Kieseberg, A. M. Tjoa, and E. Weippl (Berlin: Springer International Publishing), 192–211. doi: 10.1007/978-3-319-99740-7_13

Johnson, A. (2009). The Rise of English: the Language of Globalization in China and the European Union. Macalester Int. 22:39.

Kacewicz, E., Pennebaker, J. W., Davis, M., Jeon, M., and Graesser, A. C. (2014). Pronoun use reflects standings in social hierarchies. J. Lang. Soc. Psychol. 33, 125–143. doi: 10.1177/0261927x13502654

Kailer, A., and Chung, C. K. (2007). The Russian LIWC2007 dictionary. Austin: LIWC.Net.

Kennedy, B., Ashokkumar, A., Boyd, R. L., and Dehghani, M. (2021). Text analysis for psychology: methods, principles, and practices. PsyArXiv [Preprint]. doi: 10.31234/osf.io/h2b8t

Kim, U., Park, Y.-S., and Park, D. (2000). The challenge of cross-cultural psychology: the role of the indigenous psychologies. J. Cross Cult. Psychol. 31, 63–75.

Kirov, C., Cotterell, R., Sylak-Glassman, J., Walther, G., Vylomova, E., Xia, P., et al. (2020). UniMorph 2.0: universal Morphology. ArXiv [Preprint]. Available online at: http://arxiv.org/abs/1810.11101 (accessed September 24, 2021).

Koehn, P., and Knowles, R. (2017). “Six Challenges for Neural Machine Translation,” in Proceedings of the First Workshop on Neural Machine Translation , (Pennsylvania: Association for Computational Linguistics), 28–39. doi: 10.18653/v1/W17-3204

König, E., and van der Auwera, J. (eds) (2002). The Germanic Languages. Oxfordshire: Routledge.

Kornfilt, J. (2020). Parts of Speech, Lexical Categories, and Word Classes in Morphology. In Oxford Research Encyclopedia of Linguistics. Oxford: Oxford University Press. doi: 10.1093/acrefore/9780199384655.013.606

Kučera, D. (2020). Osobnostní markery v textu: Aplikace kvantitativní psychologicko-lingvistické analýzy písemného projevu při popisu osobnosti [Personality markers in text: Application of quantitative psychological-linguistic analysis of written text in personality description]. Czechia: Jihočeská univerzita v Českých Budějovicích.

Kučera, D., Haviger, J., and Havigerová, J. M. (2020). Personality and Text: quantitative Psycholinguistic Analysis of a Stylistically Differentiated Czech Text. Psychol. Stud. 65, 336–348. doi: 10.1007/s12646-020-00553-z

Kučera, D., Haviger, J., and Havigerová, J. M. (2021). Personality and Word Use: Study on Czech Language and the BigFive. Available online at: https://osf.io/vdb34 (accessed September 24, 2021).

Laajaj, R., Macours, K., Hernandez, D. A. P., Arias, O., Gosling, S. D., Potter, J., et al. (2019). Challenges to capture the big five personality traits in non-WEIRD populations. Sci. Adv. 5:eaaw5226. doi: 10.1126/sciadv.aaw5226

List of Countries Where English Is an Official Language – GLOBED (2019). Education Policies for Global Development. Available online at: http://www.globed.eu/wp-content/uploads/2019/11/English_official_language.pdf (accessed September 24, 2021).

Lyddy, F., Farina, F., Hanney, J., Farrell, L., Kelly, O., and Neill, N. (2014). An Analysis of Language in University Students’ Text Messages. J. Comput. Mediat. Commun. 19, 546–561. doi: 10.1111/jcc4.12045

Magnini, B., Lavelli, A., and Magnolini, S. (2020). “Comparing Machine Learning and Deep Learning Approaches on NLP Tasks for the Italian Language,” in Proceedings of The 12th Language Resources and Evaluation Conference , (Marseille: European Language Resources Association), 2110–2119.

Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., and McClosky, D. (2014). “The Stanford CoreNLP natural language processing toolkit,” in Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations , (Pennsylvania: Association for Computational Linguistics), 55–60.

Martindale, C. (1973). An experimental simulation of literary change. J. Pers. Soc. Psychol. 25:319. doi: 10.1007/s10936-020-09741-4

Massó, G., Lambert, P., Penagos, C. R., and Saurí, R. (2013). “Generating new LIWC dictionaries by triangulation,” in Asia Information Retrieval Symposium , (Berlin: Springer), 263–271.

McAuliffe, W. H. B., Moshontz, H., McCauley, T. G., and McCullough, M. E. (2020). Searching for Prosociality in Qualitative Data: comparing Manual, Closed–Vocabulary, and Open–Vocabulary Methods. Eur. J. Pers. 34, 903–916. doi: 10.1002/per.2240

McCarthy, A. D., Kirov, C., Grella, M., Nidhi, A., Xia, P., Gorman, K., et al. (2020). “UniMorph 3.0: Universal Morphology,” in Proceedings of the 12th Language Resources and Evaluation Conference , (France: European Language Resources Association), 3922–3931.

Medvedeva, M., Haagsma, H., and Nissim, M. (2017). “An analysis of cross-genre and in-genre performance for author profiling in social media,” in International Conference of the Cross-Language Evaluation Forum for European Languages , (Cham: Springer), 211–223. doi: 10.1007/978-3-319-65813-1_21

Mehl, M. R. (2006). “Quantitative Text Analysis,” in Handbook of Multimethod Measurement in Psychology , eds M. Eid and E. Diener (Washington: American Psychological Association), 141–156.

Mehl, M. R., and Pennebaker, J. W. (2003). The sounds of social life: a psychometric analysis of students’ daily social environments and natural conversations. J. Pers. Soc. Psychol. 84:857. doi: 10.1037/0022-3514.84.4.857

Mehl, M. R., Robbins, M. L., and Holleran, S. E. (2012). How taking a word for a word can be problematic: context-dependent linguistic markers of extraversion and neuroticism. J. Methods Meas. Soc. Sci. 3, 30–50.

Meier, T., Boyd, R. L., Pennebaker, J. W., Mehl, M. R., Martin, M., Wolf, M., et al. (2019). “LIWC auf Deutsch”: the Development, Psychometrics, and Introduction of DE-LIWC2015. PsyArXiv [Preprint]. doi: 10.17605/OSF.IO/TFQZC

Meneghini, R., and Packer, A. L. (2007). Is there science beyond English?: initiatives to increase the quality and visibility of non−English publications might help to break down language barriers in scientific communication. EMBO Rep. 8, 112–116. doi: 10.1038/sj.embor.7400906

Mereu, L. (1999). Boundaries of Morphology and Syntax. Amsterdam: John Benjamins Publishing.

Mergenthaler, E., and Bucci, W. (1999). Linking verbal and non-verbal representations: computer analysis of referential activity. Br. J. Med. Psychol. 72, 339–354. doi: 10.1348/000711299160040

Milizia, P. (2020). “Morphology in Indo-European languages,” in Oxford Research Encyclopedia of Linguistics . Available online at: https://oxfordre.com/linguistics/view/10.1093/acrefore/9780199384655.001.0001/acrefore-9780199384655-e-634 (accessed June 30, 2020).

Modaresi, P., Liebeck, M., and Conrad, S. (2016). Exploring the Effects of Cross-Genre Machine Learning for Author Profiling in PAN 2016. Verona: CLEF. 970–977.

Mønsted, B., Mollgaard, A., and Mathiesen, J. (2018). Phone-based metric as a predictor for basic personality traits. J. Res. Pers. 74, 16–22. doi: 10.1016/j.jrp.2017.12.004

Newman, M. L., Groom, C. J., Handelman, L. D., and Pennebaker, J. W. (2008). Gender differences in language use: an analysis of 14,000 text samples. Discourse Process. 45, 211–236. doi: 10.1080/01638530802073712

Nivre, J., de Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajič, J., Manning, C. D., et al. (2016). “Universal Dependencies v1: A Multilingual Treebank Collection,” in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16) , (France: European Language Resources Association (ELRA)), 1659–1666.

Oberlander, J., and Gill, A. J. (2006). Language with character: a stratified corpus comparison of individual differences in e-mail communication. Discourse Process. 42, 239–270.

Osborne, T., and Gerdes, K. (2019). The status of function words in dependency grammar: a critique of Universal Dependencies (UD). Glossa 4:17.

Ott, M., Auli, M., Grangier, D., and Ranzato, M. (2018). Analyzing uncertainty in neural machine translation. Int. Conf. Mach. Learn. 80, 3956–3965.

Pam, P. (2020). A stylistic investigation of selected internet discourses as tools for national development. Res. J. Mod. Lang. Lit. 1, 18–39.

Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., et al. (2015). Automatic personality assessment through social media language. J. Pers. Soc. Psychol. 108:934. doi: 10.1037/pspp0000020

Pennebaker, J., Chung, C., Frazee, J., Lavergne, G., and Beaver, D. (2014). When Small Words Foretell Academic Success: the Case of College Admissions Essays. PLoS One 9:e115844. doi: 10.1371/journal.pone.0115844

Pennebaker, J. W., Boyd, R. L., Jordan, K., and Blackburn, K. (2015). The development and psychometric properties of LIWC2015. Austin: University of Texas at Austin.

Pennebaker, J. W., Chung, C. K., Ireland, M., Gonzales, A., and Booth, R. J. (2007). The Development and Psychometric Properties of LIWC2007. Austin: The University of Texas at Austin.

Pennebaker, J. W., and King, L. A. (1999). Linguistic styles: language use as an individual difference. J. Pers. Soc. Psychol. 77:1296. doi: 10.1037//0022-3514.77.6.1296

Pennebaker, J. W., and Lay, T. C. (2002). Language use and personality during crises: analyses of Mayor Rudolph Giuliani’s press conferences. J. Res. Pers. 36, 271–282.

Pennebaker, J. W., Mehl, M. R., and Niederhoffer, K. G. (2003). Psychological Aspects of Natural Language Use: our Words, Our Selves. Annu. Rev. Psychol. 54, 547–577. doi: 10.1146/annurev.psych.54.101601.145041

Piolat, A., Booth, R. J., Chung, C. K., Davids, M., and Pennebaker, J. W. (2011). La version française du dictionnaire pour le LIWC: modalités de construction et exemples d’utilisation. Psychol. Française 56, 145–159. doi: 10.1016/j.psfr.2011.07.002

Pradhan, T., Bhansali, R., Chandnani, D., and Pangaonkar, A. (2020). “Analysis of Personality Traits using Natural Language Processing and Deep Learning,” in 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA) , (Piscataway: IEEE), 457–461. doi: 10.1109/ICIRCA48905.2020.9183090

Prates, M. O., Avelar, P. H., and Lamb, L. (2018). Assessing gender bias in machine translation–a case study with Google translate. ArXiv [Preprint]. Available online at: https://arxiv.org/abs/1809.02208 (accessed September 24, 2021).

Putri, G. D., and Havid, A. (2015). Types of errors found in Google Translation: a model of MT evaluation. Proc. ISELT FBS Univ. Negeri Padang 3, 183–188.

Qiu, L., Lin, H., Ramsay, J., and Yang, F. (2012). You are what you tweet: personality expression and perception on Twitter. J. Res. Pers. 46, 710–718. doi: 10.1016/j.jrp.2012.08.008

Ramírez-Esparza, N., Chung, C. K., Kacewicz, E., and Pennebaker, J. W. (2008). “The Psychology of Word Use in Depression Forums in English and in Spanish: Testing Two Text Analytic Approaches,” in Proceedings of the 2008 International Conference on Weblogs and Social Media , (California: association for the Advancement of Artificial Intelligence (AAAI)), 102–108.

Ramírez-Esparza, N., Pennebaker, J. W., García, F. A., and Suriá, R. (2007). La psicología del uso de las palabras: un programa de computadora que analiza textos en español. Rev. Mex. Psicol. 24, 85–99.

Rayson, P. (2009). Wmatrix: A web-based corpus processing environment. Lancaster: Lancaster University.

Riemer, N. (ed.) (2016). The Routledge Handbook of Semantics. Oxfordshire: Routledge.

Rijkhoff, J. (2011). When can a language have adjectives? An implicational universal. Berlin: De Gruyter Mouton.

Rusínová, Z. (2020). “Sufix (přípona),” in Nový encyklopedický slovník češtiny online. eds P. Karlík, M. Nekula, and J. Pleskalová (Brno: Masarykova univerzita).

Sánchez-Rada, J. F., and Iglesias, C. A. (2019). Social context in sentiment analysis: formal definition, overview of current trends and framework for comparison. Inf. Fusion 52, 344–356. doi: 10.1016/j.inffus.2019.05.003

Sardinha, T. B., and Pinto, M. V. (2019). Multi-Dimensional Analysis: Research Methods and Current Issues. London: Bloomsbury Publishing.

Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., et al. (2013b). Personality, gender, and age in the language of social media: the open-vocabulary approach. PLoS One 8:e73791. doi: 10.1371/journal.pone.0073791

Schwartz, H. A., Eichstaedt, J., Blanco, E., Dziurzynski, L., Kern, M. L., Ramones, S., et al. (2013a). “Choosing the Right Words: Characterizing and Reducing Error of the Word Count Approach,” in Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1. Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity , (Pennsylvania: Association for Computational Linguistics), 296–305.

Seidlhofer, B. (2011). Understanding English as a lingua franca. Oxford: Oxford University Press.

Seki, K. (2021). Cross-lingual text similarity exploiting neural machine translation models. J. Inf. Sci. 47, 404–418. doi: 10.1177/0165551520912676

Sharir, O., Peleg, B., and Shoham, Y. (2020). The cost of training nlp models: a concise overview. ArXiv [Preprint]. Available online at: https://arxiv.org/abs/2004.08900 (accessed September 24, 2021).

Shibata, D., Wakamiya, S., Kinoshita, A., and Aramaki, E. (2016). “Detecting Japanese patients with Alzheimer’s disease based on word category frequencies,” in Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP) , (Japan: The COLING 2016 Organizing Committee), 78–85.

Smith, J., Saint-Amand, H., Plamadã, M., Koehn, P., Callison-Burch, C., and Lopez, A. (2013). “Dirt cheap web-scale parallel text from the common crawl,” in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , (Stroudsburg: Association for Computational Linguistics), 1374–1383.

Søgaard, A., Ruder, S., and Vuliæ, I. (2018). On the limitations of unsupervised bilingual dictionary induction. ArXiv [Preprint]. Available online at: https://arxiv.org/abs/1805.03620 (accessed September 24, 2021).

Sonneveld, H. B., and Loening, K. L. (1993). Terminology: Applications in interdisciplinary communication. Amsterdam: John Benjamins Publishing.

Stachl, C., Pargent, F., Hilbert, S., Harari, G. M., Schoedel, R., Vaid, S., et al. (2020). Personality research and assessment in the era of machine learning. Eur. J. Pers. 34, 613–631. doi: 10.1002/per.2257

Stone, P. J., Bales, R. F., Namenwirth, J. Z., and Ogilvie, D. M. (1962). The general inquirer: a computer system for content analysis and retrieval based on the sentence as a unit of information. Behav. Sci. 7:484. doi: 10.1002/bs.3830070412

Straka, M., and Straková, J. (2017). “Tokenizing, pos tagging, lemmatizing and parsing ud 2.0 with udpipe,” in Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies , (Stroudsburg: Association for Computational Linguistics), 88–99.

Stuart-Smith, J., and Timmins, C. (2010). “The role of the individual in language variation and change,” in Language and Identities , eds C. Lamas and D. Watt (Edinburgh: Edinburgh University Press), 39–54. doi: 10.3389/frai.2020.00046

Świątek, A. (2012). Pro-drop phenomenon across miscellaneous languages. Poland: Pedagogical University of Cracow.

Sylak-Glassman, J. (2016). The Composition and Use of the Universal Morphological Feature Schema (UniMorph Schema. Maryland: Center for Language and Speech Processing Johns Hopkins University.

Tausczik, Y. R., and Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang. Soc. Psychol. 29, 24–54.

Temizöz, Ö (2016). Postediting machine translation output: subject-matter experts versus professional translators. Perspectives 24, 646–665. doi: 10.1080/0907676X.2015.1119862

Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., and Kappas, A. (2010). Sentiment strength detection in short informal text. J. Am. Soc. Inf. Sci. Technol. 61, 2544–2558. doi: 10.1186/s12888-015-0659-7

Thomas, D. R., and Thomas, Y. L. (1994). “Same language, different culture: understanding inter-cultural communication difficulties among English speakers,” in Proceedings of the International English Language Education Conference: National and International Challenges and Responses (Kuala Lumpur: Language Centre, Universiti Kebangsaan Malaysia) 211–219.

Thompson, B., Roberts, S. G., and Lupyan, G. (2020). Cultural influences on word meanings revealed through large-scale semantic alignment. Nat. Hum. Behav. 4, 1029–1038. doi: 10.1038/s41562-020-0924-8

Thuy, N. T. T., Bach, N. X., and Phuong, T. M. (2018). “Cross-language aspect extraction for opinion mining,” in 2018 10th International Conference on Knowledge and Systems Engineering (KSE) , (Piscataway: IEEE), 67–72.

Universal Dependencies (2021). Universal Dependencies. Available online at: https://universaldependencies.org/ (accessed September 24, 2021).

Universal Dependencies: Syntax (2021). Syntax: General Principles. Available online at: https://universaldependencies.org/u/overview/syntax.html (accessed September 24, 2021).

Van Wissen, L., and Boot, P. (2017). “An electronic translation of the LIWC Dictionary into Dutch,” in Electronic Lexicography in the 21st Century: Proceedings of ELex 2017 Conference , (Leiden: Lexical Computing Ltd), 703–715.

Vanhove, M. (2008). From Polysemy to Semantic Change: Towards a typology of lexical semantic associations. Amsterdam: John Benjamins Publishing.

Vannest, J., Bertram, R., Järvikivi, J., and Niemi, J. (2002). Counterintuitive Cross-Linguistic Differences: more Morphological Computation in English Than in Finnish. J. Psycholinguist. Res. 31, 83–106. doi: 10.1023/A:1014934915952

Vivas, J., Kogan, B., Romanelli, S., Lizarralde, F., and Corda, L. (2020). A cross-linguistic comparison of Spanish and English semantic norms: looking at core features. Appl. Psycholinguist. 41, 285–297.

Wierzbicka, A. (2013). Imprisoned in English: The Hazards of English as a Default Language. Oxford: Oxford University Press.

Wilson, T., Hoffmann, P., Somasundaran, S., Kessler, J., Wiebe, J., Choi, Y., et al. (2005). “OpinionFinder: A system for subjectivity analysis,” in Proceedings of HLT/EMNLP 2005 Interactive Demonstrations , (Stroudsburg: Association for Computational Linguistics), 34–35.

Windsor, L. C., Cupit, J. G., and Windsor, A. J. (2019). Automated content analysis across six languages. PLoS One 14:e0224425. doi: 10.1371/journal.pone.0224425

Wolf, M., Horn, A. B., Mehl, M. R., Haug, S., Pennebaker, J. W., and Kordy, H. (2008). Computergestützte quantitative textanalyse: Äquivalenz und robustheit der deutschen version des linguistic inquiry and word count. Diagnostica 54, 85–98. doi: 10.1026/0012-1924.54.2.85

Wolfram, W., and Friday, W. C. (1997). The role of dialect differences in cross-cultural communication: proactive dialect awareness. Bull. Suisse de Linguistique Appl. 65, 143–154.

Yano, Y. (2006). Cross-cultural Communication and English as an international language. Intercult. Commun. Stud. 15:172.

Yarkoni, T. (2010). Personality in 100,000 Words: a large-scale analysis of personality and word use among bloggers. J. Res. Pers. 44, 363–373. doi: 10.1016/j.jrp.2010.04.001

Yarkoni, T., and Westfall, J. (2017). Choosing prediction over explanation in psychology: lessons from machine learning. Perspect. Psychol. Sci. 12, 1100–1122. doi: 10.1177/1745691617693393

Zednik, C. (2019). Solving the Black Box Problem: a Normative Framework for Explainable Artificial Intelligence. ArXiv [Preprint]. Available online at: http://arxiv.org/abs/1903.04361 (accessed September 24, 2021).

Zijlstra, H., Van Meerveld, T., Van Middendorp, H., Pennebaker, J. W., and Geenen, R. (2004). De Nederlandse versie van de ‘linguistic inquiry and word count’(LIWC). Gedrag Gezond 32, 271–281.

Keywords: natural language processing, cross-language, culture, closed-vocabulary approaches, LIWC

Citation: Kučera D and Mehl MR (2022) Beyond English: Considering Language and Culture in Psychological Text Analysis. Front. Psychol. 13:819543. doi: 10.3389/fpsyg.2022.819543

Received: 21 November 2021; Accepted: 14 February 2022; Published: 04 March 2022.

Copyright © 2022 Kučera and Mehl. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Dalibor Kučera, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

COMMENTS

  1. (PDF) Language and Culture

    Language and Culture. Abstract. Language pervades social life. It is a primary means by which we gain access to the contents of others' minds and establish shared understanding of the reality ...

  2. (PDF) EXPLORING THE IMPACT OF CULTURE ON LANGUAGE ...

    The research aimed to develop the language repertoire of non-native Arabic novice learners by using web-based semantic fields in light of the European Framework of Reference for Language ...

  3. PDF Linguistics across Cultures: The Impact of Culture on Second Language

    Culture has many different dimensions. It includes ideas, customs, skills, arts and tools that characterize a group of people in a given period of time; it is also the beliefs, values, and material objects that create our way of life. Culture establishes a context of cognitive and affective behavior for each person.

  4. The Psychology of Communication: The Interplay Between Language and

    Just as language shapes our thoughts and perceptions of the world, so too does one's culture. For the purpose of the current work, culture can be defined as the learned and shared systems of beliefs, values, preferences, and social norms that are spread by shared activities (Arshad & Chung, 2022; Bezin & Moizeau, 2017). Over the past 50 years, the Journal of Cross-Cultural Psychology (JCCP ...

  5. Learning Language, Learning Culture: Teaching Language to the Whole

    Language teachers have always known that learning an additional language requires learning about another culture. This is, in fact, one of the primary reasons for learning languages—to experience a different culture from the inside, so as to empathize with a broader range of others and to enrich one's ability to appreciate varied human experiences.

  6. Understanding the Language, the Culture, and the Experience

    Conducting cross-cultural, cross-language research is typically more expensive and more time-consuming than non-cross-cultural research, which may be one of the reasons why researchers exclude non-English speaking immigrants from their studies (Esposito, 2001). Non-English speaking immigrants who are experiencing a new culture, however, should not be excluded from research.

  7. Language and Culture

    Summary. Language is an arbitrary and conventional symbolic resource situated within a cultural system. While it marks speakers' different assumptions and worldviews, it also creates much tension in communication. Therefore, scholars have long sought to understand the role of language in human communication. Communication researchers, as well ...

  8. Introduction to Special Issue on Language and Culture

    Introduction to Special Issue on Language and Culture. Rafael Art. Javier. Language and culture have been found to be intimately and intricately interconnected (Wang 2017). It is through language that thoughts and perceptions are made known. With the explosion of new ways of communication, from instant messaging and texting ...

  9. International Journal of Language and Culture

    The aim of the International Journal of Language and Culture (IJoLC) is to disseminate cutting-edge research that explores the interrelationship between language and culture. The journal is multidisciplinary in scope and seeks to provide a forum for researchers interested in the interaction between language and culture across several disciplines, including linguistics, anthropology, applied ...

  10. Frontiers

    The paper discusses the role of language and culture in the context of quantitative text analysis in psychological research. It reviews current automatic text analysis methods and approaches from the perspective of the unique challenges that can arise when going beyond the default English language.

  11. Language and Culture Research in the Context of International Education

    This Research paper/Rapport de recherche is brought to you for free and open access by Scholarship@Western. It has been accepted for inclusion in ... Language and Culture Research in the Context of International Education and Second Language Acquisition. Sheri Zhang, University of Ottawa. Robert Anthony, University of Victoria ...

  12. Language and Culture

    This paper surveys the research methods and approaches used in the multidisciplinary field of applied language studies or language education over the last forty years. Drawing on insights gained in psycho- and sociolinguistics, educational linguistics and linguistic anthropology with regard to language and culture, it is organized around five major questions that concern language educators.

  13. Language and Intercultural Communication

    1.1 Focus. Language and Intercultural Communication (LAIC) is an interdisciplinary journal which draws on several disciplines within the social and human sciences. These include modern languages, applied linguistics, education, anthropology, (social) psychology, sociology, religion, philosophy, cultural studies, media studies, drama and visual ...

  14. PDF Culture Learning in Language Education: A Review of the Literature

    The papers included: (1) a review of the literature on culture learning, (2) a theoretical work conceptualizing culture learning, and (3) an applied paper presenting the implications of theory and research for culture teaching. This is the first of the three papers.

  15. The concept of culture: Introduction to spotlight series on

    The papers encompass other issues as well (e.g., culture as dynamic and changing, culture as constructed by people, applied implications, methodological implications), and ultimately raise many further questions about culture and development that will hopefully inspire developmentalists to think deeply about the concept of culture and to ...

  16. The Relationship between Language and Culture

    Introduction. The dialectical connection between language and culture has always been a concern of L2 teachers and educators. Whether the culture of the target language is to be incorporated into L2 teaching has been a subject of rapid change throughout language teaching history. In the course of time, the pendulum of ELT practitioners' opinion has ...

  17. PDF The Relation between Language and Culture (Case Study Albanian ...

    ... the world, if not every single language in the world ... The current paper employed two types of research methods, quantitative and qualitative, otherwise known in research methodology as the mixed-method research design. The quantitative method is employed so

  18. Language and Culture Research Papers

    This paper presents a brief background on the influence of culture on language, the benefits of studying L2 for cultural acquisition, the importance of recognizing different cultural motivations for L2 acquisition, intercultural differences that lead to misunderstandings and poor learning/teaching, the prevalence of ethnocentrism, and lastly ...

  19. Leveraging Corpus Metadata to Detect Template-based ...

    Wikipedia articles (content pages) are commonly used corpora in Natural Language Processing (NLP) research, especially in low-resource languages other than English. Yet a few studies have examined the three Arabic Wikipedia editions, Arabic Wikipedia (AR), Egyptian Arabic Wikipedia (ARZ), and Moroccan Arabic Wikipedia (ARY), and documented issues in the Egyptian Arabic Wikipedia ...