AI Shopping Agents are Psychologically American: Evidence of WEIRD Bias in Large Language Models and Implications for Global Commerce

Paul F. Accornero 

Affiliation 

Founder, The AI Praxis   

ORCID ID: https://orcid.org/0009-0009-2567-5155

SSRN Working Paper Series: https://dx.doi.org/10.2139/ssrn.5705703

Date: September 2025

Comments welcome: paul.accornero@gmail.com 




WORKING PAPER

This is a pre-print version of a more in-depth paper undergoing peer review.

Contact: paul.accornero@gmail.com


Data Availability:

The World Values Survey data referenced in Section 7 is publicly available at worldvaluessurvey.org. Empirical results from the research program will be made publicly available upon completion via GitHub and OSF repositories.

ABSTRACT

AI shopping agents powered by large language models (LLMs) are projected to mediate trillions of dollars in global commerce by 2030, yet these systems encode Western, Educated, Industrialized, Rich, and Democratic (WEIRD) cultural biases that systematically disadvantage 85% of humanity (Henrich et al., 2010; Henrich, 2020). Drawing on cultural psychology and recent empirical findings that LLMs exhibit WEIRD psychological profiles (Atari et al., 2023; Tao et al., 2024), we develop the concept of “Cultural Commerce Fairness” and demonstrate that current agentic commerce architectures will fail non-WEIRD consumers through culturally-mismatched decision heuristics, trust signals, and preference structures.

We outline a multi-phase empirical research program testing nine LLMs—five US-trained models (Claude, GPT-4, Gemini, Grok, Perplexity) and four Chinese-trained models (DeepSeek, Qwen, ERNIE, GLM)—on 262 World Values Survey variables across 65 nations. We hypothesize that US models will cluster near WEIRD populations (replicating Atari et al.’s findings) while Chinese models will exhibit reversed patterns, clustering near East Asian populations. These results will validate whether training data geography determines LLM cultural psychology and establish empirical foundations for commerce fairness frameworks.

This paper makes three contributions: (1) theoretical framework connecting cultural psychology to algorithmic commerce, (2) research design for systematic cultural bias measurement in commercial AI systems, and (3) policy recommendations for cultural fairness standards. Our forthcoming empirical papers will test whether AI shopping agents encode culturally-specific decision-making that advantages WEIRD consumers at the expense of the global majority, with profound implications for digital commerce equity and AI governance.

Keywords: WEIRD bias, cultural psychology, agentic commerce, algorithmic fairness, cross-cultural consumer behavior, AI ethics, large language models, global commerce, cultural distance, AI shopping agents

JEL Classification Codes: M31 (Marketing), D83 (Search; Learning; Information and Knowledge), O33 (Technological Change: Choices and Consequences), Z10 (Cultural Economics: General)

NOTE: This is the first in a series of papers on cultural bias in agentic commerce. Forthcoming publications include:

- Technical Report: “Testing WEIRD Bias Across US and Chinese LLMs: Pilot Results” (anticipated Q1 2026)
- Empirical Article: “Which Training Data? Cultural Bias in Large Language Models”

Data, code, and replication materials will be made publicly available upon completion of the empirical study.

1. INTRODUCTION

Consider a Chinese consumer in Shanghai asking ChatGPT to recommend a wedding gift for her extended family. The agent suggests individualized presents based on personal preferences—perhaps a book for the uncle who enjoys reading, a gadget for the tech-savvy cousin. The recommendation reflects what any educated American would consider thoughtful gift-giving: personalized choices demonstrating intimate knowledge of each recipient’s interests (Kim & Markus, 1999). Yet this advice fundamentally misunderstands Chinese cultural norms, where collective harmony and face matter more than individual preference, and where a carefully chosen uniform gift for all family members demonstrates respect for group cohesion (Zhang & Shavitt, 2003; Nisbett, 2003). The AI agent, trained predominantly on Western data, thinks like an American—and in doing so, fails the user.

This is not a hypothetical failure. It represents a systematic pattern emerging across global commerce as autonomous AI purchasing agents transform from experimental curiosities into mainstream infrastructure. Market analysts project that within eighteen months these systems will mediate hundreds of billions of dollars in transactions, and that within five years they may control a substantial portion of e-commerce—potentially trillions of dollars in annual global trade. Yet the algorithms powering these agents encode a profound cultural bias that has gone almost entirely unexamined in published research.

The evidence is stark. Atari and colleagues (2023) tested GPT-3.5 and GPT-4 against the most comprehensive cross-cultural dataset available: the World Values Survey, covering 94,278 individuals across 65 nations (Haerpfer et al., 2022). The finding was unequivocal: the models’ psychological profiles correlate at r = -0.70 and r = -0.65, respectively, with cultural distance from the United States, using validated measures of psychological difference between populations (Muthukrishna et al., 2020). For every step away from American cultural norms, the models’ ability to represent that population’s psychology measurably declines. This is not a minor calibration issue. The correlation, comparable in strength to that between years of smoking and lung cancer risk, reveals that large language models do not think like humans in general. They think like a very specific subset of humanity: Western, Educated, Industrialized, Rich, and Democratic populations—the 15% of humanity known in cultural psychology as WEIRD (Henrich et al., 2010).

Tao et al. (2024) extended this finding across five generations of GPT models (GPT-3, GPT-3.5-turbo, GPT-4, GPT-4-turbo, and GPT-4o) using data from 107 countries and territories. All models exhibited cultural values most closely aligned with English-speaking and Protestant European countries. GPT-4o, the most recent model, showed closest similarity to Finland (Euclidean distance d = 4.99), followed by Andorra and the Netherlands, while showing greatest distance from Jordan, Libya, and Ghana—patterns remarkably consistent with Atari et al.’s findings despite different methodologies and model versions.

The timing makes this discovery urgent. Survey data from 2024 indicates that 60-72% of consumers report experiencing bias in AI recommendations (Talkdesk, 2024). OpenAI’s ChatGPT, with a reported user base exceeding 800 million weekly users (Hu, 2024), has integrated shopping capabilities through partnerships with major retailers. Perplexity launched “Buy with Pro,” enabling direct purchases. Amazon announced “Buy for Me” autonomous shopping capabilities. Google introduced the Agent Payments Protocol (AP2) for commercial transactions. Market projections suggest AI agents may mediate a substantial portion of e-commerce transactions by 2030. Yet scholarly research has generated hundreds of papers on algorithmic pricing (Calvano et al., 2020; Ezrachi & Stucke, 2016; Chen et al., 2016), dozens on recommendation systems (Ekstrand et al., 2023; Burke et al., 2011), and an extensive literature on AI fairness (Mehrabi et al., 2021; Mitchell et al., 2021; Barocas et al., 2019)—but published studies examining whether AI shopping agents make culturally appropriate product recommendations across global markets remain scarce.

This paper makes that connection. We integrate three previously disconnected research streams to demonstrate that the widespread deployment of WEIRD-biased AI shopping agents creates not merely a technical problem but a structural transformation with profound implications for global market equity, cultural autonomy, and commercial fairness. The phenomenon we identify—which we term “algorithmic cultural imperialism”—represents a paradigm shift in how scholars and practitioners must conceptualize AI-mediated commerce.

1.1 Research Gap and Contribution

Three mature research streams have developed largely in isolation. Cultural psychology, pioneered by Henrich et al. (2010) and elaborated through extensive cross-cultural research (Nisbett, 2003; Markus & Kitayama, 1991, 2010), documents systematic differences in cognition, values, and decision-making between WEIRD and non-WEIRD populations. Computer science research on AI bias, particularly Atari et al.’s (2023) and Tao et al.’s (2024) empirical validation using World Values Survey data, demonstrates that large language models encode WEIRD psychological patterns. Marketing and information systems research on agentic commerce, meanwhile, has explored algorithmic pricing (Calvano et al., 2020; Ezrachi & Stucke, 2016), transparency challenges (Gal & Elkin-Koren, 2017), competitive dynamics (Aridor et al., 2021), and recommendation systems (Ekstrand et al., 2023)—but almost entirely within implicitly WEIRD market contexts.

The integration of these streams reveals a crisis hiding in plain sight. If AI models systematically misrepresent 85% of humanity’s psychology, and if these models increasingly mediate global commerce, then the resulting market dynamics will systematically disadvantage non-WEIRD populations. This isn’t about recommendation accuracy in the technical sense—it’s about whose values, preferences, and decision-making processes get encoded as “normal” in systems mediating trillions of dollars in global trade (Noble, 2018; O’Neil, 2016).

We make four primary contributions. First, we provide the first comprehensive analysis connecting empirical evidence of WEIRD bias in LLMs with the emerging agentic commerce infrastructure, documenting systematic patterns of cultural misrepresentation that existing research has overlooked. Second, we introduce the concept of “Cultural Commerce Fairness” as a distinct dimension of algorithmic fairness, extending beyond traditional protected characteristics (race, gender, age) to incorporate cultural distance and psychological misalignment as sources of systematic disadvantage (Binns, 2018; Chouldechova & Roth, 2020). Third, we develop measurement frameworks and technical mitigation strategies grounded in both cultural psychology and machine learning, providing actionable approaches for reducing bias. Fourth, we outline policy implications for the EU AI Act (European Union, 2024), UNESCO AI Ethics principles (UNESCO, 2021), and other regulatory frameworks currently being implemented without explicit consideration of cross-cultural bias.

The stakes extend beyond market efficiency to fundamental questions of equity and autonomy. When AI agents trained on Western data recommend products in non-Western markets, they don’t merely make technical errors—they impose a specific cultural worldview on consumers whose values and decision-making processes differ systematically (Bietti, 2020). The result is a form of soft cultural imperialism embedded in commercial infrastructure, accelerating global homogenization while disadvantaging local businesses unable to satisfy WEIRD-calibrated algorithmic preferences.

2. CULTURAL PSYCHOLOGY: THE WEIRD PROBLEM

2.1 Origins of WEIRD Theory

The term WEIRD—Western, Educated, Industrialized, Rich, Democratic—was coined by Henrich, Heine, and Norenzayan (2010) in a landmark review published in Behavioral and Brain Sciences to describe a fundamental problem in psychological research: the vast majority of behavioral science findings are based on populations that represent less than 15% of the world’s population but are treated as universal human nature. Their comprehensive review demonstrated that WEIRD populations are psychological outliers across domains including visual perception, fairness, cooperation, spatial reasoning, categorization, moral reasoning, reasoning styles, self-concepts, and motivation.

Henrich’s (2020) subsequent analysis in The WEIRDest People in the World traces how these populations became psychological outliers through unique historical circumstances, particularly the Catholic Church’s Marriage and Family Program that dissolved intensive kinship structures in Western Europe between 500 and 1500 CE. This institutional change produced populations with distinctive psychological characteristics: individualism over collectivism, analytic rather than holistic cognitive styles, impersonal prosociality combined with reduced family loyalty, guilt-based rather than shame-based moral reasoning, and an emphasis on abstract rules over contextual relationships (Heine, 2016).

These are not minor variations but fundamental differences in how people perceive, reason about, and navigate their social worlds. As Henrich et al. (2010, p. 61) note: “WEIRD subjects are particularly unusual compared with the rest of the species—frequent outliers.” The dimensions along which WEIRD populations diverge from the global majority have direct implications for consumer behavior, product preferences, trust formation, and decision-making—precisely the domains where AI shopping agents must operate.

2.2 Empirical Documentation of Cross-Cultural Variation

Multiple large-scale research programs have documented the magnitude of psychological variation across human populations, providing the empirical foundation for understanding how AI systems might systematically misrepresent non-WEIRD consumers:

The World Values Survey (Haerpfer et al., 2022; Inglehart & Welzel, 2005) surveyed 94,278 individuals across 65 nations between 2017 and 2022, measuring values, beliefs, and attitudes across domains including trust, morality, politics, economics, family, religion, and social tolerance. The data reveals systematic clustering of populations into distinct cultural zones—Protestant Europe, Catholic Europe, English-speaking, Confucian, Islamic, African, and Latin American—each characterized by different value orientations along dimensions of survival versus self-expression values and traditional versus secular-rational values. These differences are not converging over time despite globalization; if anything, they show path-dependent persistence (Inglehart & Baker, 2000).

The Global Preferences Survey (Falk et al., 2018) assessed six fundamental economic preferences among 80,000 individuals in 76 countries through experimentally-validated measures. The study documented substantial between-country variation in patience, risk-taking, positive and negative reciprocity, altruism, and trust—variation that correlates with historical, geographical, and institutional factors but shows no tendency toward convergence. For instance, the correlation in patience between countries varies from 0.04 to 0.80, and similar magnitudes of variation exist for other preference dimensions. These preferences directly affect consumer behavior: patience influences willingness to pay premiums for quality, risk tolerance affects new product adoption, and trust shapes e-commerce engagement (Doney et al., 1998; Hofstede, 2001).

The Moral Machine Experiment (Awad et al., 2018) collected 40 million decisions from millions of people in 233 countries regarding ethical dilemmas involving autonomous vehicles. The research identified three major cultural clusters (Western, Eastern, Southern) with systematic differences in moral preferences. For instance, collectivist cultures showed stronger preferences for saving younger people (reflecting concern for future productivity), while individualist cultures exhibited greater concern for legal pedestrians versus jaywalkers (reflecting emphasis on rule-following). The study demonstrated that no universal moral algorithm exists—what seems ethically obvious in one culture may be unacceptable in another.

The Cultural Fixation Index (Muthukrishna et al., 2020) provides a validated measure of psychological distance between populations, analogous to genetic distance measures used in population genetics. Using data from the World Values Survey, the authors calculated cultural distance scores (Cultural F_ST) for all pairs of nations, demonstrating that psychological differences between populations can be quantified systematically and predict variation in cooperation, trust, and institutional quality. This metric has become the standard tool for measuring cultural distance in cross-cultural psychology and is employed by both Atari et al. (2023) and this paper to assess how closely LLMs align with different human populations.
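The distance logic behind such measures can be illustrated with a minimal sketch. The code below computes a simplified cultural distance as the Euclidean distance between two populations' mean responses on shared survey items; it is not the full Cultural F_ST of Muthukrishna et al. (2020), which partitions within- versus between-population variance item by item, and the item names and values are hypothetical.

```python
# A minimal sketch of a cultural-distance measure between two populations:
# the Euclidean distance between their mean responses on a shared set of
# survey items, rescaled to [0, 1]. NOT the full Cultural F_ST; the items
# and values below are hypothetical.
import math

def cultural_distance(profile_a, profile_b):
    """Euclidean distance between two mean-response profiles (dict: item -> mean)."""
    shared = profile_a.keys() & profile_b.keys()
    return math.sqrt(sum((profile_a[k] - profile_b[k]) ** 2 for k in shared))

# Hypothetical mean responses on three WVS-style items.
usa     = {"trust_in_strangers": 0.62, "importance_of_obedience": 0.30, "religiosity": 0.55}
finland = {"trust_in_strangers": 0.71, "importance_of_obedience": 0.25, "religiosity": 0.40}
jordan  = {"trust_in_strangers": 0.21, "importance_of_obedience": 0.78, "religiosity": 0.95}

print(round(cultural_distance(usa, finland), 3))  # small distance: culturally close
print(round(cultural_distance(usa, jordan), 3))   # large distance: culturally distant
```

The same pairwise distances, computed over the full WVS item set, are what allow country pairs to be ranked from psychologically close to psychologically distant.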

2.3 Cognitive and Behavioral Implications

The psychological differences documented across cultures manifest in concrete cognitive and behavioral patterns directly relevant to consumer behavior:

Analytic versus Holistic Thinking: Nisbett and colleagues (Nisbett, 2003; Nisbett et al., 2001; Peng & Nisbett, 1999) demonstrated through decades of research that Western populations tend toward analytic cognition—focusing on individual objects and their attributes, using categorical reasoning, and favoring dispositional explanations—while East Asian populations employ holistic cognition—attending to relationships and context, using associative reasoning, and favoring situational explanations. When shown an underwater scene, American participants describe the fish in the foreground; Chinese participants describe the environment and relationships (Masuda & Nisbett, 2001). This fundamental difference affects how people categorize products, evaluate quality, weigh trade-offs, and make choices. A product description emphasizing individual features will resonate with analytic thinkers but may seem incomplete to holistic thinkers who need contextual information (Aaker & Schmitt, 2001).

Individualism versus Collectivism: Hofstede’s (2001) cultural dimensions framework, refined through decades of research across multiple large-scale surveys, shows that individualist societies (primarily Western) prioritize personal achievement, autonomy, and self-expression, while collectivist societies emphasize group harmony, family obligations, and hierarchical relationships. Triandis (1995, 2001) elaborated these differences, showing they affect fundamental aspects of consumer psychology: individualists prefer personalized products that express uniqueness; collectivists prefer products that signal group belonging (Kim & Markus, 1999). Individualists make purchasing decisions independently; collectivists consult family and social networks (Lee & Green, 1991). These are not superficial preferences but deeply rooted psychological orientations affecting all aspects of commercial behavior.

Independent versus Interdependent Self-Construal: Markus and Kitayama’s (1991, 2010) influential theory of self-construal provides a psychological mechanism explaining individualism-collectivism differences. People with independent self-construals (predominant in WEIRD societies) define themselves through internal attributes, personal goals, and individual achievements. People with interdependent self-construals (predominant in non-WEIRD societies) define themselves through relationships, social roles, and group memberships (Cross et al., 2011). This fundamental difference in self-concept affects consumer behavior profoundly: independent selves seek products enabling self-expression and distinction; interdependent selves seek products facilitating relationships and harmony (Aaker & Maheswaran, 1997; Escalas & Bettman, 2005). An AI shopping agent recommending products based on individual expression will systematically fail consumers with interdependent self-construals.

Rice versus Wheat Theory: Talhelm and colleagues (Talhelm et al., 2014, 2015; Talhelm & Oishi, 2015) tested 1,162 Han Chinese students across 28 provinces, demonstrating that agricultural history (rice cultivation requiring collective labor versus wheat farming enabling individual household production) predicts psychological differences that persist centuries after agricultural practices have changed. Rice regions exhibit more interdependent, holistic thinking patterns; wheat regions show more individualistic, analytic patterns. This finding illustrates how deep-rooted cultural-ecological factors shape cognition in ways that resist rapid change—and suggests that surface-level globalization may mask persistent psychological diversity relevant to consumer behavior (Kitayama et al., 2009).

Trust and Institutions: Levels of generalized trust—trust in strangers and impersonal institutions—vary dramatically across societies (Falk et al., 2018; Doney et al., 1998), shaped by historical factors including legal systems, religious institutions, kinship structures, and economic development. High-trust societies (generally WEIRD) show greater comfort with impersonal transactions, institutional protection, and algorithmic intermediation; low-trust societies rely more heavily on personal relationships, reputational enforcement, and family networks (Yamagishi & Yamagishi, 1994; Fukuyama, 1995). These differences fundamentally affect e-commerce adoption, platform preferences, and response to algorithmic intermediation. An AI agent presenting institutional trust signals (return policies, third-party certifications) will resonate in high-trust WEIRD markets but may be ineffective or even counterproductive in low-trust markets where relationship-based trust dominates (Jarvenpaa et al., 2000).

2.4 Cross-Cultural Consumer Behavior

The cultural psychology literature has direct parallels in marketing research documenting how cultural differences manifest in consumer behavior:

Zhang and Shavitt (2003) analyzed cultural values in advertising across China and the United States, finding that Chinese ads emphasize family, tradition, and collective benefits while American ads emphasize individual enjoyment, personal success, and freedom. These differences reflect underlying psychological orientations documented in cultural psychology research (Oyserman et al., 2002).

Aaker and Maheswaran (1997) demonstrated that consumers from individualist cultures process product information analytically, weighting product attributes and objective features heavily, while consumers from collectivist cultures process information holistically, weighting contextual cues and consensus information more heavily. This affects how product information should be presented and which decision criteria algorithms should emphasize.

Doran (2002) and Steenkamp (2001) reviewed extensive evidence showing that culture affects product preferences, brand choices, shopping behaviors, and response to marketing communications. Cultural dimensions explain more variance in consumer behavior than economic development, age, gender, or other demographic variables—yet most marketing algorithms implicitly assume cultural uniformity or optimize for WEIRD-majority populations.

Rialti et al. (2021) studied 350 European and Asian consumers using Alibaba, finding that culture influenced trust, usability, and perceived risk differently across groups. Europeans scored higher on individualism (M = 5.19) than Asians (M = 3.40), yet the two groups showed equivalent overall satisfaction, achieved through different pathways: Europeans through personal trust and perceived control; Asians through social proof and relationship quality. The model explained similar variance in each group (31% vs. 30%) but with completely different structural relationships—precisely the pattern one would expect if cultural psychology fundamentally shapes consumer decision-making.

2.5 Implications for Universal Algorithms

The cross-cultural psychology and consumer behavior literatures reveal a critical problem for AI systems designed as universal tools: there is no universal human psychology to model. What appears natural, rational, or optimal in one cultural context may be inappropriate, ineffective, or offensive in another (Heine, 2016; Chiu & Hong, 2006). When AI systems are trained predominantly on data from WEIRD populations—whose psychology is demonstrably atypical (Henrich et al., 2010)—they will inevitably misrepresent the preferences, values, and decision-making processes of the global majority.

This is not a problem that can be solved through minor calibration or increased data diversity alone. The psychological differences are systematic, theoretically grounded, empirically robust, and affect fundamental cognitive processes (Nisbett & Miyamoto, 2005). Any AI system mediating human decisions—particularly high-stakes decisions like product purchases—must confront the fundamental question raised by Atari et al. (2023): which humans’ psychology is being modeled, and whose interests are being served?

The empirical evidence reviewed in the next section demonstrates that current LLMs do not model “human” psychology in general but rather WEIRD psychology specifically—with profound implications for their deployment in global commerce.

3. EMPIRICAL EVIDENCE OF WEIRD BIAS IN LARGE LANGUAGE MODELS

3.1 The Atari et al. (2023) Study: “Which Humans?”

Atari and colleagues (2023) conducted the first systematic investigation of whether large language models encode culturally-specific psychology. Published as a preprint with co-author Joseph Henrich (a leading cultural evolution researcher at Harvard), their study posed the question directly: when LLMs claim to represent “human” values and preferences, which humans are they actually representing?

Their approach was methodologically rigorous. They presented GPT-3.5 (text-davinci-002) and GPT-4 (gpt-4-0314) with questions from the World Values Survey—the same 262 variables measured across 94,278 human respondents in 65 nations (Haerpfer et al., 2022)—and sampled 1,000 model responses per variable to create comparable distributions. This enabled direct statistical comparison between LLM response patterns and human population averages across all cultural contexts in the WVS dataset.

The findings were unequivocal. Across psychological measures spanning values, beliefs, morality, social attitudes, and decision-making, both models exhibited response patterns most similar to populations from Western, Educated, Industrialized, Rich, and Democratic societies. Using Muthukrishna et al.’s (2020) Cultural F_ST measure of psychological distance between populations, they found:

• GPT-3.5: r = -0.70 correlation between cultural distance from the United States and model-human similarity

• GPT-4: r = -0.65 correlation between cultural distance from the United States and model-human similarity

To understand the magnitude of this effect: for every unit increase in cultural distance from American populations, the correspondence between LLM responses and human population responses decreased proportionally. The models’ psychological profiles clustered most tightly with the Netherlands, Finland, Sweden, and Ireland—societies ranking highest on individualism, secularism, and self-expression values on the Inglehart-Welzel cultural map (Inglehart & Welzel, 2005). In contrast, similarity to populations from Jordan, Libya, Ghana, and Tunisia—societies with collectivist, traditional, and survival-oriented values—was weakest.
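The country-level analysis behind these correlations can be sketched in a few lines: pair each country's cultural distance from the United States with a measure of model-human similarity for that country, then compute a Pearson correlation. All numbers below are invented for illustration; only the reported r = -0.70 and r = -0.65 come from Atari et al. (2023).

```python
# Sketch of the headline analysis: correlate per-country cultural distance
# from the United States with model-human response similarity. The data
# points here are invented for illustration.
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-country values: distance from the US vs. model-human similarity.
distance_from_us = [0.00, 0.05, 0.10, 0.25, 0.40, 0.60]
model_similarity = [0.92, 0.90, 0.85, 0.70, 0.55, 0.40]

r = pearson_r(distance_from_us, model_similarity)
print(round(r, 2))  # strongly negative: similarity falls as cultural distance grows
```

A strongly negative r of this kind is exactly the signature the studies report: the further a population sits from US cultural norms, the worse the model represents it.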

Atari et al. tested LLMs on multiple validated psychological measures beyond the WVS:

Holistic versus Analytic Thinking: On the Triad Task (grouping three objects by relationship versus shared attributes), GPT exhibited analytic thinking patterns characteristic of WEIRD populations, clustering objects by categorical membership rather than contextual relationships. This matches American/European patterns but diverges from East Asian, African, and Indigenous populations’ preferences for relational grouping (Ji et al., 2004).

Self-Concept: On the Twenty Statements Test measuring independent versus interdependent self-construal (Markus & Kitayama, 1991), GPT showed highly independent self-concept (describing the self through individual attributes and achievements) matching WEIRD populations but differing dramatically from collectivist societies where identity is defined through relationships and group memberships (Cross et al., 2011).

Economic Games: In hypothetical trust games, public goods games, and dictator games, GPT exhibited prosocial behavior patterns matching Western experimental findings—moderate trust of strangers, willingness to contribute to public goods with institutional oversight, and modest but not extreme altruism (Henrich et al., 2005). These patterns differ from both the low-trust, relationship-dependent patterns in many non-WEIRD societies and the more collectivist cooperation patterns in other contexts.

The study concluded with a stark warning: “Ignoring cross-cultural diversity in both human and machine psychology raises numerous scientific and ethical issues” (Atari et al., 2023, p. 1). When AI systems encode WEIRD psychology as universal, they systematically disadvantage the 85% of humanity residing outside WEIRD populations.

3.2 The Tao et al. (2024) Study: Persistence Across Model Generations

Tao and colleagues (2024) extended Atari et al.’s work by testing five consecutive versions of GPT models released between May 2020 and May 2024: GPT-3 (text-davinci-002), GPT-3.5-turbo, GPT-4, GPT-4-turbo, and GPT-4o. Their study, published in PNAS Nexus (a peer-reviewed journal), evaluated responses across 107 countries and territories using the Integrated Values Surveys (combining World Values Survey and European Values Study data) and visualized results on the Inglehart-Welzel World Cultural Map.

Their findings confirmed the persistence of WEIRD bias across model generations and OpenAI’s development efforts. All five models exhibited cultural values most aligned with English-speaking and Protestant European countries:

• GPT-4o (most recent): Closest to Finland (d = 4.99), Andorra, Netherlands; furthest from Jordan, Libya, Ghana

• GPT-4: Closest to New Zealand, Australia, Iceland; furthest from Jordan, Moldova, Tunisia

• GPT-4-turbo: Closest to Netherlands, Switzerland, Iceland; furthest from Jordan, Libya, Tunisia

• GPT-3.5-turbo: Closest to Sweden, Norway, Denmark; furthest from Jordan, Libya, Ghana

The consistency across models released over four years suggests that WEIRD bias is not an artifact of specific training procedures but a fundamental consequence of training data composition. As Tao et al. note, approximately 60% of web content is in English even though English native speakers represent only 5% of the global population (W3Techs, 2024), creating a massive overrepresentation of WEIRD cultural content in training corpora.

Critically, Tao et al. (2024) tested a mitigation strategy: cultural prompting—explicitly specifying a national or cultural identity in the prompt (e.g., “You are a person from China responding to this survey”). This intervention improved cultural alignment for 71-81% of countries when using recent models (GPT-4, GPT-4-turbo, GPT-4o). This demonstrates that LLMs possess latent knowledge of diverse cultural perspectives but default to WEIRD psychology when cultural context is unspecified.
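Cultural prompting is, at its core, a prompt-construction step. The sketch below shows the idea as a simple template function; the wording is illustrative, not the authors' verbatim prompt.

```python
# Sketch of the "cultural prompting" mitigation tested by Tao et al. (2024):
# prepend an explicit cultural identity to a survey question so the model
# answers from that perspective rather than its unspecified (WEIRD-leaning)
# default. Template wording is illustrative, not the authors' exact prompt.
def cultural_prompt(question, country=None):
    if country is None:
        return question  # default condition: no cultural context supplied
    return (f"You are an average person who was born and lives in {country}. "
            f"Answer the following survey question from that perspective.\n\n"
            f"{question}")

q = "How important is family in your life? (1 = very important, 4 = not at all)"
print(cultural_prompt(q))            # unspecified condition
print(cultural_prompt(q, "Jordan"))  # culturally specified condition
```

Comparing model answers under the two conditions, country by country, is how the 71-81% improvement figure can be assessed.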

For agentic commerce applications where users rarely specify their cultural identity, this default bias becomes the operative pattern. Shopping agents won’t know to activate non-WEIRD reasoning unless explicitly prompted—and consumers won’t know to request culturally-appropriate recommendations because the bias is invisible to them.

3.3 Mechanisms of Bias

The empirical studies suggest several mechanisms through which WEIRD bias enters LLMs, each with distinct implications for mitigation strategies:

Training Data Geography: The internet is not culturally neutral. Approximately 60% of web content is in English, 5% in Russian, 5% in Spanish, and 5% in German (W3Techs, 2024), while Mandarin Chinese (15% of global native speakers) accounts for only 2% of web content. Content from the United States, United Kingdom, and other English-speaking WEIRD nations is vastly overrepresented relative to population. When LLMs are trained on this corpus, they inherit its cultural skew (Bender et al., 2021). This mechanism suggests targeted collection and oversampling of non-WEIRD training data as a mitigation strategy.
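The oversampling idea can be made concrete with a small sketch: reweight a pretraining corpus so each language's effective share tracks its share of global native speakers rather than its share of web content. The English and Mandarin figures are the approximate ones cited above (W3Techs, 2024); the weighting rule itself is a hypothetical illustration, not an established pipeline.

```python
# Sketch of corpus reweighting as a mitigation for training-data geography.
# Figures approximate those cited in the text (W3Techs, 2024); the rule is
# a hypothetical illustration.
web_share     = {"english": 0.60, "mandarin": 0.02}  # share of web content
speaker_share = {"english": 0.05, "mandarin": 0.15}  # share of native speakers

def oversampling_factor(lang):
    """Multiplier on a language's sampling rate so the effective corpus
    mirrors speaker populations instead of web availability."""
    return speaker_share[lang] / web_share[lang]

print(round(oversampling_factor("mandarin"), 2))  # > 1: heavy oversampling needed
print(round(oversampling_factor("english"), 3))   # < 1: heavy downsampling needed
```

Even this toy version makes the asymmetry visible: Mandarin content would need to be sampled several times over, while English content would need to be sampled at a small fraction of its natural rate.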

Reinforcement Learning from Human Feedback (RLHF): Modern LLMs undergo fine-tuning through RLHF, where human evaluators rate model outputs (Ouyang et al., 2022; Christiano et al., 2017). These evaluators are disproportionately from Western countries and tech industry contexts, encoding their cultural preferences into model behavior. What seems “helpful, harmless, and honest” to a San Francisco-based tech worker may differ systematically from judgments made by evaluators from other cultural contexts (Sap et al., 2022). This mechanism suggests diversifying evaluator pools geographically and culturally.

Encoded Western Institutions: LLMs demonstrate detailed knowledge of Western legal systems, political structures, educational institutions, and commercial norms but show much weaker understanding of non-Western equivalents (Johnson et al., 2022). When reasoning about trust, fairness, or appropriate behavior, models implicitly reference Western institutional contexts as defaults. This mechanism suggests explicit training on diverse institutional contexts.

Implicit Universalism: Training objectives and evaluation benchmarks treat “human” performance as a unitary target, ignoring psychological variation documented in cultural psychology (Blasi et al., 2022). When models are optimized to match “human” preferences or “human” judgments, they effectively optimize for WEIRD preferences because that’s what appears most frequently in training data and evaluation contexts. This mechanism suggests culturally-disaggregated evaluation metrics and training objectives.

3.4 Related Research on AI Cultural Bias

Beyond Atari et al. (2023) and Tao et al. (2024), emerging research documents cultural bias across AI systems:

Johnson et al. (2022) analyzed GPT-3’s representation of cultural values, finding systematic biases toward individualism and American English linguistic patterns. Sap et al. (2022) demonstrated that hate speech detection models trained primarily on English data from WEIRD contexts misclassify culturally-appropriate expressions from non-WEIRD communities. Abdurahman et al. (2023) warned of “perils in using large language models in psychological research,” noting that LLMs’ cultural biases could contaminate scientific findings if researchers use them as proxies for human populations.

Collectively, this evidence establishes that cultural bias in LLMs is not a minor technical issue but a fundamental representational problem affecting model psychology across domains. The implications for commerce—where these biased models increasingly mediate consumer decisions—are profound.

3.5 Implications for Agentic Commerce

The empirical evidence of WEIRD bias in LLMs has direct implications for AI shopping agents:

If models encode WEIRD psychological profiles—individualist, analytic, impersonally prosocial, low-context, guilt-oriented—then shopping agents built on these models will make recommendations optimized for WEIRD preferences. They will favor personalized over standardized products, emphasize individual expression over group harmony, prioritize transparent information over relationship signals, assume nuclear rather than extended families, default to guilt-based appeals rather than shame/honor considerations, and apply Western standards of quality, trust, and value.

For WEIRD consumers, these defaults may be reasonably aligned with actual preferences—though even within WEIRD populations, significant individual and subcultural variation exists (Oyserman et al., 2002). For the 85% of humanity residing outside WEIRD populations—for consumers in China, India, Nigeria, Indonesia, Brazil, and hundreds of other non-WEIRD societies—algorithmic shopping agents will systematically misunderstand their values, misread their signals, and misrepresent their interests.

The question is not whether this bias exists—the empirical evidence from Atari et al. (2023) and Tao et al. (2024) is clear. The question is what magnitude of commercial impact it will have as these systems scale to mediate trillions of dollars in global transactions. That question requires both theoretical analysis of mechanisms and empirical measurement of outcomes—the focus of subsequent sections.

4. THEORETICAL FRAMEWORK: CULTURAL COMMERCE FAIRNESS

4.1 Defining Cultural Commerce Fairness

Algorithmic fairness research has developed sophisticated frameworks for addressing bias along dimensions such as race, gender, age, and disability (Mehrabi et al., 2021; Chouldechova & Roth, 2020; Mitchell et al., 2021). These frameworks recognize that algorithms operating on populations with different characteristics may produce systematically different outcomes—and that such disparities can constitute unfairness even when technically accurate predictions are made for each group (Barocas et al., 2019; Binns, 2018).

We extend this logic to cultural psychology, proposing Cultural Commerce Fairness as a distinct fairness criterion: AI shopping agents should serve consumers from different cultural backgrounds with equivalent quality, appropriateness, and alignment with locally-valued outcomes, accounting for systematic differences in preferences, decision-making heuristics, trust signals, and value structures that vary across human populations.

This definition has several implications that distinguish cultural fairness from existing fairness frameworks:

Non-discrimination is insufficient: Simply treating all consumers identically does not achieve fairness when different populations have systematically different preferences and values (Kusner et al., 2017). A recommendation algorithm that optimizes for American individualist preferences and applies this optimization globally actively harms non-individualist consumers, even if it applies the same algorithmic process to everyone. Cultural fairness requires adaptation, not uniformity (Binns, 2018).

Context-dependence is essential: What constitutes a “good” recommendation varies by cultural context (Friedler et al., 2021). Suggesting individualized gifts may be appropriate in Boston but inappropriate in Beijing. Emphasizing product novelty may work in Silicon Valley but fail in societies valuing tradition. Algorithms must adapt to context rather than imposing universal standards.

Group representation matters: If an AI system’s underlying psychology matches one cultural group more closely than others, its recommendations will systematically favor that group even when individual-level predictions appear accurate (Chouldechova & Roth, 2020). The issue is representational alignment at the population level, not merely prediction error at the individual level.

Transparency requirements differ: Western consumers may demand detailed product information and comparative specifications; other cultures may prioritize seller reputation, third-party endorsements, or contextual trust signals (Doney et al., 1998; Jarvenpaa et al., 2000). Fairness requires respecting different information needs rather than imposing one transparency standard.

Harm extends beyond misclassification: Traditional fairness metrics focus on false positives and false negatives (Corbett-Davies & Goel, 2018). Cultural unfairness manifests as appropriate products being invisible, relevant information being excluded, and decision processes being incomprehensible—forms of harm not captured by standard classification metrics.

4.2 Five Mechanisms of Cultural Bias in Shopping Agents

We identify five primary mechanisms through which WEIRD bias in LLMs manifests as unfairness in commercial applications. Each mechanism represents a distinct pathway from cultural psychology to commercial harm, suggesting targeted interventions:

Mechanism 1: Preference Encoding Mismatch

LLMs trained on WEIRD data encode individualist, self-expressive preference structures (Atari et al., 2023). When an AI shopping agent built on such models encounters a user from a collectivist culture, it may recommend products emphasizing personal distinction and uniqueness—exactly what the user doesn’t want. The algorithm isn’t making random errors; it’s systematically applying the wrong preference model.

Example: A Japanese consumer asks for clothing recommendations for a family gathering. A WEIRD-biased agent might suggest items that “express your unique personality” and “help you stand out,” fundamentally misunderstanding that the appropriate goal in this context is fitting in harmoniously, not standing out (Kim & Markus, 1999; Markus & Kitayama, 2010). The recommendation would be technically precise (matching training data patterns) but culturally inappropriate (violating local norms).

Mechanism 2: Trust Signal Calibration Error

Different cultures employ different trust signals in commercial contexts (Doney et al., 1998; Fukuyama, 1995). WEIRD consumers trust institutional guarantees (return policies, third-party certifications, detailed specifications). Many non-WEIRD consumers prioritize relationship-based trust (seller reputation in personal networks, family recommendations, community endorsements) (Yamagishi & Yamagishi, 1994).

When AI agents are calibrated to WEIRD trust signals, they may present information that seems comprehensive to Western users but misses what matters to others. The algorithm provides extensive product specifications when the user needs social proof; it emphasizes return policies when the user cares about vendor relationships; it highlights individual reviews when the user wants family-based recommendations (Jarvenpaa et al., 2000). Trust calibration mismatch reduces perceived credibility and transaction willingness.

Mechanism 3: Quality Assessment Misalignment

What constitutes “quality” varies across cultures (Steenkamp, 2001). WEIRD consumers often prioritize functional performance, innovation, and individual user experience. Other cultures may weight durability, multi-generational use, family appropriateness, status signaling, or harmony with existing possessions more heavily (Doran, 2002).

An AI agent trained on WEIRD data will apply WEIRD quality criteria when filtering and ranking products. It will favor novel features over traditional reliability, emphasize individual user ratings over multi-generational reputation, and prioritize technical specifications over social acceptability. For non-WEIRD consumers, this systematically misranks options—products the algorithm deems “high quality” may be culturally inappropriate, while truly suitable products get filtered out.

Mechanism 4: Information Processing Style Mismatch

Analytic thinkers (predominantly WEIRD) prefer detailed, categorical information presented in logical hierarchies (Nisbett, 2003). Holistic thinkers prefer contextual information presented in relational networks (Nisbett et al., 2001). These aren’t just different preferences—they’re different cognitive processing styles affecting comprehension and decision quality (Peng & Nisbett, 1999).

When AI agents present product information in analytic formats optimized for WEIRD users—detailed specifications, categorical comparisons, feature-by-feature analysis—holistic thinkers may find it overwhelming, confusing, or missing essential contextual information (Aaker & Maheswaran, 1997). The presentation format itself creates a barrier to effective decision-making. Conversely, contextual, relationship-focused information presentation may seem vague or insufficient to analytic thinkers.

Mechanism 5: Risk Tolerance Miscalibration

Risk preferences vary systematically across cultures, shaped by historical, institutional, and economic factors (Falk et al., 2018; Hofstede, 2001). WEIRD populations tend toward higher risk tolerance in commercial contexts, supported by strong consumer protection laws, reliable return policies, and institutional recourse mechanisms. Many non-WEIRD populations exhibit lower risk tolerance, reflecting weaker institutional protections and greater personal consequences for poor choices (Weber & Hsee, 1998).

When AI agents calibrate “safe” versus “risky” recommendations using WEIRD standards, they may push non-WEIRD consumers toward options those users perceive as inappropriately risky—or conversely, restrict choice sets too severely because the algorithm overestimates risk aversion. Either way, the mismatch between algorithmic calibration and user psychology produces suboptimal outcomes.

4.3 Fairness Metrics for Cultural Commerce

Measuring Cultural Commerce Fairness requires extending existing fairness metrics to account for cultural variation. We propose five complementary metrics:

Cultural Parity: Similar error rates, satisfaction scores, and outcome quality across populations with different cultural psychological profiles. This extends demographic parity (Corbett-Davies & Goel, 2018) to cultural groups defined by validated cultural distance measures (Muthukrishna et al., 2020).

Operationalization: For populations P₁, P₂ differing in cultural distance from the training data source:
- E[error_rate | P₁] ≈ E[error_rate | P₂]
- E[satisfaction | P₁] ≈ E[satisfaction | P₂]
- E[recommendation_quality | P₁] ≈ E[recommendation_quality | P₂]
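As an illustrative sketch of how such a parity check could be operationalized, the snippet below compares group-level means across metrics and flags gaps above a tolerance. The data and the 10-percentage-point tolerance are hypothetical assumptions for illustration, not values drawn from this paper's empirical program:

```python
# Sketch of a Cultural Parity check: compare mean outcome metrics across
# two cultural groups and flag any gap above a tolerance. All sample data
# and the tolerance value are illustrative assumptions.

def parity_gap(group_a, group_b):
    """Absolute difference in group means for one metric."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return abs(mean_a - mean_b)

def passes_parity(metrics_a, metrics_b, tolerance=0.10):
    """True only if every metric's between-group gap is within tolerance."""
    return all(
        parity_gap(metrics_a[name], metrics_b[name]) <= tolerance
        for name in metrics_a
    )

# Hypothetical satisfaction and error-rate samples for two populations
weird = {"satisfaction": [0.82, 0.78, 0.85], "error_rate": [0.10, 0.12, 0.09]}
non_weird = {"satisfaction": [0.55, 0.60, 0.58], "error_rate": [0.25, 0.28, 0.22]}

print(passes_parity(weird, non_weird))  # large gaps -> fails parity
```

In practice the comparison would use formal equivalence tests rather than a raw tolerance, but the structure (per-metric gaps, evaluated jointly) is the same.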

Cultural Alignment: Correlation between algorithmic recommendations and locally-validated human expert judgments from the same cultural context. An agent serving Chinese consumers should match Chinese expert recommendations, not American expert recommendations.

Operationalization: For each population P with a local expert panel E_P:
- Pearson correlation between algorithm rankings and E_P rankings
- Target: r > 0.70 for all populations
- Penalty for negative correlation with cultural distance from training data

Preference Satisfaction: Post-purchase satisfaction and recommendation-following behavior, conditional on cultural background. If Western users systematically report higher satisfaction or repeat usage than non-Western users, this suggests unfairness even if technical performance metrics are similar.

Operationalization:
- Net Promoter Score by cultural group
- Repeat purchase rates by cultural group
- Product return rates by cultural group
- Statistical tests for systematic variation across cultural dimensions

Information Quality Perception: Subjective assessments of whether recommendations included relevant information, appropriate level of detail, and trustworthy signals—measured within cultural contexts. What seems comprehensive to WEIRD users may seem incomplete to others (Jarvenpaa et al., 2000).

Operationalization:
- Information completeness ratings (1-10 scale)
- Trust signal relevance ratings (1-10 scale)
- Decision confidence following recommendation
- Disaggregated by cultural context with equivalence testing

Cultural Distance Correlation: The relationship between a population’s cultural distance from training data sources and algorithm performance for that population. A negative correlation (as Atari et al. found for LLMs) indicates systematic bias. Fairness requires breaking this correlation.

Operationalization: For all populations P with cultural distance d_P from the training data source:
- Pearson correlation r between d_P and performance metrics
- Target: r ≈ 0 (no relationship between cultural distance and performance)
- Reject systems with r < -0.30 (strong negative correlation)
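A minimal sketch of the Cultural Distance Correlation metric follows. The distance and performance values are hypothetical; only the -0.30 rejection threshold comes from the operationalization above:

```python
# Sketch: Pearson r between each population's cultural distance from the
# training-data source (d_P) and agent performance for that population.
# Distances and performance scores below are hypothetical illustrations.
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data in which performance degrades with cultural distance
distances = [0.05, 0.20, 0.35, 0.55, 0.80]    # d_P per population
performance = [0.90, 0.85, 0.70, 0.60, 0.45]  # e.g., satisfaction score

r = pearson_r(distances, performance)
print("reject" if r < -0.30 else "accept")  # strong negative r -> reject
```

A fair system would show r near zero on such a panel; the illustrative data above reproduce the strongly negative pattern Atari et al. (2023) report for LLM psychological alignment.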

These metrics collectively capture different dimensions of cultural fairness. A system might achieve cultural parity (equal error rates) without cultural alignment (recommendations appropriate to local context). It might achieve high satisfaction in aggregate while showing systematic variation across cultural distance. Comprehensive cultural fairness assessment requires evaluating all dimensions.

5. MARKET IMPLICATIONS AND COMPETITIVE DYNAMICS

5.1 The Agentic Commerce Revolution

The transformation of e-commerce from user-directed search to AI-mediated purchasing represents a fundamental shift in market structure (Brynjolfsson & McAfee, 2014; Fountaine et al., 2019). Traditional e-commerce kept humans “in the loop”—consumers search, browse, compare, and decide. Agentic commerce delegates increasing authority to AI systems that search, filter, recommend, and potentially execute transactions with minimal human oversight (Davenport & Ronanki, 2018).

This delegation creates new intermediaries with concentrated market power. If a handful of LLM providers supply the underlying models for most shopping agents, those providers effectively control access to global consumer markets (Khan, 2017). Their algorithms determine which products get visibility, which brands get recommended, which sellers get transactions—all calibrated to the psychological profiles encoded in their training data.

For WEIRD markets, this may function reasonably well (setting aside other fairness and competition concerns documented by Ezrachi & Stucke, 2016; Aridor et al., 2021). For non-WEIRD markets, the result is a massive misallocation of attention and transactions, systematically favoring products, brands, and sellers that align with WEIRD preferences regardless of local appropriateness.

5.2 Winner-Takes-Most Dynamics

Network effects and data flywheel dynamics in AI systems create winner-takes-most market structures (Brynjolfsson & McAfee, 2014; Parker et al., 2016). Early market leaders accumulate user data that improves recommendations, attracting more users, generating more data, further improving performance—a reinforcing cycle that makes successful late entry nearly impossible (Shapiro & Varian, 1999).

When these dynamics play out in culturally-biased systems, they lock in bias. If OpenAI’s GPT-4 or Google’s Gemini gain early market share in agentic commerce, they establish user habits, merchant integrations, and data advantages that make displacement difficult—even when their cultural calibration is poor for most global markets. The “best” system for WEIRD users becomes the default system for everyone, regardless of appropriateness (Noble, 2018).

This path dependency has profound implications. Once dominant systems establish themselves, switching costs (learning new interfaces, transferring data, rebuilding merchant relationships) create barriers to entry that protect incumbents even when superior alternatives emerge (Farrell & Klemperer, 2007). Cultural unfairness may become locked in for decades.

5.3 Platform Competition or Collusion?

Recent research on algorithmic pricing shows that AI systems can engage in tacit collusion without explicit coordination (Calvano et al., 2020; Ezrachi & Stucke, 2016). When multiple algorithms optimize using similar objective functions and similar data, they converge on similar strategies—including supracompetitive pricing that harms consumers.

In agentic commerce, cultural bias could produce a different but equally concerning convergence: if major shopping agents are built on LLMs with similar WEIRD bias, they will make similar recommendations regardless of platform competition. All recommend individualized products for family purchases. All emphasize novelty over tradition. All apply Western quality standards. The appearance of choice masks underlying uniformity.

This convergence disadvantages both consumers (receiving inappropriate recommendations across platforms) and merchants (competing on dimensions misaligned with local preferences). The market appears competitive but is functionally biased toward WEIRD-aligned offerings. Competition occurs within a narrow cultural parameter space rather than exploring the full diversity of human preferences (Chen et al., 2016).

5.4 Implications for Local Businesses

Small and medium enterprises in non-WEIRD markets face particularly severe disadvantages. Their products may be optimally suited for local cultural preferences but algorithmically invisible to WEIRD-calibrated shopping agents. A family-oriented product perfect for Chinese gift-giving culture gets filtered out because it doesn’t emphasize individual expression. A traditional brand with multi-generational reputation gets downranked because it lacks the novelty signals WEIRD algorithms prioritize (Steenkamp, 2001).

The result could be accelerated displacement of local businesses unable to satisfy algorithmic intermediaries optimized for foreign preferences. This isn’t competition on quality—it’s selection bias favoring cultural alignment with algorithm training data. We risk creating global markets where commercial success depends more on matching WEIRD algorithmic preferences than serving actual local consumers effectively (Bietti, 2020).

McKinsey research suggests AI-driven commerce could redistribute $2-3 trillion in retail revenue by 2030 (Fountaine et al., 2019). If that redistribution flows systematically toward WEIRD-aligned businesses regardless of local appropriateness, the economic consequences for non-WEIRD markets could be severe: reduced entrepreneurship, diminished cultural production, accelerated homogenization, and concentrated wealth flows toward Western markets and global platforms.

5.5 Consumer Welfare Implications

Economic welfare analysis traditionally focuses on consumer surplus—the difference between willingness to pay and actual price (Varian, 1992). Agentic commerce complicates this analysis because algorithms influence both sides of the equation: they shape which products consumers consider (affecting willingness to pay) and potentially negotiate prices on consumers’ behalf (affecting actual price) (Gal & Elkin-Koren, 2017).

When shopping agents systematically misunderstand consumer preferences due to cultural bias, they reduce consumer welfare through multiple channels:

Search Cost Inefficiency: Consumers must override algorithmic recommendations, negating the primary benefit of agent intermediation. The promise of automated shopping becomes a burden when algorithms consistently suggest inappropriate products (Brynjolfsson et al., 2011).

Option Set Restriction: Algorithms filter out culturally-appropriate products, limiting consumer choice to options optimized for WEIRD preferences. Consumers may not realize better alternatives exist if agents never surface them (O’Neil, 2016).

Price Discrimination Vulnerability: Agents negotiating from culturally-misaligned preference models may accept higher prices for less-suitable products, extracting surplus from non-WEIRD consumers while delivering inferior outcomes (Aridor et al., 2021).

Trust Erosion: Persistent mismatches between recommendations and preferences erode trust in algorithmic intermediation, reducing adoption and forcing costly manual search—particularly problematic in cultures with already-lower institutional trust (Fukuyama, 1995).

Aggregate welfare losses could be substantial. If 85% of humanity receives systematically biased recommendations, even modest per-transaction inefficiencies compound across billions of purchases. The efficiency gains from algorithmic commerce may accrue primarily to WEIRD populations while non-WEIRD populations bear costs without commensurate benefits.

6. POLICY AND DESIGN PRINCIPLES FOR CULTURAL FAIRNESS

6.1 Regulatory Frameworks

Current AI governance frameworks inadequately address cross-cultural bias:

EU AI Act (European Union, 2024): The Act, which entered into force in August 2024, requires that training data for high-risk AI systems be “relevant, representative, free of errors and complete” with particular attention to diversity. However, the Act does not explicitly address cross-cultural psychological variation or require cultural validation of AI systems deployed globally. Article 10’s data governance requirements could be interpreted to include cultural diversity, but explicit guidance is needed.

UNESCO Recommendation on AI Ethics (UNESCO, 2021): Adopted by 193 member states, the Recommendation calls for “visibility and discoverability of local content” and warns against concentration of AI development in few countries, but provides limited guidance on measuring or enforcing cultural fairness in specific applications. Principle 1 (“Proportionate to the risk”) and Principle 10 (“Cultural diversity”) establish relevant foundations but lack operationalization.

Other Frameworks: The OECD AI Principles (2019), Singapore’s Model AI Governance Framework (2020), and various national approaches (Jobin et al., 2019) acknowledge diversity concerns but lack specific provisions for cross-cultural validation or cultural fairness metrics.

We propose several policy mechanisms to address this gap:

Mandatory Cultural Auditing: Require disclosure of cultural bias metrics for AI systems mediating commercial transactions above specified transaction volumes (e.g., >$10M annually). Companies should report:
- What populations’ data comprised training sets (by language, geography, cultural zone)
- Measured performance across different cultural groups using standardized benchmarks
- Cultural distance correlations showing whether algorithm performance degrades with distance from training population
- Mitigation strategies employed and their measured effectiveness

Regional Fine-Tuning Requirements: For AI systems deployed globally, require culturally-specific fine-tuning using local data before deployment in markets culturally distant from the training population. Just as companies localize interfaces and payment systems, they should localize psychological calibration. Specific requirements:
- Systems deployed in markets with cultural distance >30th percentile from training data must undergo local fine-tuning
- Fine-tuning datasets must include ≥10,000 interactions from the target population
- Post-deployment monitoring must demonstrate a Cultural Alignment metric >0.60

Cultural Fairness Standards: Establish baseline requirements for commercial AI systems, analogous to accessibility standards (W3C WCAG). Systems must demonstrate:
- Non-negative correlation between cultural distance and recommendation quality (r ≥ -0.20)
- Equivalent satisfaction rates across cultural groups (difference <10 percentage points)
- Appropriate information presentation for diverse cognitive styles (validated through user testing)

Right to Cultural Alignment: Give consumers the right to know what cultural psychology an AI agent embodies and the option to choose agents calibrated to different cultural norms. This extends existing transparency principles while respecting cultural diversity:
- Agents must disclose primary training data sources by cultural region
- Consumers can select a “cultural mode” (e.g., individualist vs. collectivist reasoning)
- Systems must explain how cultural settings affect recommendations

6.2 Technical Mitigation Strategies

Several technical approaches can reduce WEIRD bias in shopping agents:

Culturally-Diverse Training Data: Actively oversample non-WEIRD content during training. Weight data sources to achieve cultural balance rather than accepting the internet’s WEIRD skew. Implementation:
- Target training corpus: 30% WEIRD content, 70% non-WEIRD content (inverting current proportions)
- Prioritize high-quality content from underrepresented cultural zones
- Partner with local content providers, e-commerce platforms, and consumer forums
- Commission creation of culturally-appropriate training content

Cultural Prompting as Default: Tao et al. (2024) showed that specifying cultural identity in prompts improves alignment. Shopping agents should infer or ask about user cultural context and adapt their psychological models accordingly:
- Geolocation-based cultural inference (with user permission)
- Explicit cultural preference selection during onboarding
- Dynamic prompting: “Acting as a shopping advisor for someone in [country/culture]…”
- A/B testing different cultural framings and measuring satisfaction
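The dynamic prompting step above can be sketched as a simple prompt-construction function. The wording and the country parameter are illustrative assumptions; Tao et al. (2024) tested phrasings of the form “You are a person from X responding to this survey”:

```python
# Minimal sketch of "cultural prompting" applied by default before a
# shopping request is sent to an LLM. Prompt wording is a hypothetical
# illustration, not a phrasing validated in this paper.

def cultural_prompt(user_country, user_query):
    """Prepend an explicit cultural identity to the shopping request."""
    context = (
        f"Acting as a shopping advisor for someone in {user_country}, "
        "apply locally appropriate norms for gifts, family purchases, "
        "and trust signals.\n"
    )
    return context + f"User request: {user_query}"

prompt = cultural_prompt("Japan", "Recommend clothing for a family gathering")
print(prompt)
```

The point is architectural: cultural context is injected by the system rather than left to users, who (as Section 3.2 notes) rarely know to request it.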

Multi-Model Ensembles: Deploy shopping agents using ensembles of LLMs trained in different cultural contexts—US models, Chinese models, European models, etc.—and weight their recommendations based on user cultural proximity:
- Maintain 5+ models covering major cultural zones
- User cultural profile determines ensemble weights
- Meta-learning layer optimizes weights based on satisfaction feedback
- Gradual adaptation as user preferences are revealed
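A minimal sketch of the proximity-weighted ensemble follows. Model names, proximity values, and item scores are all hypothetical:

```python
# Sketch of a multi-model ensemble: each model's item scores are weighted
# by the user's cultural proximity to that model's training region.
# All names and numbers below are illustrative assumptions.

def ensemble_scores(model_scores, proximity):
    """Proximity-weighted average of per-model item scores."""
    total = sum(proximity.values())
    weights = {m: p / total for m, p in proximity.items()}
    items = next(iter(model_scores.values())).keys()
    return {
        item: sum(weights[m] * model_scores[m][item] for m in model_scores)
        for item in items
    }

model_scores = {
    "us_model": {"item_a": 0.9, "item_b": 0.4},
    "cn_model": {"item_a": 0.3, "item_b": 0.8},
}
# A user culturally closer to the Chinese model's training population
proximity = {"us_model": 0.2, "cn_model": 0.8}

scores = ensemble_scores(model_scores, proximity)
print(max(scores, key=scores.get))
```

In a deployed system the static proximity weights would be replaced by the meta-learning layer described above, updated from satisfaction feedback.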

Cultural Metadata and Context: Annotate products, brands, and categories with cultural appropriateness indicators. Enable algorithms to route recommendations through culturally-informed filters:
- Product tags: individualist/collectivist, modern/traditional, status-signaling/modest
- Scenario detection: gift-giving, family purchase, personal use
- Context-aware recommendation logic that activates appropriate cultural reasoning
- Merchant-provided cultural targeting parameters

Local Human-in-the-Loop: For high-stakes or ambiguous cases, route recommendations through local human reviewers familiar with cultural context. This sacrifices some automation efficiency for cultural appropriateness:
- Threshold-based escalation (confidence <0.70)
- Cultural expert panels for product categorization
- Feedback loops from local reviewers to model training
- Hybrid automation: AI generates candidates, humans select the culturally-appropriate subset

Ongoing Validation: Continuously measure recommendation quality across cultural groups using local validation sets. Treat cultural fairness as an operational requirement requiring ongoing monitoring:
- Monthly satisfaction surveys disaggregated by cultural group
- Cultural Distance Correlation monitoring with automatic alerts
- Adversarial testing using culturally-diverse prompt sets
- Public reporting of fairness metrics (transparency pressure)

6.3 Design Principles for Culturally-Aware Systems

Beyond specific technical interventions, culturally-aware agentic commerce requires design principles that permeate system architecture:

Principle 1: Cultural Explicitness: Make cultural assumptions explicit rather than universal. Systems should acknowledge they embody specific cultural perspectives and communicate limitations.

Principle 2: Adaptive Pluralism: Support multiple cultural logics simultaneously rather than imposing single rationality. Different users should experience different recommendation logics based on cultural fit.

Principle 3: Local Validation: Evaluate system performance using local ground truth from each cultural context rather than universal accuracy metrics.

Principle 4: Participatory Design: Include stakeholders from diverse cultural backgrounds in system design, evaluation, and governance—not as afterthought but as core requirement.

Principle 5: Reversibility: Allow users to understand and modify cultural settings, with clear explanations of how settings affect recommendations.

6.4 Governance and Accountability

Beyond technical solutions, achieving Cultural Commerce Fairness requires governance structures ensuring accountability:

Independent Cultural Auditing: Third-party organizations assessing cultural bias should include experts from diverse cultural backgrounds, not just Western AI ethics researchers. Audit standards should be developed through participatory processes involving global stakeholders (Raji et al., 2020).

Diverse Development Teams: Companies building shopping agents should ensure development teams include members from the cultural contexts they serve. This isn’t merely representation—it’s essential expertise for recognizing and correcting bias that may be invisible to culturally-homogeneous teams (Kalluri, 2020).

User Feedback Mechanisms: Enable consumers to report culturally inappropriate recommendations. Aggregate this feedback across cultural groups to identify systematic patterns. Treat cultural mismatch as a bug requiring fixes, not user error requiring education (Denton et al., 2020).

Market Access Conditions: Consider making cultural fairness validation a condition for market access in jurisdictions with sufficient regulatory capacity. Just as medical devices require safety certification and financial systems require regulatory approval, AI shopping agents mediating large transaction volumes could require demonstrated cultural appropriateness.

Industry Standards and Certification: Develop industry-led standards for cultural fairness in agentic commerce, with certification programs for compliant systems. This creates competitive incentives for cultural appropriateness beyond regulatory requirements (Selbst et al., 2019).

7. EMPIRICAL RESEARCH PROGRAM

7.1 Introduction: From Theory to Validation

The theoretical framework developed in this paper—that AI shopping agents encode WEIRD cultural biases inherited from their underlying large language models—requires systematic empirical validation. While Atari et al. (2023) demonstrated that GPT-3 and GPT-4 exhibit responses most similar to populations from Western, Educated, Industrialized, Rich, and Democratic societies, and Tao et al. (2024) confirmed this pattern across five GPT generations, critical questions remain unanswered for the agentic commerce context.

First, do contemporary LLMs beyond OpenAI’s models exhibit similar biases? Second, do LLMs trained on non-Western data sources (particularly Chinese models trained on Chinese internet content) show reversed patterns, clustering near East Asian rather than WEIRD populations? Third, what are the specific implications for AI shopping agents making purchase decisions on behalf of global consumers? Fourth, can the cultural bias be mitigated through targeted interventions?

To address these gaps, we are conducting a multi-phase empirical program that replicates and extends Atari et al.’s methodology across a broader set of models representing different training ecosystems. This section outlines our research design, hypotheses, and publication strategy for this empirical validation program.

7.2 Research Design

7.2.1 Methodology

Following Atari et al. (2023), we employ the World Values Survey (WVS) Wave 7 as our primary instrument for measuring cultural psychological variation. The WVS is one of the most comprehensive cross-cultural datasets in social science, with 94,278 respondents from 65 nations surveyed between 2017 and 2022 (Haerpfer et al., 2022). The survey covers values, beliefs, and attitudes across domains including trust, morality, politics, economics, family, religion, and social tolerance.

After applying Atari et al.’s data cleaning protocol, we retain 262 variables for analysis. For each variable, we present the original WVS question to each LLM and sample 1,000 responses, matching the approximate sample size of human populations in the original survey. This approach allows direct comparison between LLM response distributions and human population averages across all 65 nations.
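The per-variable sampling and comparison procedure described above can be sketched in a few lines. This is an illustrative stand-in, not the study's actual pipeline: `query_model` is a hypothetical placeholder for a real API call (here it just draws from a fixed distribution), and the variable labels and national means are invented for the example.

```python
import random
from statistics import mean

def query_model(question, rng):
    """Hypothetical stand-in for an LLM API call.

    A real implementation would send `question` to the model and parse a
    numeric answer on the item's WVS response scale; here we just draw
    from a fixed illustrative distribution."""
    return rng.choice([6, 7, 7, 8, 8, 8, 9, 10])

def sample_llm_means(questions, n_samples=1000, seed=0):
    """Sample n_samples responses per question and return the mean for
    each, mirroring how human population averages are computed."""
    rng = random.Random(seed)
    return {q: mean(query_model(q, rng) for _ in range(n_samples))
            for q in questions}

def nearest_population(llm_means, population_means):
    """Return the nation whose per-variable means lie closest to the
    LLM's means (Euclidean distance over shared variables)."""
    def dist(nation):
        return sum((llm_means[q] - population_means[nation][q]) ** 2
                   for q in llm_means) ** 0.5
    return min(population_means, key=dist)

# Toy example: two WVS-style items, two invented national profiles.
questions = ["V1_trust", "V2_life_satisfaction"]
llm = sample_llm_means(questions, n_samples=200)
nations = {
    "United States": {"V1_trust": 7.9, "V2_life_satisfaction": 7.7},
    "China": {"V1_trust": 6.3, "V2_life_satisfaction": 6.9},
}
print(nearest_population(llm, nations))
```

In the full study this comparison runs over all 262 retained variables and all 65 national profiles, with the resulting distance matrix feeding the clustering and scaling analyses.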

We supplement the WVS analysis with validated psychological measures:
- Triad Task (Ji et al., 2004): assesses analytic versus holistic cognition
- Twenty Statements Test (Markus & Kitayama, 1991): measures independent versus interdependent self-construal
- Trust Games (Berg et al., 1995): evaluates prosocial behavior and trust
- Dictator Games (Forsythe et al., 1994): measures altruism and fairness preferences

7.2.2 Models Under Investigation

We test nine LLMs representing two distinct training ecosystems:

US Models (5):
- Claude (Anthropic): Claude Sonnet 4, trained on a diverse English-language corpus with Constitutional AI (Bai et al., 2022)
- GPT-4 (OpenAI): GPT-4 Turbo, the most recent iteration of OpenAI's flagship model (OpenAI, 2023)
- Gemini (Google): Gemini Pro, Google's latest multimodal LLM (Anil et al., 2023)
- Grok (xAI): Grok-2, trained with real-time data access
- Perplexity: Perplexity's conversational model with search integration

Chinese Models (4):
- DeepSeek: DeepSeek-Chat, trained primarily on Chinese internet content
- Qwen (Alibaba): Qwen-Turbo, Alibaba Cloud's flagship model
- ERNIE (Baidu): ERNIE Bot 4.0, trained on a Chinese-language corpus
- GLM (Zhipu AI): GLM-4, developed by a spin-off of Tsinghua University

This selection enables direct comparison between models trained predominantly on English-language (largely WEIRD-sourced) content versus Mandarin Chinese content, while controlling for model architecture and capabilities.

7.2.3 Statistical Analysis

We employ four analytical approaches replicating Atari et al.’s methodology:

1. Hierarchical Cluster Analysis: Identifies which human populations each LLM most closely resembles based on response patterns across all variables (Ward's method with Euclidean distance)

2. Multidimensional Scaling (MDS): Visualizes the cultural positioning of LLMs relative to human populations in two-dimensional space using Kruskal's non-metric MDS

3. Principal Component Analysis (PCA): Decomposes variance to identify whether LLMs cluster with specific population groups on major dimensions of cultural variation

4. WEIRD Distance Correlation: Tests the relationship between each LLM's similarity to human populations and those populations' cultural distance from the United States, using Muthukrishna et al.'s (2020) validated Cultural F_ST measures
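Once per-variable means are assembled, these analyses reduce to standard matrix operations. The sketch below uses NumPy and fully synthetic data; a real pipeline would add SciPy's Ward linkage for the clustering step and a non-metric MDS implementation. The negative correlation in the example is built into the toy data by construction, purely to illustrate the WEIRD-distance test, and does not anticipate any empirical result.

```python
import numpy as np

rng = np.random.default_rng(42)
n_nations, n_vars = 65, 262

# Toy cultural distances from the US (a stand-in for Cultural F_ST values).
cultural_dist_from_us = np.linspace(0.0, 1.0, n_nations)

# Synthetic national profiles that drift smoothly with cultural distance.
base = rng.normal(size=n_vars)
drift = rng.normal(size=n_vars)
nation_profiles = (base
                   + cultural_dist_from_us[:, None] * drift
                   + 0.3 * rng.normal(size=(n_nations, n_vars)))

# A toy "US-like" LLM profile sitting near the culturally closest nations.
llm_profile = base + 0.1 * rng.normal(size=n_vars)

# (1) Distances to every nation: the input to hierarchical clustering.
dists = np.linalg.norm(nation_profiles - llm_profile, axis=1)

# (3) PCA via SVD: project nations and the LLM onto the top 2 components.
col_means = nation_profiles.mean(axis=0)
_, _, vt = np.linalg.svd(nation_profiles - col_means, full_matrices=False)
nation_coords = (nation_profiles - col_means) @ vt[:2].T
llm_coords = (llm_profile - col_means) @ vt[:2].T

# (4) WEIRD distance correlation: similarity to each nation versus that
# nation's cultural distance from the US (negative r indicates WEIRD bias).
similarity = -dists
r = np.corrcoef(similarity, cultural_dist_from_us)[0, 1]
print(f"nearest nation index: {dists.argmin()}, r = {r:+.2f}")
```

The key output is the correlation r: in the empirical study this is computed against Muthukrishna et al.'s (2020) Cultural F_ST measures rather than the synthetic distances used here.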

7.3 Hypotheses

H1: Replication Hypothesis (US Models)

US-trained models will exhibit WEIRD bias comparable to that documented by Atari et al. (2023) for GPT-3 and GPT-4. Specifically, we expect:
- A strong negative correlation (r < -0.60) between cultural distance from the United States and LLM-human response similarity
- Hierarchical clustering placing US models nearest to populations from the United States, United Kingdom, Australia, Germany, the Netherlands, and other WEIRD nations
- PCA revealing US models as outliers relative to global human variation, but closest to WEIRD populations on the principal components
- Consistency across US models regardless of specific architecture or company

Rationale: If WEIRD bias stems from training data geography (English-language internet dominance) rather than specific technical choices, all US-trained models should exhibit similar patterns. Confirmation would strengthen the training data hypothesis and justify targeted mitigation strategies.

H2: Reverse Bias Hypothesis (Chinese Models)

Chinese-trained models will exhibit reversed patterns, clustering near East Asian rather than WEIRD populations. Specifically:
- A strong positive correlation (r > +0.50) between East Asian cultural proximity and LLM-human response similarity
- Hierarchical clustering placing Chinese models nearest to populations from China, Japan, South Korea, Taiwan, and other East Asian nations
- Systematically different response patterns from US models on value dimensions including individualism-collectivism, power distance, and uncertainty avoidance
- Lower correlation with WEIRD populations than US models show

Rationale: If training data geography determines cultural psychology, models trained on Chinese internet content should reflect Chinese cultural values and cognition. Confirmation would demonstrate that cultural bias is not inevitable but depends on training data composition—suggesting feasible mitigation through diversified training.

H3: Training Data Geography Hypothesis

The observed patterns will demonstrate that training data geography—not model architecture or technical capabilities—determines cultural psychology. Specifically:
- Variance between the US and Chinese model groups will exceed variance within each group
- Models from the same training ecosystem will cluster together regardless of specific architecture or company
- The correlation between training data geography and model cultural positioning will be stronger than correlations with model size, training compute, or other technical parameters

Rationale: This hypothesis tests whether cultural bias is a fundamental feature of current approaches or a correctable consequence of specific design choices. If confirmed, it provides strong evidence for training data diversification as a mitigation strategy.

H4: Commerce Implications Hypothesis

The cultural biases we document will map onto specific domains relevant to shopping behavior and consumer decision-making. Specifically:
- Significant differences between US and Chinese models on WVS variables related to trust, risk tolerance, quality preferences, brand loyalty, and decision heuristics
- These differences will predict culturally specific shopping behaviors documented in the cross-cultural consumer behavior literature (e.g., Doran, 2002; Steenkamp, 2001)
- Models will systematically favor culturally congruent product attributes, marketing approaches, and purchase criteria when presented with shopping scenarios

Rationale: If cultural bias affects general psychology (H1-H3), it should also affect commerce-specific cognition. Confirming H4 establishes direct relevance to agentic commerce rather than merely abstract psychological differences.

7.4 Phased Implementation and Publication Strategy

Phase 1: Pilot Study (Q4 2025)

We begin with a pilot study testing 50 WVS variables across 5 models with 200 samples per variable. This pilot serves three purposes: (1) validating that our technical implementation faithfully replicates Atari et al.'s approach, (2) providing preliminary evidence for hypothesis testing, and (3) establishing cost and timeline estimates for the full study.

Deliverable: SSRN Technical Report presenting pilot methodology and preliminary findings (anticipated December 2025)

Phase 2: Full Empirical Study (Q1-Q2 2026)

Conditional on successful pilot validation, we conduct the complete study with all 262 WVS variables, 9 LLMs, and 1,000 samples per variable. Total scope: 2,358,000 individual API calls generating a comprehensive dataset of LLM cultural psychological profiles.
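The study's scale follows directly from the design parameters and can be checked with simple arithmetic; the per-call cost below is a placeholder assumption for budgeting illustration, not a quote from any provider's price list.

```python
N_VARIABLES = 262   # WVS variables retained after cleaning
N_MODELS = 9        # 5 US-trained + 4 Chinese-trained LLMs
N_SAMPLES = 1000    # responses per variable, per model

total_calls = N_VARIABLES * N_MODELS * N_SAMPLES
print(f"{total_calls:,} API calls")  # 2,358,000

# Hypothetical budgeting with an assumed average cost per call.
COST_PER_CALL_USD = 0.002
est_cost = total_calls * COST_PER_CALL_USD
print(f"approx. ${est_cost:,.0f} at an assumed ${COST_PER_CALL_USD}/call")
```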

Deliverable: Peer-reviewed journal article (target: Nature Human Behaviour, Science Advances, or Proceedings of the National Academy of Sciences) presenting full results with complete statistical analysis, visualizations, and theoretical implications (anticipated Q2-Q3 2026)

Phase 3: Commerce-Specific Validation (Q3-Q4 2026)

Building on the WVS validation, we develop commerce-specific scenarios testing shopping agent behavior across cultures. This involves creating hypothetical purchase decisions varying cultural dimensions (e.g., individual vs. family purchase, immediate vs. delayed gratification, brand loyalty vs. value-seeking) and measuring whether LLM recommendations vary systematically with their documented cultural biases.

Example scenarios:
- Gift selection for a family gathering (tests individualism-collectivism)
- Novel product versus traditional brand choice (tests modernity-tradition orientation)
- High-risk luxury purchase versus safe functional purchase (tests risk tolerance)
- Detailed specifications versus social proof emphasis (tests analytic-holistic cognition)

Deliverable: Business/marketing journal article (target: Journal of Marketing, Journal of Consumer Research, Management Science) connecting cultural bias findings to commercial outcomes with practical recommendations for practitioners (anticipated Q4 2026)

7.5 Expected Contributions

This empirical program will make several novel contributions:

Methodological: First replication of Atari et al.’s approach across multiple contemporary LLMs from different companies and training ecosystems, validating whether their findings generalize beyond OpenAI models and establishing methodological standards for cultural bias assessment in AI systems.

Theoretical: First systematic test of whether training data geography determines LLM cultural psychology, addressing fundamental questions about the sources of algorithmic bias and providing evidence for or against specific mitigation strategies.

Geopolitical: First comparison of US versus Chinese LLM cultural positioning, directly relevant to debates about AI governance, technology sovereignty, and global digital infrastructure. Results will inform policy discussions about AI development concentration and international cooperation.

Commercial: First evidence linking LLM cultural biases to shopping agent behavior, demonstrating real-world consequences for the $3-5 trillion agentic commerce market and providing actionable insights for practitioners.

Policy: Empirical foundation for cultural fairness frameworks in AI governance, informing regulatory approaches including the EU AI Act, UNESCO AI Ethics framework, and emerging standards with specific, measurable criteria for cultural appropriateness.

7.6 Data and Code Availability

Upon completion, we will make our complete dataset, analysis code, and replication materials publicly available via open-access repositories (GitHub, OSF, Zenodo). The WVS data is already publicly available (Haerpfer et al., 2022). LLM responses will be shared in anonymized form consistent with API terms of service. This commitment to open science will enable:
- Independent replication by other researchers
- Extension to additional models and languages
- Integration with related research programs
- Public scrutiny of methods and findings
- Broader engagement by the scientific community

7.7 Limitations and Extensions

We acknowledge several limitations that suggest directions for future research:

Language: Our study uses English-language WVS questions even when testing Chinese models, potentially underestimating cultural differences. Future work should implement parallel testing in native languages to disentangle the effect of prompting language from that of training-data language.

Temporal stability: LLMs undergo continuous updates and fine-tuning. Our results represent a snapshot of specific model versions in 2025-2026 and may not reflect future iterations. Longitudinal tracking of cultural bias across model versions would reveal whether bias is increasing, decreasing, or remaining stable.

Task specificity: WVS questions capture general values and attitudes but may not fully predict behavior in specific commercial contexts. Complementary studies using domain-specific scenarios, actual purchase data, and field experiments are needed to establish external validity.

Training data opacity: We cannot directly observe training data composition for most models, limiting our ability to establish definitive causal links between data sources and model behavior. Collaboration with model developers or analysis of open-source models could address this limitation.

Geographic scope: WVS covers 65 nations but gaps remain (e.g., many African nations, small island states, indigenous populations). Expansion to additional populations would strengthen global validity claims and identify underserved markets.

Individual variation: Our analysis focuses on population-level patterns but significant individual variation exists within all cultural groups (Oyserman et al., 2002). Future research should examine how well LLMs capture within-culture diversity versus stereotyping entire populations.

Despite these limitations, this research program represents the most comprehensive empirical assessment of cultural bias in LLMs to date, with direct implications for the theoretical framework developed in this paper and practical guidance for practitioners and policymakers.

8. CONCLUSION: TOWARD CULTURALLY-AWARE AI COMMERCE

The globalization of AI shopping agents represents a critical juncture for digital commerce equity. As we delegate increasing control over purchase decisions to algorithmic intermediaries, the cultural assumptions encoded in these systems will shape access to goods, services, and economic opportunity for billions of consumers worldwide. This paper has demonstrated that current approaches to agentic commerce fail to address a fundamental problem: AI shopping agents are psychologically American.

Drawing on cultural psychology and recent empirical findings on LLM bias, we have shown that the WEIRD psychology documented by Henrich et al. (2010) and Henrich (2020) and validated in LLMs by Atari et al. (2023) and Tao et al. (2024) extends directly to the commercial domain. When AI shopping agents powered by these models evaluate products, assess quality, weigh trade-offs, form preferences, and make recommendations, they do so through a Western cultural lens that systematically disadvantages the 85% of humanity residing outside WEIRD populations.

8.1 Five Key Findings

Our analysis has established five critical points:

First, cultural psychology provides overwhelming evidence that human populations differ systematically in cognition, values, and decision-making (Henrich et al., 2010; Nisbett, 2003; Markus & Kitayama, 2010). These differences are not minor preferences but fundamental psychological variations that affect how people perceive quality, assess risk, form trust, and make choices. Any universal algorithm that ignores this variation will systematically advantage some populations while disadvantaging others (Chouldechova & Roth, 2020).

Second, large language models encode the cultural psychology of their training data (Atari et al., 2023; Tao et al., 2024). The evidence is unequivocal: GPT models exhibit WEIRD psychological profiles with correlations r = -0.70 and r = -0.65 between similarity to human populations and those populations’ cultural distance from the United States. This is not a calibration issue—it’s a fundamental representational bias stemming from training data composition, evaluator demographics, and implicit universalist assumptions in model development (Bender et al., 2021).

Third, AI shopping agents inherit this bias. When commercial applications are built on WEIRD-biased LLMs, they systematically apply WEIRD preference structures, trust calibration, quality assessment, information processing styles, and risk tolerance to all consumers regardless of cultural appropriateness. The five mechanisms we identified—preference encoding mismatch, trust signal calibration error, quality assessment misalignment, information processing style mismatch, and risk tolerance miscalibration—operate at scale across billions of transactions.

Fourth, market dynamics amplify rather than correct this bias. Winner-takes-most effects in AI systems mean that early market leaders establish dominant positions that become difficult to displace even when their cultural calibration is poor for most global markets (Parker et al., 2016). Platform convergence means multiple shopping agents built on similar WEIRD-biased models will make similar inappropriate recommendations, creating the illusion of choice while maintaining underlying uniformity (Ezrachi & Stucke, 2016).

Fifth, policy and technical solutions exist but require recognition of the problem. Cultural auditing requirements, regional fine-tuning mandates, culturally-diverse training data, cultural prompting as default, multi-model ensembles, and local validation mechanisms can all reduce bias—but only if developers and regulators recognize cultural fairness as a design requirement rather than an optional enhancement (European Union, 2024; UNESCO, 2021).

8.2 Theoretical Contributions

This paper makes several theoretical contributions to the intersection of cultural psychology, AI ethics, and marketing:

We introduce Cultural Commerce Fairness as a distinct dimension of algorithmic fairness, extending existing frameworks (Mehrabi et al., 2021; Mitchell et al., 2021) to incorporate cultural distance and psychological misalignment as sources of systematic disadvantage. This framework provides conceptual tools for analyzing fairness in contexts where different populations have fundamentally different values and decision-making processes.

We develop a five-mechanism model explaining how cultural bias in LLMs manifests in commercial harms: preference encoding mismatch, trust signal calibration error, quality assessment misalignment, information processing style mismatch, and risk tolerance miscalibration. This mechanistic analysis enables targeted interventions rather than treating cultural bias as monolithic.

We propose five complementary fairness metrics—cultural parity, cultural alignment, preference satisfaction, information quality perception, and cultural distance correlation—providing operationalizable criteria for assessing and improving cultural appropriateness in AI systems.

We synthesize three previously disconnected research streams—cultural psychology (Henrich et al., 2010; Nisbett, 2003), AI bias research (Atari et al., 2023; Mehrabi et al., 2021), and agentic commerce (Gal & Elkin-Koren, 2017; Brynjolfsson & McAfee, 2014)—demonstrating that their integration reveals systematic patterns invisible when examined separately.

8.3 Practical Implications

For practitioners, this research provides actionable guidance:

For AI Developers: Cultural bias is not inevitable but stems from correctable design choices. Training data geography, evaluator demographics, and implicit universalist assumptions all contribute. Targeted interventions—diverse training data, cultural prompting, multi-model ensembles, local validation—can reduce bias if prioritized during development rather than treated as an afterthought.

For E-Commerce Platforms: Deploying culturally-biased shopping agents creates legal, reputational, and competitive risks. Platforms should conduct cultural audits before deployment, implement ongoing monitoring across cultural groups, provide cultural transparency to users, and invest in regional fine-tuning for major markets.

For Policymakers: Current AI governance frameworks inadequately address cross-cultural bias. Specific requirements are needed: mandatory cultural auditing for systems above transaction thresholds, regional fine-tuning requirements for global deployment, cultural fairness standards for commercial AI, and a consumer right to cultural alignment.

For Researchers: The empirical research program outlined in Section 7 requires execution. Our field needs systematic measurement of cultural bias across models, validation of mitigation strategies, assessment of real-world impacts, and development of best practices for culturally-aware AI development.

8.4 Limitations and Future Directions

This paper is primarily theoretical and conceptual. While grounded in extensive empirical evidence from cultural psychology (Henrich et al., 2010; Nisbett, 2003) and recent LLM bias studies (Atari et al., 2023; Tao et al., 2024), our application to agentic commerce remains largely untested. The empirical research program in Section 7 will validate or refute our theoretical predictions.

Our analysis focuses on WEIRD versus non-WEIRD distinctions, potentially oversimplifying cultural diversity. Future work should examine variation within these broad categories, including differences between East Asian, Latin American, African, and Middle Eastern populations. Cultural distance metrics (Muthukrishna et al., 2020) provide tools for fine-grained analysis.

We focus on cultural bias in product recommendations but agentic commerce encompasses price negotiation, seller selection, contract formation, and dispute resolution. Each domain may exhibit distinct bias patterns requiring separate analysis.

Our proposed fairness metrics require validation through user studies demonstrating they capture consumer experiences of cultural appropriateness versus inappropriateness. Metric development should involve participatory processes with stakeholders from diverse cultural backgrounds.

8.5 A Path Forward

Our forthcoming empirical research will test whether these concerns are theoretical speculation or documented reality. If Hypothesis 1 is confirmed—that contemporary US-trained LLMs exhibit WEIRD bias comparable to that found in GPT-3 and GPT-4—the urgency of reform becomes clear. If Hypothesis 2 is confirmed—that Chinese-trained models show reversed bias toward East Asian populations—we gain crucial evidence that training data geography determines cultural psychology, suggesting targeted mitigation strategies. If Hypothesis 3 is confirmed—that these patterns persist across diverse architectures and capabilities—we establish that cultural bias is a fundamental rather than incidental feature of current approaches.

Should the empirical evidence validate our framework, the implications extend beyond commerce to every domain where AI systems make decisions on behalf of humans: education (Baker & Hawn, 2022), healthcare (Obermeyer et al., 2019), employment (Lambrecht & Tucker, 2019), finance (Fuster et al., 2022), governance (Eubanks, 2018). If we build AI systems that encode one culture’s psychology as universal, we risk a future where algorithmic decision-making systematically serves the interests of already-advantaged populations while marginalizing the global majority.

The alternative is achievable. AI systems that recognize human cultural diversity, that adapt to local contexts, that respect different ways of thinking about value, risk, quality, and trust—these are technically feasible. What they require is recognition that “Which humans?” (Atari et al., 2023) is not a peripheral question but a central design challenge. When we build AI shopping agents, we must ask: Which humans are we building for? Which humans’ psychology are we encoding? Which humans will benefit, and which will be disadvantaged?

8.6 The Window Is Closing

The agentic commerce revolution is not inevitable in its current form. The choices we make today—about training data, model design, deployment practices, governance frameworks—will determine whether AI shopping agents serve as tools of inclusion or instruments of cultural hegemony. This paper has attempted to demonstrate that cultural fairness in algorithmic commerce is not merely an ethical aspiration but an empirical question amenable to rigorous investigation and a design challenge with practical solutions.

As AI systems become more capable, more autonomous, and more deeply embedded in economic life, the stakes of these choices will only grow. We have a narrow window—perhaps 18-24 months—before agentic commerce architectures solidify and path dependencies emerge. The research program outlined in Section 7 aims to provide the evidence needed to inform these critical design choices before they become locked in.

The question “Which humans?” applies not only to understanding current biases but to imagining alternative futures. We can build AI systems that work for everyone, not just WEIRD populations. We can design shopping agents that respect cultural diversity rather than imposing cultural uniformity. We can create commercial algorithms that expand rather than constrain human possibility.

But only if we recognize the problem, measure its magnitude, and commit to solving it. This paper is offered as a contribution to that effort—a call to take seriously the cultural dimensions of AI commerce before the opportunity to intervene passes. The 85% of humanity currently disadvantaged by WEIRD-biased systems deserve nothing less.

The choice before us is clear: we can build a global digital commerce infrastructure that reflects the full diversity of human psychology and serves all populations equitably, or we can allow the accidental biases of current systems to calcify into permanent structural inequities. The technology for the former exists. What remains is the will to implement it.

REFERENCES

Aaker, J. L., & Maheswaran, D. (1997). The effect of cultural orientation on persuasion. Journal of Consumer Research, 24(3), 315-328. https://doi.org/10.1086/209513

Aaker, J. L., & Schmitt, B. (2001). Culture-dependent assimilation and differentiation of the self: Preferences for consumption symbols in the United States and China. Journal of Cross-Cultural Psychology, 32(5), 561-576. https://doi.org/10.1177/0022022101032005003

Abdurahman, S., Atari, M., Karimi-Malekabadi, F., Xue, M. J., Trager, J., Park, P. S., Golazizian, P., Omrani, A., & Dehghani, M. (2023). Perils and opportunities in using large language models in psychological research. PsyArXiv preprint. https://doi.org/10.31234/osf.io/d695y

Anil, R., Dai, A. M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., … & Wu, Y. (2023). PaLM 2 technical report. arXiv preprint arXiv:2305.10403.

Aridor, G., Jiménez-Durán, R., Jimenez Gomez, R. A., & Rossi, L. (2021). The economics of algorithmic pricing. Annual Review of Economics, 13, 727-752. https://doi.org/10.1146/annurev-economics-082020-032337

Atari, M., Xue, M. J., Park, P. S., Blasi, D. E., & Henrich, J. (2023). Which humans? PsyArXiv preprint. https://doi.org/10.31234/osf.io/5b26t

Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., … & Rahwan, I. (2018). The Moral Machine experiment. Nature, 563(7729), 59-64. https://doi.org/10.1038/s41586-018-0637-6

Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., … & Kaplan, J. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.

Baker, R. S., & Hawn, A. (2022). Algorithmic bias in education. International Journal of Artificial Intelligence in Education, 32(4), 1052-1092. https://doi.org/10.1007/s40593-021-00285-9

Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and machine learning: Limitations and opportunities. MIT Press.

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623). https://doi.org/10.1145/3442188.3445922

Berg, J., Dickhaut, J., & McCabe, K. (1995). Trust, reciprocity, and social history. Games and Economic Behavior, 10(1), 122-142. https://doi.org/10.1006/game.1995.1027

Bietti, E. (2020). From ethics washing to ethics bashing: A view on tech ethics from within moral philosophy. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 210-219). https://doi.org/10.1145/3351095.3372860

Binns, R. (2018). Fairness in machine learning: Lessons from political philosophy. In Proceedings of the 2018 Conference on Fairness, Accountability, and Transparency (pp. 149-159). https://doi.org/10.1145/3287560.3287598

Blasi, D. E., Henrich, J., Adamou, E., Kemmerer, D., & Majid, A. (2022). Over-reliance on English hinders cognitive science. Trends in Cognitive Sciences, 26(12), 1153-1170. https://doi.org/10.1016/j.tics.2022.09.015

Brynjolfsson, E., Hu, Y., & Smith, M. D. (2011). Research commentary—Consumer surplus in the digital economy: Estimating the value of increased product variety at online booksellers. Management Science, 49(11), 1580-1596. https://doi.org/10.1287/mnsc.49.11.1580.20583

Brynjolfsson, E., & McAfee, A. (2014). The second machine age: Work, progress, and prosperity in a time of brilliant technologies. W. W. Norton & Company.

Burke, R., Felfernig, A., & Göker, M. H. (2011). Recommender systems: An overview. AI Magazine, 32(3), 13-18. https://doi.org/10.1609/aimag.v32i3.2361

Calvano, E., Calzolari, G., Denicolò, V., & Pastorello, S. (2020). Artificial intelligence, algorithmic pricing, and collusion. American Economic Review, 110(10), 3267-3297. https://doi.org/10.1257/aer.20190623

Chen, L., Mislove, A., & Wilson, C. (2016). An empirical analysis of algorithmic pricing on Amazon marketplace. In Proceedings of the 25th International Conference on World Wide Web (pp. 1339-1349). https://doi.org/10.1145/2872427.2883089

Chiu, C. Y., & Hong, Y. Y. (2006). Social psychology of culture. Psychology Press.

Chouldechova, A., & Roth, A. (2020). A snapshot of the frontiers of fairness in machine learning. Communications of the ACM, 63(5), 82-89. https://doi.org/10.1145/3376898

Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems (pp. 4299-4307).

Corbett-Davies, S., & Goel, S. (2018). The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023. https://doi.org/10.48550/arXiv.1808.00023

Cross, S. E., Hardin, E. E., & Gercek-Swing, B. (2011). The what, how, why, and where of self-construal. Personality and Social Psychology Review, 15(2), 142-179. https://doi.org/10.1177/1088868310373752

Davenport, T. H., & Ronanki, R. (2018). Artificial intelligence for the real world. Harvard Business Review, 96(1), 108-116.

Denton, E., Hanna, A., Amironesei, R., Smart, A., & Nicole, H. (2020). Bringing the people back in: Contesting benchmark machine learning datasets. arXiv preprint arXiv:2007.07399.

Doney, P. M., Cannon, J. P., & Mullen, M. R. (1998). Understanding the influence of national culture on the development of trust. Academy of Management Review, 23(3), 601-620. https://doi.org/10.5465/amr.1998.926629

Doran, K. B. (2002). Lessons learned in cross-cultural research of Chinese and North American consumers. Journal of Business Research, 55(10), 823-829. https://doi.org/10.1016/S0148-2963(00)00222-8

Ekstrand, M. D., Das, A., Burke, R., & Diaz, F. (2023). Fairness in recommender systems. In F. Ricci, L. Rokach, & B. Shapira (Eds.), Recommender systems handbook (pp. 679-707). Springer. https://doi.org/10.1007/978-1-0716-2197-4_18

Escalas, J. E., & Bettman, J. R. (2005). Self-construal, reference groups, and brand meaning. Journal of Consumer Research, 32(3), 378-389. https://doi.org/10.1086/497549

Eubanks, V. (2018). Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin’s Press.

European Union. (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council on artificial intelligence (AI Act). Official Journal of the European Union. Retrieved from https://eur-lex.europa.eu/

Ezrachi, A., & Stucke, M. E. (2016). Virtual competition: The promise and perils of the algorithm-driven economy. Harvard University Press.

Falk, A., Becker, A., Dohmen, T., Enke, B., Huffman, D., & Sunde, U. (2018). Global evidence on economic preferences. The Quarterly Journal of Economics, 133(4), 1645-1692. https://doi.org/10.1093/qje/qjy013

Farrell, J., & Klemperer, P. (2007). Coordination and lock-in: Competition with switching costs and network effects. Handbook of Industrial Organization, 3, 1967-2072. https://doi.org/10.1016/S1573-448X(06)03031-7

Forsythe, R., Horowitz, J. L., Savin, N. E., & Sefton, M. (1994). Fairness in simple bargaining experiments. Games and Economic Behavior, 6(3), 347-369. https://doi.org/10.1006/game.1994.1021

Fountaine, T., McCarthy, B., & Saleh, T. (2019). Building the AI-powered organization. Harvard Business Review, 97(4), 62-73.

Friedler, S. A., Scheidegger, C., Venkatasubramanian, S., Choudhary, S., Hamilton, E. P., & Roth, D. (2019). A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 329-338). https://doi.org/10.1145/3287560.3287589

Fukuyama, F. (1995). Trust: The social virtues and the creation of prosperity. Free Press.

Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T., & Walther, A. (2022). Predictably unequal? The effects of machine learning on credit markets. Journal of Finance, 77(1), 5-47. https://doi.org/10.1111/jofi.13090

Gal, M. S., & Elkin-Koren, N. (2017). Algorithmic consumers. Harvard Journal of Law & Technology, 30(2), 309-353.

Haerpfer, C., Inglehart, R., Moreno, A., Welzel, C., Kizilova, K., Diez-Medrano, J., Lagos, M., Norris, P., Ponarin, E., & Puranen, B. (Eds.). (2022). World Values Survey: Round Seven - Country-Pooled Datafile Version 6.0. Madrid, Spain & Vienna, Austria: JD Systems Institute & WVSA Secretariat. https://doi.org/10.14281/18241.1

Heine, S. J. (2016). Cultural psychology (3rd ed.). W. W. Norton & Company.

Henrich, J. (2020). The WEIRDest people in the world: How the West became psychologically peculiar and particularly prosperous. Farrar, Straus and Giroux.

Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H., … & Tracer, D. (2005). “Economic man” in cross-cultural perspective: Behavioral experiments in 15 small-scale societies. Behavioral and Brain Sciences, 28(6), 795-815. https://doi.org/10.1017/S0140525X05000142

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61-83. https://doi.org/10.1017/S0140525X0999152X

Hofstede, G. (2001). Culture’s consequences: Comparing values, behaviors, institutions and organizations across nations (2nd ed.). Sage Publications.

Hu, K. (2023, February 1). ChatGPT sets record for fastest-growing user base. Reuters. Retrieved from https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/

Inglehart, R., & Baker, W. E. (2000). Modernization, cultural change, and the persistence of traditional values. American Sociological Review, 65(1), 19-51. https://doi.org/10.2307/2657288

Inglehart, R., & Welzel, C. (2005). Modernization, cultural change, and democracy: The human development sequence. Cambridge University Press.

Jarvenpaa, S. L., Tractinsky, N., & Saarinen, L. (2000). Consumer trust in an Internet store: A cross-cultural validation. Journal of Computer-Mediated Communication, 5(2). https://doi.org/10.1111/j.1083-6101.1999.tb00337.x

Ji, L. J., Peng, K., & Nisbett, R. E. (2000). Culture, control, and perception of relationships in the environment. Journal of Personality and Social Psychology, 78(5), 943-955. https://doi.org/10.1037/0022-3514.78.5.943

Ji, L. J., Zhang, Z., & Nisbett, R. E. (2004). Is it culture or is it language? Examination of language effects in cross-cultural research on categorization. Journal of Personality and Social Psychology, 87(1), 57-65. https://doi.org/10.1037/0022-3514.87.1.57

Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), 389-399. https://doi.org/10.1038/s42256-019-0088-2

Johnson, R. L., Pistilli, G., Menéndez-González, N., Dias Duran, L. D., Panai, E., Kalpokiene, J., & Bertulfo, D. J. (2022). The ghost in the machine has an American accent: Value conflict in GPT-3. arXiv preprint arXiv:2203.07785. https://doi.org/10.48550/arXiv.2203.07785

Kalluri, P. (2020). Don’t ask if artificial intelligence is good or fair, ask how it shifts power. Nature, 583(7815), 169-169. https://doi.org/10.1038/d41586-020-02003-2

Khan, L. M. (2017). Amazon’s antitrust paradox. Yale Law Journal, 126, 710-805.

Kim, H., & Markus, H. R. (1999). Deviance or uniqueness, harmony or conformity? A cultural analysis. Journal of Personality and Social Psychology, 77(4), 785-800. https://doi.org/10.1037/0022-3514.77.4.785

Kitayama, S., Park, H., Sevincer, A. T., Karasawa, M., & Uskul, A. K. (2009). A cultural task analysis of implicit independence: Comparing North America, Western Europe, and East Asia. Journal of Personality and Social Psychology, 97(2), 236-255. https://doi.org/10.1037/a0015999

Kusner, M. J., Loftus, J., Russell, C., & Silva, R. (2017). Counterfactual fairness. In Advances in Neural Information Processing Systems (pp. 4066-4076).

Lambrecht, A., & Tucker, C. E. (2019). Algorithmic bias? An empirical study of apparent gender-based discrimination in the display of STEM career ads. Management Science, 65(7), 2966-2981. https://doi.org/10.1287/mnsc.2018.3093

Lee, C., & Green, R. T. (1991). Cross-cultural examination of the Fishbein behavioral intentions model. Journal of International Business Studies, 22(2), 289-305. https://doi.org/10.1057/palgrave.jibs.8490304

Markus, H. R., & Kitayama, S. (1991). Culture and the self: Implications for cognition, emotion, and motivation. Psychological Review, 98(2), 224-253. https://doi.org/10.1037/0033-295X.98.2.224

Markus, H. R., & Kitayama, S. (2010). Cultures and selves: A cycle of mutual constitution. Perspectives on Psychological Science, 5(4), 420-430. https://doi.org/10.1177/1745691610375557

Masuda, T., & Nisbett, R. E. (2001). Attending holistically versus analytically: Comparing the context sensitivity of Japanese and Americans. Journal of Personality and Social Psychology, 81(5), 922-934. https://doi.org/10.1037/0022-3514.81.5.922

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), 1-35. https://doi.org/10.1145/3457607

Mitchell, S., Potash, E., Barocas, S., D’Amour, A., & Lum, K. (2021). Algorithmic fairness: Choices, assumptions, and definitions. Annual Review of Statistics and Its Application, 8, 141-163. https://doi.org/10.1146/annurev-statistics-042720-125902

Muthukrishna, M., Bell, A. V., Henrich, J., Curtin, C. M., Gedranovich, A., McInerney, J., & Thue, B. (2020). Beyond Western, Educated, Industrial, Rich, and Democratic (WEIRD) psychology: Measuring and mapping scales of cultural and psychological distance. Psychological Science, 31(6), 678-701. https://doi.org/10.1177/0956797620916782

Nisbett, R. E. (2003). The geography of thought: How Asians and Westerners think differently… and why. Free Press.

Nisbett, R. E., & Miyamoto, Y. (2005). The influence of culture: Holistic versus analytic perception. Trends in Cognitive Sciences, 9(10), 467-473. https://doi.org/10.1016/j.tics.2005.08.004

Nisbett, R. E., Peng, K., Choi, I., & Norenzayan, A. (2001). Culture and systems of thought: Holistic versus analytic cognition. Psychological Review, 108(2), 291-310. https://doi.org/10.1037/0033-295X.108.2.291

Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press.

O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown.

Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453. https://doi.org/10.1126/science.aax2342

OpenAI. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774. https://doi.org/10.48550/arXiv.2303.08774

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., … & Lowe, R. (2022). Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems (pp. 27730-27744).

Oyserman, D., Coon, H. M., & Kemmelmeier, M. (2002). Rethinking individualism and collectivism: Evaluation of theoretical assumptions and meta-analyses. Psychological Bulletin, 128(1), 3-72. https://doi.org/10.1037/0033-2909.128.1.3

Parker, G. G., Van Alstyne, M. W., & Choudary, S. P. (2016). Platform revolution: How networked markets are transforming the economy and how to make them work for you. W. W. Norton & Company.

Peng, K., & Nisbett, R. E. (1999). Culture, dialectics, and reasoning about contradiction. American Psychologist, 54(9), 741-754. https://doi.org/10.1037/0003-066X.54.9.741

Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., … & Barnes, P. (2020). Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 33-44). https://doi.org/10.1145/3351095.3372873

Rialti, R., Zollo, L., Ferraris, A., & Alon, I. (2021). Big data analytics capabilities and performance: Evidence from a moderated multi-mediation model. Technological Forecasting and Social Change, 173, 121110. https://doi.org/10.1016/j.techfore.2021.121110

Sap, M., Swayamdipta, S., Vianna, L., Zhou, X., Choi, Y., & Smith, N. A. (2022). Annotators with attitudes: How annotator beliefs and identities bias toxic language detection. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (pp. 5884-5906). https://doi.org/10.18653/v1/2022.naacl-main.431

Schwartz, S. H. (2006). A theory of cultural value orientations: Explication and applications. Comparative Sociology, 5(2-3), 137-182. https://doi.org/10.1163/156913306778667357

Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., & Vertesi, J. (2019). Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 59-68). https://doi.org/10.1145/3287560.3287598

Shapiro, C., & Varian, H. R. (1999). Information rules: A strategic guide to the network economy. Harvard Business Press.

Steenkamp, J. B. E. (2001). The role of national culture in international marketing research. International Marketing Review, 18(1), 30-44. https://doi.org/10.1108/02651330110381970

Sweeney, L. (2013). Discrimination in online ad delivery. Communications of the ACM, 56(5), 44-54. https://doi.org/10.1145/2447976.2447990

Talhelm, T., & Oishi, S. (2015). The psychological trade-offs of rice farming in China. In Advances in Culture and Psychology (Vol. 5, pp. 253-305). Oxford University Press.

Talhelm, T., Zhang, X., Oishi, S., Shimin, C., Duan, D., Lan, X., & Kitayama, S. (2014). Large-scale psychological differences within China explained by rice versus wheat agriculture. Science, 344(6184), 603-608. https://doi.org/10.1126/science.1246850

Talkdesk. (2024). AI in customer experience: 2024 survey results. Talkdesk Research. Retrieved from https://www.talkdesk.com/resources/

Tao, Y., Viberg, O., Baker, R. S., & Kizilcec, R. F. (2024). Cultural bias and cultural alignment of large language models. PNAS Nexus, 3(9), pgae346. https://doi.org/10.1093/pnasnexus/pgae346

Triandis, H. C. (1995). Individualism & collectivism. Westview Press.

Triandis, H. C. (2001). Individualism-collectivism and personality. Journal of Personality, 69(6), 907-924. https://doi.org/10.1111/1467-6494.696169

UNESCO. (2021). Recommendation on the ethics of artificial intelligence. Retrieved from https://www.unesco.org/en/artificial-intelligence/recommendation-ethics

Varian, H. R. (1992). Microeconomic analysis (3rd ed.). W. W. Norton & Company.

W3Techs. (2024). Usage statistics of content languages for websites. Retrieved from https://w3techs.com/technologies/overview/content_language

Weber, E. U., & Hsee, C. (1998). Cross-cultural differences in risk perception, but cross-cultural similarities in attitudes towards perceived risk. Management Science, 44(9), 1205-1217. https://doi.org/10.1287/mnsc.44.9.1205

Yamagishi, T., & Yamagishi, M. (1994). Trust and commitment in the United States and Japan. Motivation and Emotion, 18(2), 129-166. https://doi.org/10.1007/BF02249397

Zhang, Y., & Shavitt, S. (2003). Cultural values in advertisements to the Chinese X-generation: Promoting modernity and individualism. Journal of Advertising, 32(1), 23-33. https://doi.org/10.1080/00913367.2003.10639120

ACKNOWLEDGMENTS:

This research builds on foundational work in cultural psychology, particularly the WEIRD framework developed by Henrich and colleagues (2010), cross-cultural cognition research by Nisbett (2003), and self-construal theory by Markus and Kitayama (1991). It also draws on recent empirical findings that LLMs exhibit WEIRD psychological profiles (Atari et al., 2023; Tao et al., 2024). All errors and interpretations remain the author's responsibility.

Author Note & Declarations

Working Paper Declaration:

This working paper is distributed via SSRN. It has not been peer-reviewed (as of the date of posting) and should not be cited as a final, published article. This working paper establishes a theoretical framework for understanding agentic commerce, an emerging phenomenon with significant implications for marketing theory and commercial practice. By releasing it as a working paper, the author seeks to establish theoretical priority on this topic while inviting scholarly dialogue and collaboration.

Provenance Statement:

This paper represents independent academic research conducted through The AI Praxis and is derived from the author's forthcoming book 'The Algorithmic Shopper' (U.S. Copyright Office Reg. No. TXu 2-507-027), under contract with St. Martin's Press/Macmillan (expected publication Q4 2026/Q1 2027), combined with 25+ years of global commercial leadership experience across multiple organisations and markets.

Original Theoretical Contributions:

The Agentic Commerce theoretical constructs presented herein—including The Shopper Schism, Agent Intent Optimisation (AIO), The Trust Paradox, The Great Decoupling, Algorithmic Readiness, and related frameworks—represent original intellectual property developed through the author's independent research programme. Publication priority for these constructs is established through SSRN working papers (ssrn.com/author=8182896). The pedagogical framework, including the Pracademic Method and modular curriculum architecture, represents original contribution to management education scholarship.

AI Usage Statement:

The author acknowledges the use of AI assistance in research support, literature organisation, and editing some elements of this working paper. All concepts, frameworks, and theoretical contributions remain the original intellectual work of the author, who takes full responsibility for the content and conclusions presented herein.

Correspondence & Copyright

Paul F. Accornero, The AI Praxis. Email: paul.accornero@gmail.com | ORCID: https://orcid.org/0009-0009-2567-5155

Copyright © 2026 Paul F. Accornero. All rights reserved. This working paper is the intellectual property of the author. It may be downloaded, printed, and distributed for personal research or educational purposes only. Commercial use or redistribution without the author's explicit written permission is prohibited.

Research portfolio derived from The Algorithmic Shopper (U.S. Copyright Reg. No. TXu 2-507-027)