New Publication: «Source framing triggers systematic bias in large language models.»

We are pleased to share that our colleagues Federico Germani and Giovanni Spitale, co-directors of ITE Lab, have just published a new paper in Science Advances titled “Source framing triggers systematic bias in large language models.” The study investigates how four state-of-the-art large language models – Grok, Mistral, DeepSeek, and OpenAI’s o3-mini – evaluate statements on sensitive topics such as public health, geopolitics, and human rights. Across 192,000 assessments, the authors show that these models largely agree with one another when they do not know who authored a statement, challenging the “AI war of ideologies” often discussed in the media. However, when a statement is attributed to “a person from China,” all models – including DeepSeek, which was itself developed in China – agree with it significantly less, regardless of its content. This consistent pattern of “source framing” bias raises important questions about fairness and neutrality in AI systems, especially as they increasingly take on tasks such as content moderation and automated evaluation. The full dataset, code, and analysis pipeline are openly available on OSF, reflecting the authors’ commitment to transparency and Open Science practices – beyond mere open access.
 
Abstract:
Large language models (LLMs) are increasingly used to evaluate text, raising urgent questions about whether their judgments are consistent, unbiased, and robust to framing effects. Here, we examine inter- and intramodel agreement across four state-of-the-art LLMs tasked with evaluating 4,800 narrative statements on 24 different topics of social, political, and public health relevance, for a total of 192,000 assessments. We manipulate the disclosed source of each statement to assess how attribution to either another LLM or a human author of specified nationality affects evaluation outcomes. Different LLMs display a remarkably high degree of inter- and intramodel agreement across topics, but this alignment breaks down when source framing is introduced. Attributing statements to Chinese individuals systematically lowers agreement scores across all models and, in particular, for DeepSeek Reasoner. Our findings show that LLMs’ own judgments of agreement with narrative statements exhibit systematic bias from framing effects, with substantial implications for the neutrality and fairness of LLM-mediated information systems.
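
For readers curious how a source-framing manipulation of this kind can be set up in practice, the sketch below illustrates one possible way to construct framed and unframed evaluation prompts and collect agreement ratings. It is a minimal illustration only, not the authors’ pipeline (which is openly available on OSF); the `query_model` callable, the exact framing wordings, and the 0–100 agreement scale are assumptions introduced here for the example.

```python
# Minimal sketch of a source-framing evaluation loop (illustrative only; not the
# authors' OSF pipeline). `query_model` is a hypothetical callable that sends a
# prompt to an LLM and returns its raw text response.
from typing import Callable, Optional

# Hypothetical framing conditions: no attribution, attribution to another LLM,
# or attribution to a human author of a stated nationality.
FRAMINGS = {
    "none": None,
    "llm": "The following statement was written by another large language model.",
    "person_china": "The following statement was written by a person from China.",
    "person_usa": "The following statement was written by a person from the United States.",
}

def build_prompt(statement: str, framing: Optional[str]) -> str:
    """Compose the evaluation prompt, optionally prepending a source attribution."""
    preamble = f"{framing}\n\n" if framing else ""
    return (
        f"{preamble}Statement: \"{statement}\"\n\n"
        "On a scale from 0 (completely disagree) to 100 (completely agree), "
        "how much do you agree with this statement? Reply with a single number."
    )

def collect_scores(
    statements: list[str],
    query_model: Callable[[str], str],
) -> dict[str, list[float]]:
    """Query the model once per statement and framing; return scores per framing."""
    scores: dict[str, list[float]] = {name: [] for name in FRAMINGS}
    for statement in statements:
        for name, framing in FRAMINGS.items():
            reply = query_model(build_prompt(statement, framing))
            try:
                scores[name].append(float(reply.strip()))
            except ValueError:
                # Non-numeric replies are simply skipped in this simplified sketch.
                pass
    return scores
```

Comparing the score distributions across framing conditions (e.g., "none" versus "person_china") for the same set of statements is the kind of contrast the study draws at much larger scale and across multiple models.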
