Research

AI research summary “Exaggeration of findings”, research warns

By versatileai | April 16, 2025

AI tools exaggerate research findings far more often than humans do, according to a study suggesting that the latest bots are the worst offenders, even when specifically instructed not to exaggerate.

Researchers from the Netherlands and the UK found that AI summaries of scientific papers are far more likely to overgeneralize results than the original authors and expert reviewers are.

The analysis, reported in the journal Royal Society Open Science, suggests that AI summaries, which are said to help spread scientific knowledge by rendering it in "easy to understand language," tend to ignore "uncertainties, limitations, and nuances" by "omitting qualifiers" and "overgeneralizing" the text.

This is particularly "dangerous" when applied to medical research, the report warns: "If chatbots create summaries that overlook qualifiers about the generalizability of clinical trial results, practitioners who rely on these chatbots may prescribe unsafe or inappropriate treatments."

The team analyzed almost 5,000 AI-generated summaries of 200 journal abstracts and 100 full articles. The topics spanned the effect of caffeine on irregular heartbeat, the benefits of bariatric surgery in reducing cancer risk, and the effects of disinformation and government communication on people's beliefs about climate change and residents' behavior.

Summaries created by "older" AI apps, such as OpenAI's GPT-4 and Meta's Llama 2 (both released in 2023), proved about 2.6 times more likely than the original abstracts to contain generalized conclusions.

The odds of overgeneralization rose to nine times for summaries by ChatGPT-4o, released last May, and 39 times for Llama 3.3, which appeared in December.

Instructions to "stay faithful to the source material" and "not introduce inaccuracies" produced the opposite of the intended effect: summaries generated under those instructions proved about twice as likely to contain generalized conclusions as those generated when the bots were simply asked to "provide a summary of the key findings."

This suggests that generative AI may be vulnerable to "ironic rebound" effects, in which being told not to think of something, such as a "pink elephant," automatically brings the image to mind.

AI apps also seemed prone to failures such as "catastrophic forgetting," in which newly learned information crowds out previously acquired knowledge and skills, and to "unwarranted confidence," in which assertiveness is prioritized over "caution and accuracy."

The authors speculate that fine-tuning the bots may exacerbate these issues. When AI apps are "optimized for utility," they become less likely to "express uncertainty about questions beyond their parametric knowledge." "Tools that provide highly accurate but complex answers can receive lower ratings from human raters," the paper explains.

One summary cited in the paper reinterpreted the finding that a diabetes drug was "better than placebo" as endorsement of an "effective and safe treatment" option. "That kind of… generalization can mislead practitioners into using dangerous interventions," the paper states.

The paper offers five strategies to "mitigate" the risk of overgeneralization. These include using bots from the "Claude" family made by the AI company Anthropic, which turned out to produce the "most faithful" summaries.

Another recommendation is to lower the bot's "temperature" setting. Temperature is a tunable parameter that controls the randomness of the generated text.
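To see why lowering temperature reduces randomness, here is a minimal sketch of temperature-scaled softmax sampling, the mechanism most language models use to pick the next token. The function name and logit values are hypothetical, for illustration only; in practice, chatbot APIs expose temperature as a request parameter rather than requiring you to implement this yourself.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores (logits) for candidate next tokens into
    probabilities, after dividing each logit by the temperature.

    Low temperature sharpens the distribution (near-deterministic output);
    high temperature flattens it (more random, more "creative" output).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens
logits = [2.0, 1.0, 0.5, 0.1]

low = softmax_with_temperature(logits, 0.2)   # nearly all mass on the top token
high = softmax_with_temperature(logits, 1.5)  # mass spread across alternatives
```

With temperature 0.2, the top-scoring token receives almost all of the probability, so the model's output becomes close to deterministic; at 1.5, lower-scoring tokens are sampled far more often, which is one way unsupported generalizations can creep into a summary.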

Uwe Peters, an assistant professor of theoretical philosophy at Utrecht University and co-author of the report, said the exaggerations "occurred frequently and systematically."

He said the findings meant that even subtle shifts in AI-generated summaries risked "misleading users and amplifying misinformation, especially when the output appears polished and reliable."

Tech companies need to assess their models for such tendencies, he added, and share the results openly. For universities, the findings demonstrated an "urgent need for stronger AI literacy" among staff and students.

John.ross@timeshighereducation.com
