OpenScholar: Open source AI that outperforms GPT-4o in scientific research

By versatileai | November 21, 2024

Scientists are drowning in data. With millions of research papers published each year, even the most dedicated experts struggle to stay up to date with the latest research findings in their field.

A new artificial intelligence system called OpenScholar promises to rewrite the rules of how researchers access, evaluate, and synthesize scientific literature. Built by the Allen Institute for AI (Ai2) and the University of Washington, OpenScholar combines a state-of-the-art retrieval system with a fine-tuned language model to deliver comprehensive, citation-backed answers to complex research questions.

“Scientific progress depends on researchers’ ability to synthesize an ever-growing body of literature,” the OpenScholar researchers wrote in their paper, noting that this ability is increasingly constrained by the sheer volume of information. They argue that OpenScholar not only helps researchers navigate the flood of papers, but also offers a path forward that challenges the dominance of proprietary AI systems like OpenAI’s GPT-4o.

How OpenScholar’s AI brain processes 45 million research papers in seconds

At the heart of OpenScholar is a retrieval-augmented language model that draws on a datastore of more than 45 million open-access scholarly articles. When a researcher asks a question, OpenScholar doesn’t just generate a response from pre-trained knowledge, as models like GPT-4o do. Instead, it actively retrieves relevant papers, synthesizes their findings, and generates an answer grounded in those sources.
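To make that flow concrete, here is a minimal sketch of a retrieve-then-generate step of the kind described above. It is an illustration only: the helpers it assumes (a search index, a passage ranker, and a language-model wrapper) are hypothetical stand-ins, not OpenScholar’s actual code or API.

```python
# Illustrative sketch of a retrieval-augmented answering step.
# `index` and `lm` are hypothetical stand-ins, not OpenScholar's real interfaces.

from dataclasses import dataclass

@dataclass
class Passage:
    paper_id: str   # identifier of the source paper
    text: str       # retrieved passage text
    score: float    # retrieval relevance score

def answer_question(question: str, index, lm, top_k: int = 8) -> str:
    # 1. Retrieve candidate passages from the open-access datastore.
    candidates = index.search(question, limit=100)

    # 2. Re-rank and keep only the most relevant passages.
    passages = sorted(candidates, key=lambda p: p.score, reverse=True)[:top_k]

    # 3. Build a prompt that asks the model to answer using only those passages,
    #    citing them by number so every claim can be traced to a source.
    context = "\n\n".join(
        f"[{i + 1}] ({p.paper_id}) {p.text}" for i, p in enumerate(passages)
    )
    prompt = (
        "Answer the question using ONLY the passages below, citing them as [n].\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

    # 4. Generate a citation-backed answer grounded in the retrieved text.
    return lm.generate(prompt)
```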

This ability to stay grounded in real literature is a major differentiator. In tests using ScholarQABench, a new benchmark designed specifically to evaluate AI systems on open-ended scientific questions, OpenScholar excelled in factuality and citation accuracy, even outperforming much larger proprietary models such as GPT-4o.

One particularly striking finding concerned GPT-4o’s tendency to generate fabricated citations (hallucinations, in AI parlance). When tasked with answering biomedical research questions, GPT-4o cited non-existent papers in more than 90% of cases. OpenScholar, in contrast, remained firmly anchored in verifiable sources.

That grounding comes from the fact that OpenScholar’s answers are built on papers it has actually retrieved. The system uses what the researchers call a “self-feedback inference loop,” which “iteratively refines the output through natural language feedback to improve quality and adaptively incorporate supplementary information.”
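The paper describes this loop only at a high level. As a rough, hypothetical sketch of the general pattern (reusing the same stand-in index and language-model objects as above), an iterative natural-language feedback cycle might look like this:

```python
# Hypothetical sketch of a self-feedback inference loop: the model critiques its own
# draft in natural language and revises it, optionally pulling in extra passages.
# This illustrates the general pattern, not OpenScholar's actual implementation.

def refine_answer(question: str, draft: str, index, lm, max_rounds: int = 3) -> str:
    answer = draft
    for _ in range(max_rounds):
        # Ask the model to critique its own answer (coverage, support, citations).
        feedback = lm.generate(
            f"Question: {question}\nAnswer: {answer}\n"
            "List concrete problems with this answer, or reply DONE if there are none."
        )
        if "DONE" in feedback:
            break

        # Adaptively retrieve supplementary passages suggested by the feedback.
        extra = index.search(feedback, limit=3)
        extra_text = "\n".join(p.text for p in extra)

        # Revise the answer using the feedback and any newly retrieved evidence.
        answer = lm.generate(
            f"Question: {question}\nCurrent answer: {answer}\n"
            f"Feedback: {feedback}\nAdditional passages:\n{extra_text}\n"
            "Rewrite the answer, fixing the problems while keeping citations accurate."
        )
    return answer
```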

The implications for researchers, policymakers, and business leaders are significant. OpenScholar could become an essential tool for accelerating scientific discovery, enabling experts to synthesize knowledge faster and with greater confidence.

How OpenScholar works: The system starts by searching 45 million research articles (left), uses AI to retrieve and rank relevant passages, generates an initial response, then verifies citations and refines the response through an iterative feedback loop before producing the final answer. This process enables OpenScholar to provide accurate, citation-supported answers to complex scientific questions. | Source: Allen Institute for AI and University of Washington

Inside the David vs. Goliath battle: Can open source AI compete with Big Tech?

OpenScholar’s debut comes at a time when the AI ecosystem is increasingly dominated by closed, proprietary systems. Models like OpenAI’s GPT-4o and Anthropic’s Claude offer great functionality, but are expensive, opaque, and inaccessible to many researchers. OpenScholar flips this model on its head by being completely open source.

The OpenScholar team released not only the code for the language model, but also the entire retrieval pipeline, a specialized 8-billion-parameter model fine-tuned for scientific tasks, and a datastore of scientific papers. “To our knowledge, this is the first open release of a complete pipeline for a scientific assistant LM, from data to training recipes to model checkpoints,” the researchers said in a blog post announcing the system.

This openness is not just a philosophical stance; it has practical benefits. OpenScholar’s small size and streamlined architecture make it far more cost-effective than proprietary systems. The researchers estimate, for example, that OpenScholar-8B is 100 times cheaper to operate than PaperQA2, a concurrent system built on GPT-4o.

This cost efficiency could make powerful AI tools accessible to small institutions, underfunded labs, and researchers in developing countries.

Still, OpenScholar is not without limitations. Its datastore is restricted to open-access articles, excluding the paywalled research that dominates some fields. Though legally necessary, this restriction means the system may miss important findings in areas such as medicine and engineering. The researchers acknowledge this gap and hope that future iterations can responsibly incorporate closed-access content.

OpenScholar performance: Expert evaluation shows that OpenScholar (OS-GPT4o and OS-8B) competes favorably with both human experts and GPT-4o across four key metrics: organization, coverage, relevance, and usefulness. Notably, both versions of OpenScholar were rated as more useful than human-written responses. | Source: Allen Institute for AI and University of Washington

New scientific methods: When AI becomes your research partner

The OpenScholar project raises important questions about the role of AI in science. Although the system’s ability to synthesize literature is impressive, it is not foolproof. Expert evaluators preferred OpenScholar’s answers over human-written ones 70% of the time, but the remaining 30% highlighted areas where the model fell short, such as failing to cite foundational papers or selecting unrepresentative studies.

These limitations highlight a broader truth. AI tools like OpenScholar are meant to augment human expertise, not replace it. The system is designed to help researchers handle the time-consuming task of literature synthesis, allowing them to focus on interpretation and advancing knowledge.

Critics might point out that OpenScholar’s reliance on open-access articles limits its immediate usefulness in high-stakes fields such as pharmaceuticals, where much of the research sits behind paywalls. Others note that the system’s performance, though strong, still depends heavily on the quality of the retrieved passages: if the retrieval step fails, the entire pipeline risks producing suboptimal results.

But even with its limitations, OpenScholar represents a turning point in scientific computing. While previous AI models have impressed with their ability to engage in conversations, OpenScholar is demonstrating something more fundamental: the ability to process, understand, and synthesize scientific literature with near-human accuracy.

The numbers tell a compelling story. OpenScholar’s 8-billion-parameter model outperforms GPT-4o despite being orders of magnitude smaller. It rivals human experts in citation accuracy, where other AIs fail as much as 90% of the time. And perhaps most importantly, experts preferred its answers to those written by their human peers.

These results suggest that we are entering a new era of AI-assisted research, one in which the bottleneck to scientific progress may no longer be the ability to process existing knowledge, but rather the ability to ask the right questions.

The researchers have released all of their code, models, data, and tools, betting that openness will accelerate progress faster than keeping their breakthrough behind closed doors.

In doing so, they answered one of the most pressing questions in AI development: Can open source solutions compete with Big Tech’s black boxes?

The answer appears to be hidden in plain sight among 45 million papers.
