Researchers from the University of Washington (UW) and the Allen Institute for AI (Ai2) have developed OpenScholar, an open-source AI model specialized in synthesizing current scientific research and providing verifiable citations.
OpenScholar uses retrieval-augmented generation (RAG) to search a large corpus of over 45 million open-access papers, identify relevant passages, and generate responses grounded in those sources. This approach minimizes hallucinations common in general-purpose models like GPT-4o, which fabricated 78–90% of citations in tests.
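The retrieve-then-generate loop described above can be sketched in miniature. This is a toy illustration only: the corpus, the lexical-overlap scorer, and the response template are all stand-ins invented for this example, whereas OpenScholar uses a learned dense retriever over 45 million papers and a language model to generate the final answer.

```python
# Toy sketch of a retrieval-augmented generation (RAG) loop:
# score passages against the query, keep the top k, and ground
# the generated response in those passages with citations.

def tokenize(text):
    return text.lower().split()

def score(query, passage):
    # Crude lexical-overlap score; a stand-in for a learned retriever.
    q, p = set(tokenize(query)), set(tokenize(passage))
    return len(q & p) / (len(q) or 1)

def retrieve(query, corpus, k=2):
    # Rank passages by relevance score and keep the top k.
    ranked = sorted(corpus, key=lambda d: score(query, d["text"]), reverse=True)
    return ranked[:k]

def answer(query, corpus):
    # In a real system, a language model would synthesize an answer
    # conditioned on the retrieved passages; here we only show the
    # grounding step: every claim is tied to a retrieved source.
    hits = retrieve(query, corpus)
    cites = "; ".join(f'[{d["id"]}]' for d in hits)
    return f"Grounded in {len(hits)} retrieved passages ({cites})."

# Hypothetical mini-corpus standing in for the open-access paper store.
corpus = [
    {"id": "paper-1", "text": "retrieval augmented generation reduces hallucination"},
    {"id": "paper-2", "text": "protein folding with deep learning"},
    {"id": "paper-3", "text": "citation accuracy in language models"},
]

print(answer("how does retrieval reduce hallucination", corpus))
```

Because every statement in the output is tied to a retrieved passage ID, fabricated citations of the kind measured in GPT-4o cannot arise: the model can only cite documents that were actually retrieved.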
The model was evaluated on ScholarQABench, a new multi-domain benchmark with 2,967 expert-written open-ended scientific questions across computer science, physics, neuroscience, and biomedicine. OpenScholar-8B outperformed GPT-4o by 6.1% and PaperQA2 by 5.5% in correctness, and its citation accuracy matched that of human experts.
In human evaluations, scientists preferred OpenScholar's responses over those written by subject experts in 51% of cases. When OpenScholar's retrieval pipeline was combined with GPT-4o, the preference rate rose to 70%, compared with 32% for GPT-4o alone.
The findings were published in Nature on February 4, 2026 (doi:10.1038/s41586-025-10072-4). Code, model weights, data store, datasets, and a public demo are freely available.
The team is developing a follow-up model, DR Tulu, which builds on OpenScholar by enabling multi-step search, planning, and synthesis for more comprehensive long-form research responses.
OpenScholar addresses key limitations of existing AI tools in scientific contexts, offering transparency, reproducibility, and reliability to accelerate research without compromising accuracy.
