AI now beats lawyers on legal research tasks, new benchmarking study finds

A new study by benchmarking specialist Vals AI has found that AI tools now outperform lawyers on legal research tasks.
The report also found that legal-specific tools are only slightly better at answering single-prompt research questions than ChatGPT.
The research compared the performance of legal-specific and general-purpose AI tools against a baseline of human lawyers.
The study tested the legal AI tools Alexi, Counsel Stack and Midpage, as well as ChatGPT, on 200 legal research questions designed with input from US firms including Paul Weiss, McDermott, Reed Smith and Paul Hastings. Responses were then scored for accuracy, authoritativeness and clarity.
AI trumps humans
All AI tools performed better than the human lawyer baseline across all three criteria. The overall scores of the legal-specific tools ranged from 76% to 78%, while ChatGPT scored 74% - still above the lawyer baseline of 69%.
In terms of accuracy, both ChatGPT and the legal-specific tools achieved scores of around 80%, compared with 71% for lawyers. The legal-specific AIs were notably stronger on authoritativeness, scoring six points higher than ChatGPT on average thanks to their access to proprietary databases (even if these are largely built from publicly available data), although the report noted this gap may narrow as OpenAI’s Deep Research mode becomes more widely available.
Still, despite lawyers’ weaker showing overall, the study found they continue to hold an edge where deeper contextual understanding, nuanced judgement and multi-layered reasoning are required.
Vals says it calculated its human baseline by asking "lawyers experienced in conducting legal research for client matters" from an unnamed US law firm to answer the same questions without the use of generative AI.
Participation gap
Notably, several of the best-known legal AI companies did not take part in the study, despite having been involved in Vals’ first report earlier this year, which explored AI’s effectiveness in non-research legal tasks such as document summarisation and redlining.
Its chosen methodology may have been a factor. The study relied on "zero-shot" prompts - single questions posed without follow-up or extra context - and did not make use of the tools’ workflow-based features. Vals itself acknowledged this setup "is not necessarily true to life" compared with how a lawyer or researcher would typically approach a legal research question.
That design may also have favoured general-purpose AIs over specialist legal platforms, which are often fine-tuned for multi-step reasoning and workflow-style interactions that more closely mirror how real lawyers work.