Key Point 1:
In AI reading tests, Claude secured the top position with a stable performance free of “hallucinations,” followed closely by ChatGPT. However, the overall AI scores were relatively low.
Key Point 2:
The reading comprehension of different AI systems varies significantly across fields such as literature, law, science, and politics.
Key Point 3:
Experts believe that AI currently cannot replace human reading, especially in handling important documents, and can only serve as an auxiliary tool.
Which AI is the Best Reader?
Fast forward to 2025: generative AI has introduced numerous features focused on data integration, such as Google’s NotebookLM and various Deep Research functions, all of which rely on the model’s “reading ability” and its reasoning over the material it is given.
As for the reading abilities of today’s five mainstream AI models, the Washington Post’s test results show that Anthropic’s Claude performed best: it earned the highest overall score and was the only AI that did not exhibit “hallucinations” (fabricated information). ChatGPT came in second.
Overall, regardless of the exact ratings, the Washington Post’s results show that today’s AIs still have significant shortcomings in deep understanding and analysis: even the best overall scores were only around 70 out of 100, roughly a D+ on an academic grading scale, leaving substantial room for improvement in AI reading comprehension.
AI Strengths in Reading: Claude Excels in Law, ChatGPT in Literature
The Washington Post assessed five AIs: Claude, ChatGPT, Copilot, Meta AI, and Google’s Gemini. The test materials covered literary fiction, legal contracts, medical research, and political speeches, with blind evaluations conducted by experts in each field. The results are as follows:
Literature Field: ChatGPT 7.8; Claude 7.3; Meta AI 4.3; Copilot 3.5; Gemini 2.3.
Legal Field: Claude 6.9; Gemini 6.1; Copilot 5.4; ChatGPT 5.3; Meta AI 2.6.
Health Science Field: Claude 7.7; ChatGPT 7.2; Copilot 7; Gemini 6.5; Meta AI 6.
Political Field: ChatGPT 7.2; Claude 6.2; Meta AI 5.2; Gemini 5; Copilot 3.7.
Overall scores are as follows:
Claude: 69.9
ChatGPT: 68.4
Gemini: 49.7
Copilot: 49
Meta AI: 45
In summary, Claude narrowly outperformed ChatGPT, while Gemini, Copilot, and Meta AI scored below 50. Notably, Claude was the only AI that did not generate any hallucinations.
The documents tested included the novel The Jackal’s Mistress in literature, medical papers on COVID-19 and Parkinson’s disease in health science, a lease agreement and a construction contract in law, and documents of Trump’s speeches in politics.
The results show significant differences in AI performance across professional fields. For instance, ChatGPT performed better in the literature and politics categories but lagged in understanding legal documents, whereas Claude achieved the highest scores in law and health science.
However, even the top-scoring Claude fell short of full marks in literature, and Gemini’s literary comprehension was criticized as “inaccurate, misleading, and hasty,” giving the impression that it was trying to gloss over what it did not understand.
It is worth noting that the four AIs other than Claude all fabricated information to varying degrees during the test. This confirms that AI’s ability to read long texts is still limited: generated summaries frequently omit important information or overemphasize positive content while neglecting negative details.
Note 1: The testing was conducted from April to May 2025 using the following AI versions: ChatGPT-4o, Gemini 2.0 Flash, Claude 3 Sonnet, Llama 4, and Copilot for Microsoft 365.
Note 2: Reviewers scored each AI answer on a 10-point scale, and each field’s score is the average of all reviewer ratings in that field. The overall score weights the four fields equally and is presented on a 100-point scale.
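To illustrate the aggregation described in Note 2, here is a minimal sketch (our own reconstruction in Python, not the Washington Post’s actual calculation) that recomputes overall scores from the published field scores for the two top models; because the published field scores are already rounded, the recomputed totals differ slightly from the published 69.9 and 68.4.

```python
# Rough reconstruction of the scoring scheme in Note 2 (an assumption for
# illustration, not the Washington Post's code): each field score is an
# average of 10-point reviewer ratings, and the overall score weights the
# four fields equally, rescaled to a 100-point scale.
field_scores = {
    "Claude":  {"literature": 7.3, "law": 6.9, "health science": 7.7, "politics": 6.2},
    "ChatGPT": {"literature": 7.8, "law": 5.3, "health science": 7.2, "politics": 7.2},
}

def overall_score(per_field: dict[str, float]) -> float:
    """Equal-weight average of the field scores, rescaled from 10 to 100 points."""
    return sum(per_field.values()) / len(per_field) * 10

for model, scores in field_scores.items():
    print(f"{model}: {overall_score(scores):.1f} / 100")
# Prints roughly 70.2 for Claude and 68.8 for ChatGPT; the small gaps from the
# published 69.9 and 68.4 come from rounding in the published field scores.
```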
Expert Summary: AI Cannot Replace Human Reading
Although some AIs demonstrated impressive capabilities in specific analytical tasks, such as ChatGPT’s summaries of novels and Claude’s suggestions for revising legal documents and its insights on medical papers, experts remain cautious about AI’s current reading comprehension abilities.
For example, corporate lawyer Sterling Miller, one of the reviewers, pointed out that AI’s handling of legal documents is not yet reliable enough to replace a professional lawyer; novelist Chris Bohjalian remarked that AI responses sometimes read like “robots wearing human masks,” pretending to understand what they do not.
The journalist who conducted the tests suggested that anyone using AI as a reading aid should compare the output of at least two tools, and should still read important documents affecting their own interests carefully themselves.
Overall, AI can currently serve as an auxiliary tool, such as assisting in quickly grasping new topics or interpreting specialized terminology, but its results should not be solely relied upon.
This article is republished in partnership with Digital Age.
Editor: Li Xiantai
This draft was initially composed by AI, organized and edited by Li Xiantai
Source: Washington Post