Key Point 1:
In AI reading tests, Claude secured the top position with a stable performance free of “hallucinations,” followed closely by ChatGPT. However, the overall AI scores were relatively low.
Key Point 2:
The reading comprehension of different AI systems varies significantly across fields such as literature, law, science, and politics.
Key Point 3:
Experts believe that AI currently cannot replace human reading, especially in handling important documents, and can only serve as an auxiliary tool.
Which AI is the Best Reader?
Fast forward to 2025: generative AI has introduced numerous features focused on data integration, such as Google’s NotebookLM and various Deep Research functions, all of which rely on the model’s “reading ability” and its reasoning over the material it is given.
As for the reading abilities of today’s five mainstream AI models, the Washington Post’s test results show that Anthropic’s Claude performed best: it earned the highest overall score and was the only AI that did not exhibit “hallucinations” (fabricated information). ChatGPT came in second.
Overall, regardless of the exact ratings, the Washington Post’s results show that today’s AIs still have significant shortcomings in deep understanding and analysis: even the best overall scores were only around 70 out of 100, roughly a D+ on an academic grading scale, leaving substantial room for improvement in AI reading comprehension.
AI Strengths in Reading: Claude Excels in Law, ChatGPT in Literature
The Washington Post assessed five AIs: Claude, ChatGPT, Copilot, Meta AI, and Google’s Gemini. The test materials covered literary fiction, legal contracts, medical research, and political speeches, with blind evaluations conducted by experts in each field. The results are as follows:
Literature Field: ChatGPT 7.8; Claude 7.3; Meta AI 4.3; Copilot 3.5; Gemini 2.3.
Legal Field: Claude 6.9; Gemini 6.1; Copilot 5.4; ChatGPT 5.3; Meta AI 2.6.
Health Science Field: Claude 7.7; ChatGPT 7.2; Copilot 7; Gemini 6.5; Meta AI 6.
Political Field: ChatGPT 7.2; Claude 6.2; Meta AI 5.2; Gemini 5; Copilot 3.7.
Overall scores are as follows:
Claude: 69.9
ChatGPT: 68.4
Gemini: 49.7
Copilot: 49
Meta AI: 45
In summary, Claude narrowly outperformed ChatGPT, while Gemini, Copilot, and Meta AI scored below 50. Notably, Claude was the only AI that did not generate any hallucinations.
The documents tested included the novel The Jackal’s Mistress in literature, medical papers on COVID-19 and Parkinson’s disease in health science, a lease agreement and a construction contract in law, and documents of Trump’s speeches in politics.
The results show significant differences in AI performance across professional fields. For instance, ChatGPT performed better in the literature and politics categories but lagged in understanding legal documents, whereas Claude achieved the highest scores in law and health science.
However, even the top-scoring Claude fell short of full marks in literature, and Gemini’s literary comprehension was criticized as “inaccurate, misleading, and hasty,” giving the impression that it was trying to gloss over what it did not understand.
It is worth noting that the four AIs other than Claude all fabricated information to varying degrees during the test. This confirms that AI’s ability to read long texts is still limited: generated summaries frequently omit important information or overemphasize positive content while neglecting negative details.
Note 1: The testing was conducted from April to May 2025 using the following AI versions: ChatGPT-4o, Gemini 2.0 Flash, Claude 3 Sonnet, Llama 4, and Copilot for Microsoft 365.
Note 2: Reviewers scored each AI answer on a 10-point scale, and each field’s score is the average of all reviewer ratings in that field. The overall score weights the four fields equally and is presented on a 100-point scale.
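To illustrate the aggregation described in Note 2, here is a minimal sketch (our own reconstruction in Python, not the Washington Post’s actual calculation) that recomputes overall scores from the published field scores for the two top models; because the published field scores are already rounded, the recomputed totals differ slightly from the published 69.9 and 68.4.

```python
# Rough reconstruction of the scoring scheme in Note 2 (an assumption for
# illustration, not the Washington Post's code): each field score is an
# average of 10-point reviewer ratings, and the overall score weights the
# four fields equally, rescaled to a 100-point scale.
field_scores = {
    "Claude":  {"literature": 7.3, "law": 6.9, "health science": 7.7, "politics": 6.2},
    "ChatGPT": {"literature": 7.8, "law": 5.3, "health science": 7.2, "politics": 7.2},
}

def overall_score(per_field: dict[str, float]) -> float:
    """Equal-weight average of the field scores, rescaled from 10 to 100 points."""
    return sum(per_field.values()) / len(per_field) * 10

for model, scores in field_scores.items():
    print(f"{model}: {overall_score(scores):.1f} / 100")
# Prints roughly 70.2 for Claude and 68.8 for ChatGPT; the small gaps from the
# published 69.9 and 68.4 come from rounding in the published field scores.
```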
Expert Summary: AI Cannot Replace Human Reading
Although some AIs demonstrated impressive capabilities in specific analytical tasks, such as ChatGPT’s summaries of novels and Claude’s suggestions for revising legal documents and its insights on medical papers, experts remain cautious about AI’s current reading comprehension abilities.
For example, corporate lawyer Sterling Miller, one of the reviewers, pointed out that AI’s handling of legal documents is not yet reliable enough to replace a professional lawyer; novelist Chris Bohjalian remarked that AI responses sometimes read like “robots wearing human masks,” pretending to understand what they do not.
The journalist who conducted the tests suggested that anyone using AI as a reading aid should compare the output of at least two tools, and should still read important documents affecting their own interests carefully themselves.
Overall, AI can currently serve as an auxiliary tool, such as assisting in quickly grasping new topics or interpreting specialized terminology, but its results should not be solely relied upon.
This article is republished in partnership with Digital Age.
Editor: Li Xiantai
This draft was initially composed by AI, organized and edited by Li Xiantai
Source: Washington Post