The Ultimate AI Showdown: ChatGPT vs Claude vs Gemini

2025-11-28 15:20 · 8 min read

In this video, the speaker assesses several popular AI language models for their truthfulness and reliability in academic research. The analysis focuses on two questions: whether the references a model provides actually exist, and whether the claims it attributes to those references are correct. The results show that ChatGPT provides valid references over 60% of the time, while Gemini performs far worse, at only around 20%. The video also stresses that paying for a model does not guarantee better performance. Instead, specialized tools built for academic work, such as Elicit and Consensus, deliver more reliable referencing. Viewers are encouraged to verify citations manually and to explore these alternatives rather than relying solely on general-purpose AI models.

Key Information

  • The discussion centers on the reliability of various AI models in providing accurate references for academic research.
  • Two key types of inaccuracies are identified: first-order hallucinations (false references) and second-order hallucinations (inaccurate claims about references).
  • ChatGPT was compared against other models, such as Claude and Gemini, on their ability to generate real and accurate references.
  • ChatGPT performs best with over 60% accuracy, while Claude falls behind with about 56%, and Gemini performs poorly with only around 20%.
  • It’s emphasized that paying for models does not necessarily improve their accuracy or reliability.
  • Alternative tools like Elicit and Consensus are recommended for academic research, as they utilize verified references and provide accurate information.

Content Keywords

AI Models

The video evaluates how well various AI models provide accurate references for academic research, distinguishing first-order hallucinations (fabricated references) from second-order hallucinations (inaccurate claims about real references).

ChatGPT

ChatGPT showed a correct response rate of over 60% for providing accurate references, making it a leading choice among AI models for academic usage when utilizing web search and deep research features.

Claude

Claude performed slightly worse, with a success rate of around 56%: it can provide valid references, but with clear limitations.

Gemini

Gemini performed poorly in this test, achieving only a 20% correctness rate in providing references that actually existed, highlighting significant issues in its reliability for academic purposes.

Citation Accuracy

The video emphasizes the importance of checking citations against original papers to confirm their legitimacy, as many AI models may misrepresent references in their outputs.
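One lightweight way to follow this advice in practice is to compare the title an AI model cites against the title of the paper you actually locate. The helper below is a hypothetical sketch (not from the video) using only Python's standard library; the 0.9 similarity threshold is an arbitrary choice for illustration.

```python
from difflib import SequenceMatcher


def titles_match(claimed: str, actual: str, threshold: float = 0.9) -> bool:
    """Flag a citation as suspect when the claimed title diverges
    from the title of the paper actually found."""
    # Normalize case and whitespace before comparing.
    a = " ".join(claimed.lower().split())
    b = " ".join(actual.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= threshold


# Minor case/spacing differences pass; a different title fails.
print(titles_match("Attention Is All You Need",
                   "Attention is all you need"))        # True
print(titles_match("Deep Residual Learning for Image Recognition",
                   "A Survey of Residual Networks"))    # False
```

A check like this only catches mismatched titles; it cannot confirm that the paper supports the claim being cited, so reading the original source remains necessary.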

References for Academia

The speaker recommends specific tools such as Elicit and Consensus that are designed for academic use, promising real references and accurate information, unlike some AI models.

Elicit

Elicit is highlighted as a reliable tool for academics, as it uses verified papers and performs checks in the background to ensure that users receive accurate citations.

Consensus

Consensus is introduced as a fast, effective tool for answering research questions, providing quick yes-or-no verdicts grounded in data from real references.

Research Tools

The video stresses the need for researchers to use specialized tools instead of relying solely on AI language models for gathering accurate information and references.
