
GPT-4 is apparently getting dumber

July 21, 2023

The article discusses a study on the performance of GPT-3.5 and GPT-4, the large language models (LLMs) behind ChatGPT and ChatGPT Plus, respectively. The research reveals unexpected decreases in GPT-4's ability to solve mathematical problems, generate code, and answer sensitive questions over the span of a few months.

Research Details: The study was carried out by scientists from Stanford University and UC Berkeley.
  • They analyzed the efficiency of GPT-3.5 and GPT-4 over time, examining their capabilities in solving math problems, generating code, and responding to sensitive questions.
  • Tests were conducted in March and June, offering a comparative insight into the models' performances.

Surprising Findings: GPT-4, considered the most advanced LLM, displayed significant drops in performance across multiple categories.
  • In math problem-solving, its accuracy fell from 97.6% in March to just 2.4% in June.
  • Its ability to generate directly executable code also declined, dropping from a 52% success rate to just 10%.
  • On sensitive questions, GPT-4's response rate fell drastically, from 21% in March to just 5% in June.
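
The "directly executable" criterion above can be illustrated with a short check: reject replies wrapped in markdown fencing, then see whether Python will parse the rest. A minimal sketch, assuming the model's replies are Python and using hard-coded strings as stand-ins for actual model output:

```python
import ast

def is_directly_executable(reply: str) -> bool:
    """Return True if the reply parses as plain Python with no
    markdown fencing, roughly mirroring the study's criterion."""
    if reply.strip().startswith("```"):
        # Fenced output cannot be run as-is.
        return False
    try:
        ast.parse(reply)  # syntax check without executing the code
        return True
    except SyntaxError:
        return False

# Two hypothetical model replies to the same coding prompt:
plain = "def add(a, b):\n    return a + b"
fenced = "```python\ndef add(a, b):\n    return a + b\n```"

print(is_directly_executable(plain))   # True
print(is_directly_executable(fenced))  # False
```

A change in how often a model wraps answers in fencing, rather than a change in code quality, is enough to move a metric like this sharply.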

Comparison with GPT-3.5: Interestingly, GPT-3.5 showed improvement over the same period.
  • Its mathematical problem-solving accuracy improved: it initially gave wrong answers in March but correct ones in June.
  • On sensitive questions, GPT-3.5's response rate increased from 2% in March to 8% in June.

Implications and Recommendations: The study underlines the need for constant evaluation of AI models' capabilities.
  • Companies and individuals relying on these models should keep assessing their performance, as their abilities may not always improve over time.
  • Given the observed decrease in GPT-4's quality, the research raises questions about its training process and suggests considering alternatives until more clarity is provided.
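
The advice to keep assessing performance can be sketched as a tiny drift-tracking harness: score each model snapshot on a fixed benchmark and flag regressions between runs. A minimal sketch; `ask_model` is a hypothetical stub standing in for a real API call, and the canned answers merely mimic the kind of drop the study reports:

```python
# Fixed benchmark of (prompt, expected answer) pairs.
BENCHMARK = [
    ("Is 17077 prime? Answer yes or no.", "yes"),
    ("Is 20000 prime? Answer yes or no.", "no"),
]

def ask_model(snapshot: str, prompt: str) -> str:
    # Hypothetical stub: replace with a real model API call.
    # Canned replies imitate a snapshot-to-snapshot regression.
    canned = {
        "gpt-4-march": {BENCHMARK[0][0]: "yes", BENCHMARK[1][0]: "no"},
        "gpt-4-june":  {BENCHMARK[0][0]: "no",  BENCHMARK[1][0]: "no"},
    }
    return canned[snapshot][prompt]

def accuracy(snapshot: str) -> float:
    """Fraction of benchmark answers the snapshot gets right."""
    hits = sum(ask_model(snapshot, q).strip().lower() == a
               for q, a in BENCHMARK)
    return hits / len(BENCHMARK)

march, june = accuracy("gpt-4-march"), accuracy("gpt-4-june")
print(f"March: {march:.0%}, June: {june:.0%}")
if june < march:
    print("Regression detected: re-validate before relying on this model.")
```

Pinning a benchmark like this and re-running it on every model update is the cheapest way to catch the kind of silent capability drift the study describes.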