Recently published: Vectara’s open-source “Hallucination Evaluation Model” and corresponding leaderboard:

  • They publish a leaderboard on GitHub that compares how often LLMs produce hallucinations when summarizing short documents
  • 1000 short documents were fed into each of the LLMs mentioned on the leaderboard via their public APIs
  • The LLMs were then asked to summarize each short document, using only the facts presented in the document
  • The documents used were taken primarily from the CNN / Daily Mail Corpus
  • The hallucination rate is the share of those summaries that the evaluation model judges to contain information not supported by the source document
  • At the time of writing, OpenAI’s GPT-4 has the lowest hallucination rate on the leaderboard
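
The steps above can be sketched roughly as follows. This is a minimal illustration, not Vectara's actual code: `hallucination_rate`, `toy_overlap_score`, and the threshold values are hypothetical names and assumptions introduced here, and the toy word-overlap scorer is only a stand-in for a real evaluation model.

```python
import re
from typing import Callable, Sequence


def toy_overlap_score(document: str, summary: str) -> float:
    """Hypothetical stand-in scorer: fraction of summary words found in the document.

    A real evaluation model would be far more sophisticated; this only
    makes the sketch runnable without any external service.
    """
    doc_words = set(re.findall(r"\w+", document.lower()))
    summary_words = re.findall(r"\w+", summary.lower())
    if not summary_words:
        return 0.0
    supported = sum(1 for word in summary_words if word in doc_words)
    return supported / len(summary_words)


def hallucination_rate(
    documents: Sequence[str],
    summaries: Sequence[str],
    score_consistency: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the leaderboard
) -> float:
    """Return the fraction of summaries whose consistency score is below `threshold`."""
    if len(documents) != len(summaries):
        raise ValueError("documents and summaries must have the same length")
    if not documents:
        raise ValueError("at least one document is required")
    flagged = sum(
        1
        for doc, summ in zip(documents, summaries)
        if score_consistency(doc, summ) < threshold
    )
    return flagged / len(documents)


if __name__ == "__main__":
    docs = ["The cat sat on the mat.", "Paris is the capital of France."]
    summaries = ["The cat sat on the mat.", "Berlin is the capital of Germany."]
    rate = hallucination_rate(docs, summaries, toy_overlap_score, threshold=0.9)
    print(f"Hallucination rate: {rate:.0%}")
```

In this toy run the second summary introduces "Berlin" and "Germany", which do not appear in its source document, so one of the two summaries is flagged. A leaderboard-style comparison would repeat the same calculation for each LLM over the same set of documents.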

Overall, the methodology is clearly described and repeatable, and the leaderboard makes for interesting reading!

© Dan Galavan 2023