Recently published: Vectara’s Open Source “Hallucination Evaluation Model” & corresponding leaderboard:
- Vectara publishes a leaderboard on GitHub that compares how often LLMs hallucinate when summarizing short documents
- 1000 short documents were fed into each of the LLMs mentioned on the leaderboard via their public APIs
- The LLMs were then asked to summarize each short document, using only the facts presented in the document
- The documents used were taken primarily from the CNN / Daily Mail Corpus
- The hallucination rate is the share of each LLM's summaries that the evaluation model flags as not factually consistent with the source document
- At the time of writing, OpenAI’s GPT-4 has the lowest hallucination rate on the leaderboard
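
To make the metric concrete, here is a minimal sketch of how a hallucination rate could be computed once every summary has a factual-consistency score between 0 and 1. The function name `hallucination_rate`, the 0.5 threshold, and the example scores are all assumptions for illustration; they are not taken from Vectara's code or results.

```python
from typing import Sequence


def hallucination_rate(consistency_scores: Sequence[float], threshold: float = 0.5) -> float:
    """Return the fraction of summaries scored below `threshold`.

    Hypothetical helper: a score near 1 is read as "consistent with the
    source document" and a score near 0 as "hallucinated". The 0.5
    cut-off is an assumption, not a documented setting.
    """
    if not consistency_scores:
        raise ValueError("need at least one scored summary")
    flagged = sum(1 for score in consistency_scores if score < threshold)
    return flagged / len(consistency_scores)


# Made-up scores for five summaries from one LLM (illustration only).
example_scores = [0.98, 0.91, 0.12, 0.87, 0.40]
rate = hallucination_rate(example_scores)
print(f"Hallucination rate: {rate:.1%}")  # prints "Hallucination rate: 40.0%"
```

Running the same calculation over the scores for each LLM's 1000 summaries would give one figure per model, which is the kind of number a leaderboard like this ranks.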
Overall, the methodology is clearly described and repeatable, and the leaderboard makes for interesting reading!
© Dan Galavan 2023