Top 7 LLM Benchmarks & Chatbot Arena Explained 🤖
Discover the top 7 LLM benchmarks behind the Open LLM Leaderboard and Chatbot Arena, with clear explanations. Visit: https://leaderboard.bycloud.ai/

bycloud
25.9K views • Jan 9, 2024

About this video
Check out my website here! https://leaderboard.bycloud.ai/
In this video, I go through and explain the benchmarks used by Chatbot Arena and the Open LLM Leaderboard. These are general-purpose benchmarks for text-based LLMs, so coding benchmarks such as HumanEval are not covered. Let me know which other benchmarks you'd like me to explain in the future!
[Chatbot Arena] https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
[Open LLM Leaderboard] https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
[MMLU] https://huggingface.co/datasets/cais/mmlu
[ARC] https://huggingface.co/datasets/ai2_arc
[Winogrande] https://huggingface.co/datasets/winogrande
[TruthfulQA] https://huggingface.co/datasets/truthful_qa
[GSM8K] https://huggingface.co/datasets/gsm8k
[MT-Bench] https://huggingface.co/datasets/HuggingFaceH4/mt_bench_prompts
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi
[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] Silas
0:00 Intro
0:57 MMLU
1:41 ARC
2:10 HellaSwag
2:57 Winogrande
3:27 TruthfulQA
3:52 GSM8K
4:26 MT-Bench
5:05 Outro
Video Information
Views: 25.9K
Likes: 931
Duration: 5:50
Published: Jan 9, 2024