Easily Cheat on LLM Benchmarks & Win Prizes 🎁
Learn how cheating LLM benchmarks is simpler than expected. Sign up for NVIDIA GTC2025 and join the RTX4080 SUPER Giveaway!

bycloud
27.6K views • Mar 10, 2025

About this video
Sign up for NVIDIA GTC2025 here!
https://nvda.ws/48s4tmc
Join The RTX4080 SUPER Giveaway (enter between March 17-21st)
https://forms.gle/TbGgoD5obn1zRz5r7
In this video, you will learn about the fascinating ways AI companies can rig LLM benchmarks, for educational purposes of course... Plus the reason why people don’t trust “Chatbot Arena” as much anymore.
My Newsletter
https://mail.bycloud.ai/
My Patreon
https://www.patreon.com/c/bycloud
I was really vague about the sources in this video so it would fit the narrative, but most facts presented are backed by the following papers:
Changing Answer Order Can Decrease MMLU Accuracy
[Paper] https://arxiv.org/abs/2406.19470
Catch me if you can! How to beat GPT-4 with a 13B model
[Blog] https://lmsys.org/blog/2023-11-14-llm-decontaminator/
Idiosyncrasies in Large Language Models
[Paper] https://arxiv.org/abs/2502.12150
Improving Your Model Ranking on Chatbot Arena by Vote Rigging
[Paper] https://arxiv.org/abs/2501.17858
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, Ben Shaener, Chris LeDoux, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Marcelo Ferreira, Owen Ingraham, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Penumbraa, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth, Thipok Tham, Clayton Ford, Theo, Handenon, Diego Silva, mayssam, Kadhai Pesalam, Tim Schulz, jiye, Anushka, Henrik Sundt, Julian Aßmann, Thomas Lin, Sid_Cypher, Mark Buckler, Kevin Tai, NO U, Gonzalo Fidalgo, Igor Alvarez, Alon Pluda, Clément Veyssière, Sander Zwaenepoel, etrotta, Binnie Yiu, Matej Macak, c zhou, Berhane-Meskel, sai sandeep mandava, Leo, Asad Dhamani, Charlie C, tantan assawade, Ângelo Fonseca, Stefan Lorenz, Paperboy, mika, Leo, Utsav Soi
[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] bycloudai@gmail.com
[Music] Massobeats - Glimmer
[Music] Massobeats - Honey jam
[Music] Massobeats - Lush
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Bitcoin (BTC)] 3JFMJQVGXNA2HJE5V9qCwLiqy6wHY9Vhdx
[Ethereum (ETH)] 0x3d784F55E0bE5f35c1566B2E014598C0f354f190
[Litecoin (LTC)] MGHnqALjyU2W6NuJSSW9fTWV4dcHfwHZd7
[Bitcoin Cash (BCH)] 1LkyGfzHxnSfqMF8tN7ZGDwUTyBB6vcii9
[Solana (SOL)] 6XyMCEdVhtxJQRjMKgUJaySL8cGoBPzzA2NPDMPfVkKN
[Ko-fi] https://ko-fi.com/bycloudai
Video Information
Views: 27.6K
Likes: 1.6K
Duration: 9:51
Published: Mar 10, 2025