Chatbot Arena: Crowdsourced Human Feedback Platform 🤖
Chatbot Arena is a crowdsourced platform for evaluating large language models with human feedback, developed by the LMSYS team and presented here by Wei-Lin Chiang of UC Berkeley.

The Linux Foundation
560 views • Dec 18, 2023

About this video
Chatbot Arena: An Open Crowdsourced Platform for Human Feedback on LLMs - Wei-Lin Chiang, UC Berkeley / LMSYS
Chatbot Arena is a benchmark platform for evaluating large language models (LLMs) with human feedback. It lets users interact with two anonymous models side by side and vote for the one they believe gives the better responses. The platform uses the Elo rating system, a method commonly used in chess and other competitive games, to rank the chatbots' performance. This system is designed to provide a fair and accurate comparison of the models' capabilities under real-world use cases.
Over the past few months, Chatbot Arena has served millions of user requests and collected 100K+ votes. The datasets of user conversations and human preferences are publicly available. We conducted a deeper study exploring several use cases, including developing content moderation models, building a safety benchmark, training instruction-following models, and creating challenging benchmark questions. Learn more in our paper: https://arxiv.org/abs/2309.11998
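To make the ranking mechanism concrete, here is a minimal Python sketch of a standard pairwise Elo update, as it would apply to a single "battle" between two anonymous models. The K-factor of 32 and the starting rating of 1000 are illustrative assumptions, not necessarily the parameters LMSYS uses; their exact rating procedure is described in the paper linked above.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one battle.

    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k (the K-factor) controls how much a single vote moves a rating;
    32 here is an illustrative value, not the one LMSYS uses.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Example: both models start at an assumed 1000; a user votes for A.
a, b = elo_update(1000.0, 1000.0, score_a=1.0)
print(round(a), round(b))  # 1016 984
```

Because each vote nudges the winner up and the loser down by the same amount, ratings converge toward values where the expected win probabilities match the observed human preferences.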
Video Information
Views: 560
Likes: 13
Duration: 26:56
Published: Dec 18, 2023