Best Multimodal LLMs for Image Analysis (Jul 2024)
Comparing top multimodal LLMs like GPT-4o, Gemini 1.5 Flash, Copilot, and Claude 3 in image analysis performance. Jul 2024 update!

GPT Odyssey
1.7K views • Jun 27, 2024

About this video
In this video, we pit top multimodal LLMs against each other in an image analysis clash! The lineup features GPT-4o, Gemini 1.5 Flash, Microsoft Copilot, and Claude 3 & 3.5 Sonnet, plus trending Hugging Face image models: InternVL, OpenBMB's MiniCPM-Llama3, and Salesforce Blip.
We have some interesting discoveries! Is there a new king on the block?
Who will reign supreme? Watch to find out!
Timestamps:
0:00 Intro and how we benchmark them
3:10 [Round 1] Parking Sign: GPT-4o
4:12 [Round 1] Parking Sign: Gemini 1.5 Flash
6:37 [Round 1] Parking Sign: Claude 3 Sonnet
8:12 [Round 1] Parking Sign: MS Copilot
12:30 [Round 1] Parking Sign: Bonus question
14:15 [Round 1] Parking Sign: InternVL
15:52 [Round 1] Parking Sign: MiniCPM-Llama3
18:42 [Round 1] Parking Sign: Salesforce Blip
20:55 [Round 2] Restaurant Menu: GPT-4o
22:12 [Round 2] Restaurant Menu: Gemini 1.5 Flash
24:55 [Round 2] Restaurant Menu: Claude 3 Sonnet
26:20 [Round 2] Restaurant Menu: MS Copilot
28:50 [Round 2] Restaurant Menu: InternVL
29:38 [Round 2] Restaurant Menu: MiniCPM-Llama3
30:27 [Round 2] Restaurant Menu: Salesforce Blip
31:08 [Round 3] A landmark photo: GPT-4o
31:36 [Round 3] A landmark photo: Gemini 1.5 Flash
32:41 [Round 3] A landmark photo: Claude 3 Sonnet
33:24 [Round 3] A landmark photo: InternVL
33:42 [Round 3] A landmark photo: MiniCPM-Llama3
33:58 [Round 3] A landmark photo: Salesforce Blip
34:08 [Round 4] Subway map: GPT-4o
38:02 [Round 4] Subway map: Gemini 1.5 Flash
39:50 [Round 4] Subway map: Claude 3 Sonnet
41:45 [Round 4] Subway map: MS Copilot
43:28 [Round 4] Subway map: InternVL
44:09 [Round 4] Subway map: MiniCPM-Llama3
44:44 [Round 4] Subway map: Salesforce Blip
45:50 [Round 5] Foreign language recognition: GPT-4o
46:28 [Round 5] Foreign language recognition: Gemini 1.5 Flash
46:48 [Round 5] Foreign language recognition: Claude 3 Sonnet
47:12 [Round 5] Foreign language recognition: MS Copilot
47:49 [Round 5] Foreign language recognition: InternVL
48:10 [Round 5] Foreign language recognition: MiniCPM-Llama3
48:25 [Round 5] Foreign language recognition: Salesforce Blip
49:04 [Round 6] Chessboard reading: GPT-4o
51:01 [Round 6] Chessboard reading: Gemini 1.5 Flash
51:45 [Round 6] Chessboard reading: Claude 3 Sonnet
52:35 [Round 6] Chessboard reading: MS Copilot
52:50 [Round 6] Chessboard reading: InternVL
53:05 [Round 6] Chessboard reading: MiniCPM-Llama3
53:44 [Round 6] Chessboard reading: Salesforce Blip
54:16 Battle Summary
56:55 Bonus content for Claude 3.5 Sonnet, all six rounds!!!
[Hugging Face Models] https://huggingface.co/models?pipeline_tag=visual-question-answering&sort=trending
[InternVL] https://github.com/OpenGVLab
[MiniCPM-Llama3] https://github.com/OpenBMB
[Salesforce Blip] https://huggingface.co/Salesforce/blip-image-captioning-base
#openai #gpt4o #claude3 #gemini #BestLLM #aiassistant #codeassistance
Video Information
Views: 1.7K
Likes: 10
Duration: 01:03:21
Published: Jun 27, 2024