Best Multimodal LLMs for Image Analysis (Jul 2024)
Comparing top multimodal LLMs like GPT-4o, Gemini 1.5 Flash, Copilot, and Claude 3 in image analysis performance. Jul 2024 update!

GPT Odyssey
1.7K views • Jun 27, 2024

About this video
In this video, we pit top multimodal LLMs against each other in an image analysis clash! Featuring GPT-4o, Gemini 1.5 Flash, Microsoft Copilot, Claude 3 & 3.5 Sonnet, plus trending Hugging Face image models like InternVL, OpenBMB, and Salesforce Blip.
We have some interesting discoveries! Is there a new king on the block?
Who will reign supreme? Watch to find out!
Timestamps:
0:00 Intro and how we benchmark them
3:10 [Round 1] Parking Sign: gpt-4o
4:12 [Round 1] Parking Sign: Gemini 1.5 Flash
6:37 [Round 1] Parking Sign: Claude 3 Sonnet
8:12 [Round 1] Parking Sign: MS Copilot
12:30 [Round 1] Parking Sign: Bonus question
14:15 [Round 1] Parking Sign: InternVL
15:52 [Round 1] Parking Sign: MiniCPM-Llama3
18:42 [Round 1] Parking Sign: Salesforce Blip
20:55 [Round 2] Restaurant Menu: gpt-4o
22:12 [Round 2] Restaurant Menu: Gemini 1.5 Flash
24:55 [Round 2] Restaurant Menu: Claude 3 Sonnet
26:20 [Round 2] Restaurant Menu: MS Copilot
28:50 [Round 2] Restaurant Menu: InternVL
29:38 [Round 2] Restaurant Menu: MiniCPM-Llama3
30:27 [Round 2] Restaurant Menu: Salesforce Blip
31:08 [Round 3] A landmark photo: gpt-4o
31:36 [Round 3] A landmark photo: Gemini 1.5 Flash
32:41 [Round 3] A landmark photo: Claude 3 Sonnet
33:24 [Round 3] A landmark photo: InternVL
33:42 [Round 3] A landmark photo: MiniCPM-Llama3
33:58 [Round 3] A landmark photo: Salesforce Blip
34:08 [Round 4] Subway map: gpt-4o
38:02 [Round 4] Subway map: Gemini 1.5 Flash
39:50 [Round 4] Subway map: Claude 3 Sonnet
41:45 [Round 4] Subway map: MS Copilot
43:28 [Round 4] Subway map: InternVL
44:09 [Round 4] Subway map: MiniCPM-Llama3
44:44 [Round 4] Subway map: Salesforce Blip
45:50 [Round 5] Foreign language recognition: gpt-4o
46:28 [Round 5] Foreign language recognition: Gemini 1.5 Flash
46:48 [Round 5] Foreign language recognition: Claude 3 Sonnet
47:12 [Round 5] Foreign language recognition: MS Copilot
47:49 [Round 5] Foreign language recognition: InternVL
48:10 [Round 5] Foreign language recognition: MiniCPM-Llama3
48:25 [Round 5] Foreign language recognition: Salesforce Blip
49:04 [Round 6] Chessboard reading: gpt-4o
51:01 [Round 6] Chessboard reading: Gemini 1.5 Flash
51:45 [Round 6] Chessboard reading: Claude 3 Sonnet
52:35 [Round 6] Chessboard reading: MS Copilot
52:50 [Round 6] Chessboard reading: InternVL
53:05 [Round 6] Chessboard reading: MiniCPM-Llama3
53:44 [Round 6] Chessboard reading: Salesforce Blip
54:16 Battle Summary
56:55 Bonus content for Claude 3.5 Sonnet, all six rounds!!!
[Hugging Face Models] https://huggingface.co/models?pipeline_tag=visual-question-answering&sort=trending
[InternVL] https://github.com/OpenGVLab
[MiniCPM-Llama3] https://github.com/OpenBMB
[Salesforce Blip] https://huggingface.co/Salesforce/blip-image-captioning-base
#openai #gpt4o #claude3 #gemini #BestLLM #aiassistant #codeassistance
Video Information
Views: 1.7K
Likes: 10
Duration: 01:03:21
Published: Jun 27, 2024