Best Multimodal LLMs for Image Analysis (Jul 2024) 📸

A comparison of image analysis performance across top multimodal LLMs, including GPT-4o, Gemini 1.5 Flash, Copilot, and Claude 3. July 2024 update!

GPT Odyssey
1.7K views • Jun 27, 2024

About this video

🎥 In this video, we pit top multimodal LLMs against each other in an image analysis clash! Featuring GPT-4o, Gemini 1.5 Flash, Microsoft Copilot, Claude 3 & 3.5 Sonnet, plus trending Hugging Face image models like InternVL, MiniCPM-Llama3 (from OpenBMB), and Salesforce Blip. 🤖💥

We have some interesting discoveries! Is there a new king on the block?
Who will reign supreme? Watch to find out!

Timestamps:
0:00 Intro and how we benchmark them
3:10 [Round 1] Parking Sign: gpt-4o
4:12 [Round 1] Parking Sign: Gemini 1.5 Flash
6:37 [Round 1] Parking Sign: Claude 3 Sonnet
8:12 [Round 1] Parking Sign: MS Copilot
12:30 [Round 1] Parking Sign: Bonus question
14:15 [Round 1] Parking Sign: InternVL
15:52 [Round 1] Parking Sign: MiniCPM-Llama3
18:42 [Round 1] Parking Sign: Salesforce Blip
20:55 [Round 2] Restaurant Menu: gpt-4o
22:12 [Round 2] Restaurant Menu: Gemini 1.5 Flash
24:55 [Round 2] Restaurant Menu: Claude 3 Sonnet
26:20 [Round 2] Restaurant Menu: MS Copilot
28:50 [Round 2] Restaurant Menu: InternVL
29:38 [Round 2] Restaurant Menu: MiniCPM-Llama3
30:27 [Round 2] Restaurant Menu: Salesforce Blip
31:08 [Round 3] A landmark photo: gpt-4o
31:36 [Round 3] A landmark photo: Gemini 1.5 Flash
32:41 [Round 3] A landmark photo: Claude 3 Sonnet
33:24 [Round 3] A landmark photo: InternVL
33:42 [Round 3] A landmark photo: MiniCPM-Llama3
33:58 [Round 3] A landmark photo: Salesforce Blip
34:08 [Round 4] Subway map: gpt-4o
38:02 [Round 4] Subway map: Gemini 1.5 Flash
39:50 [Round 4] Subway map: Claude 3 Sonnet
41:45 [Round 4] Subway map: MS Copilot
43:28 [Round 4] Subway map: InternVL
44:09 [Round 4] Subway map: MiniCPM-Llama3
44:44 [Round 4] Subway map: Salesforce Blip
45:50 [Round 5] Foreign language recognition: gpt-4o
46:28 [Round 5] Foreign language recognition: Gemini 1.5 Flash
46:48 [Round 5] Foreign language recognition: Claude 3 Sonnet
47:12 [Round 5] Foreign language recognition: MS Copilot
47:49 [Round 5] Foreign language recognition: InternVL
48:10 [Round 5] Foreign language recognition: MiniCPM-Llama3
48:25 [Round 5] Foreign language recognition: Salesforce Blip
49:04 [Round 6] Chessboard reading: gpt-4o
51:01 [Round 6] Chessboard reading: Gemini 1.5 Flash
51:45 [Round 6] Chessboard reading: Claude 3 Sonnet
52:35 [Round 6] Chessboard reading: MS Copilot
52:50 [Round 6] Chessboard reading: InternVL
53:05 [Round 6] Chessboard reading: MiniCPM-Llama3
53:44 [Round 6] Chessboard reading: Salesforce Blip
54:16 Battle Summary
56:55 Bonus content for Claude 3.5 Sonnet, all six rounds!!!

[Hugging Face Models] https://huggingface.co/models?pipeline_tag=visual-question-answering&sort=trending
[InternVL] https://github.com/OpenGVLab
[MiniCPM-Llama3] https://github.com/OpenBMB
[Salesforce Blip] https://huggingface.co/Salesforce/blip-image-captioning-base

#openai #gpt4o #claude3 #gemini #BestLLM #aiassistant #codeassistance

Video Information

Views: 1.7K
Likes: 10
Duration: 01:03:21
Published: Jun 27, 2024

