GPT-5.2 vs Opus 4.5: Top Coding Benchmark ⚙️
A year's worth of code, built in hours. GPT-5.2 and Opus 4.5 both performed strongly on my hardest benchmark yet.

Matt Maher
24.3K views • Dec 13, 2025

About this video
A year's worth of code. Built in hours. GPT-5.2 vs Opus 4.5 on my hardest benchmark yet—and both crushed it.
I designed this benchmark differently than anything you've seen before. No cherry-picked problems. No simple coding challenges. Instead: a massive, production-grade PRD pulled from an app I actually built—complete with multiple detail pages, AI-powered features like "Scoop" and "Alchemy," cast and crew data, season/episode hierarchies, and streaming service integrations.
The results? Yesterday's state-of-the-art (GPT-5.1 Codex Max) couldn't keep up. The new GPT-5.2 Medium delivered surprisingly well for a "recommended" tier. But the real showdown came down to GPT-5.2 Extra High vs Claude Opus 4.5—and both models absolutely crushed it after just one refinement pass.
Here's what shocked me most: Opus 4.5 tells you what it's doing. It communicates. It gives you a to-do list, explains its reasoning, and confirms what it understood. GPT-5.2? Just starts coding. No feedback. No confirmation. That distinction matters more than any benchmark score.
In this video I cover:
- The complete benchmark setup and PRD complexity
- Side-by-side builds across all four models
- First-pass results and feature completion analysis
- The "Delta Document" technique that unlocked 90-95% completion
- Why communication style might be the most important model difference
- The Alchemy feature build that blew my mind
Models tested:
- GPT-5.1 Codex Max Extra High (yesterday's OpenAI SOTA)
- GPT-5.2 Medium (recommended tier)
- GPT-5.2 Extra High (new OpenAI flagship)
- Claude Opus 4.5 (Anthropic's latest)
This is what happens when you stop running toy benchmarks and start testing what actually matters.
https://openai.com
https://anthropic.com
https://claude.ai
#GPT52 #ClaudeOpus #AIBenchmark #AICoding #OpenAI #Anthropic
00:00 - Intro
00:53 - The PRD
02:12 - The Preparation
04:06 - Initial Results
10:19 - Side by Side
12:29 - Going Farther
19:39 - Pushing ALL THE WAY
23:08 - Conclusion
Video Information
Views: 24.3K
Likes: 857
Duration: 28:16
Published: Dec 13, 2025
User Reviews: 4.6 (4)