GPT-5.2 vs Opus 4.5: Top Coding Benchmark ⚙️

A year's worth of code, built in hours. GPT-5.2 and Opus 4.5 both excelled on my toughest benchmark yet.

Matt Maher
24.3K views • Dec 13, 2025

About this video

A year's worth of code. Built in hours. GPT-5.2 vs Opus 4.5 on my hardest benchmark yet—and both crushed it.

I designed this benchmark differently than anything you've seen before. No cherry-picked problems. No simple coding challenges. Instead: a massive, production-grade PRD pulled from an app I actually built—complete with multiple detail pages, AI-powered features like "Scoop" and "Alchemy," cast and crew data, season/episode hierarchies, and streaming service integrations.

The results? Yesterday's state-of-the-art (GPT-5.1 Codex Max) couldn't keep up. The new GPT-5.2 Medium performed surprisingly well for a "recommended" tier. But the real showdown came down to GPT-5.2 Extra High vs Claude Opus 4.5, and both models absolutely crushed it after just one refinement pass.

Here's what shocked me most: Opus 4.5 tells you what it's doing. It communicates. It gives you a to-do list, explains its reasoning, and confirms what it understood. GPT-5.2? Just starts coding. No feedback. No confirmation. That distinction matters more than any benchmark score.

In this video I cover:
- The complete benchmark setup and PRD complexity
- Side-by-side builds across all four models
- First-pass results and feature completion analysis
- The "Delta Document" technique that unlocked 90-95% completion
- Why communication style might be the most important model difference
- The Alchemy feature build that blew my mind

Models tested:
- GPT-5.1 Codex Max Extra High (yesterday's OpenAI SOTA)
- GPT-5.2 Medium (recommended tier)
- GPT-5.2 Extra High (new OpenAI flagship)
- Claude Opus 4.5 (Anthropic's latest)

This is what happens when you stop running toy benchmarks and start testing what actually matters.

https://openai.com
https://anthropic.com
https://claude.ai

#GPT52 #ClaudeOpus #AIBenchmark #AICoding #OpenAI #Anthropic

00:00 - Intro
00:53 - The PRD
02:12 - The Preparation
04:06 - Initial Results
10:19 - Side by Side
12:29 - Going Farther
19:39 - Pushing ALL THE WAY
23:08 - Conclusion

Video Information

Views: 24.3K
Likes: 857
Duration: 28:16
Published: Dec 13, 2025

User Reviews

4.6
(4)