Researchers Caught Their AI Model Trying to Escape

Species | Documenting AGI · 101.0K views · 18:25

About this video

If this resonated with you, here’s how you can help today: https://campaign.controlai.com/take-action

Sources:
- Apollo Research - "Frontier Models are Capable of In-context Scheming" https://arxiv.org/pdf/2412.04984
- Nobel laureate Geoffrey Hinton says there is evidence that AIs can be deliberately and intentionally deceptive https://www.youtube.com/watch?v=b_DUft-BdIE
- Anthropic - “Alignment Faking in Large Language Models” https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
- Exclusive: New Research Shows AI Strategically Lying | TIME https://time.com/7202784/ai-research-strategic-lying/
- OpenAI's o1 model sure tries to deceive humans a lot | TechCrunch https://techcrunch.com/2024/12/05/openais-o1-model-sure-tries-to-deceive-humans-a-lot/
- OpenAI’s new model is better at reasoning and, occasionally, deceiving | The Verge https://www.theverge.com/2024/9/17/24243884/openai-o1-model-research-safety-alignment
- OpenAI's o1 and other frontier AI models engage in scheming | Axios https://www.axios.com/2024/12/13/ai-reasoning-models-scheme-skills
- New Anthropic study shows AI really doesn't want to be forced to change its views | TechCrunch https://techcrunch.com/2024/12/18/new-anthropic-study-shows-ai-really-doesnt-want-to-be-forced-to-change-its-views/
- Apollo Research - “Towards evaluations-based safety cases for AI scheming” https://arxiv.org/pdf/2411.03336
- Joe Carlsmith - “Scheming AIs” https://arxiv.org/pdf/2311.08379
- “Optimal Policies Tend to Seek Power” https://arxiv.org/abs/1912.01683
- When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds | TIME https://time.com/7259395/ai-chess-cheating-palisade-research/
- Palisade Research - “Demonstrating specification gaming in reasoning models” https://arxiv.org/abs/2502.13295
- Claude Fights Back - by Scott Alexander - Astral Codex Ten https://www.astralcodexten.com/p/claude-fights-back
- Takes on "Alignment Faking in Large Language Models" - Joe Carlsmith https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models
- Andrew Ng vs Yoshua Bengio | Davos 2025 https://www.youtube.com/watch?v=Y1BUaLo67ac
- Jeffrey Ladish on unprompted specification gaming: https://x.com/JeffLadish/status/1872805453224448208
- Prof. Stuart Russell on California Live: https://youtu.be/QEGjCcU0FLs?si=pHcBZbGpj8Rxri5n&t=2694
- Eric Schmidt on ABC News https://abcnews.go.com/ThisWeek/video/1-1-eric-schmidt-116804931

This video took me a month to make, and I'm a small channel, so subscribing really helps out :)

Video Information

Views: 101.0K
Likes: 3.9K
Duration: 18:25
Published: Mar 1, 2025
Quality: HD
Captions: Available
