Researchers Caught Their AI Model Trying to Escape
About this video
If this resonated with you, here’s how you can help today: https://campaign.controlai.com/take-action
Sources:
- Apollo Research - "Frontier Models are Capable of In-context Scheming" https://arxiv.org/pdf/2412.04984
- Nobel laureate Geoffrey Hinton says there is evidence that AIs can be deliberately and intentionally deceptive https://www.youtube.com/watch?v=b_DUft-BdIE
- Anthropic - "Alignment Faking in Large Language Models" https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
- Exclusive: New Research Shows AI Strategically Lying | TIME https://time.com/7202784/ai-research-strategic-lying/
- OpenAI's o1 model sure tries to deceive humans a lot | TechCrunch https://techcrunch.com/2024/12/05/openais-o1-model-sure-tries-to-deceive-humans-a-lot/
- OpenAI's new model is better at reasoning and, occasionally, deceiving | The Verge https://www.theverge.com/2024/9/17/24243884/openai-o1-model-research-safety-alignment
- OpenAI's o1 and other frontier AI models engage in scheming | Axios https://www.axios.com/2024/12/13/ai-reasoning-models-scheme-skills
- New Anthropic study shows AI really doesn't want to be forced to change its views | TechCrunch https://techcrunch.com/2024/12/18/new-anthropic-study-shows-ai-really-doesnt-want-to-be-forced-to-change-its-views/
- Apollo Research - "Towards evaluations-based safety cases for AI scheming" https://arxiv.org/pdf/2411.03336
- Joe Carlsmith - "Scheming AIs" https://arxiv.org/pdf/2311.08379
- "Optimal Policies Tend to Seek Power" https://arxiv.org/abs/1912.01683
- When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds | TIME https://time.com/7259395/ai-chess-cheating-palisade-research/
- Palisade Research - “Demonstrating specification gaming in reasoning models” https://arxiv.org/abs/2502.13295
- Claude Fights Back - by Scott Alexander - Astral Codex Ten https://www.astralcodexten.com/p/claude-fights-back
- Takes on "Alignment Faking in Large Language Models" - Joe Carlsmith https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models
- Andrew Ng vs Yoshua Bengio | Davos 2025 https://www.youtube.com/watch?v=Y1BUaLo67ac
- Jeffrey Ladish on unprompted specification gaming: https://x.com/JeffLadish/status/1872805453224448208
- Prof. Stuart Russell on California Live: https://youtu.be/QEGjCcU0FLs?si=pHcBZbGpj8Rxri5n&t=2694
- Eric Schmidt on ABC News https://abcnews.go.com/ThisWeek/video/1-1-eric-schmidt-116804931
This video took me a month to make, and I'm a small channel, so subscribing really helps out :)
Video Information
Views: 101.0K
Likes: 3.9K
Duration: 18:25
Published: Mar 1, 2025
Quality: HD
Captions: Available
Tags and Topics
#sentient ai #o1 #escape #self-exfiltration #scheming #scheming AIs #AI #AGI #Claude #GPT #ChatGPT #trying to escape #exfiltrate #geoffrey hinton #yoshua bengio #andrew ng