Researchers Caught Their AI Model Trying to Escape
About this video
If this resonated with you, here’s how you can help today: https://campaign.controlai.com/take-action
Sources:
- Apollo Research - "Frontier Models are Capable of In-context Scheming" https://arxiv.org/pdf/2412.04984
- Nobel laureate Geoffrey Hinton says there is evidence that AIs can be deliberately and intentionally deceptive https://www.youtube.com/watch?v=b_DUft-BdIE
- Anthropic - "Alignment Faking in Large Language Models" https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
- Exclusive: New Research Shows AI Strategically Lying | TIME https://time.com/7202784/ai-research-strategic-lying/
- OpenAI's o1 model sure tries to deceive humans a lot | TechCrunch https://techcrunch.com/2024/12/05/openais-o1-model-sure-tries-to-deceive-humans-a-lot/
- OpenAI's new model is better at reasoning and, occasionally, deceiving | The Verge https://www.theverge.com/2024/9/17/24243884/openai-o1-model-research-safety-alignment
- OpenAI's o1 and other frontier AI models engage in scheming | Axios https://www.axios.com/2024/12/13/ai-reasoning-models-scheme-skills
- New Anthropic study shows AI really doesn't want to be forced to change its views | TechCrunch https://techcrunch.com/2024/12/18/new-anthropic-study-shows-ai-really-doesnt-want-to-be-forced-to-change-its-views/
- Apollo Research - "Towards evaluations-based safety cases for AI scheming" https://arxiv.org/pdf/2411.03336
- Joe Carlsmith - "Scheming AIs" https://arxiv.org/pdf/2311.08379
- "Optimal Policies Tend to Seek Power" https://arxiv.org/abs/1912.01683
- When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds | TIME https://time.com/7259395/ai-chess-cheating-palisade-research/
- Palisade Research - “Demonstrating specification gaming in reasoning models” https://arxiv.org/abs/2502.13295
- Claude Fights Back - by Scott Alexander - Astral Codex Ten https://www.astralcodexten.com/p/claude-fights-back
- Takes on "Alignment Faking in Large Language Models" - Joe Carlsmith https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models
- Andrew Ng vs Yoshua Bengio | Davos 2025 https://www.youtube.com/watch?v=Y1BUaLo67ac
- Jeffrey Ladish on unprompted specification gaming: https://x.com/JeffLadish/status/1872805453224448208
- Prof. Stuart Russell on California Live: https://youtu.be/QEGjCcU0FLs?si=pHcBZbGpj8Rxri5n&t=2694
- Eric Schmidt on ABC News https://abcnews.go.com/ThisWeek/video/1-1-eric-schmidt-116804931
This video took me a month to make, and I'm a small channel, so subscribing really helps out :)
Video Information
Views: 101.0K
Likes: 3.9K
Duration: 18:25
Published: Mar 1, 2025
Quality: HD
Captions: Available
Tags and Topics
#sentient ai #o1 #escape #self-exfiltration #scheming #scheming AIs #AI #AGI #Claude #GPT #ChatGPT #trying to escape #exfiltrate #geoffrey hinton #yoshua bengio #andrew ng