The Coordination Problem in the Race to AGI | Holden Karnofsky (Anthropic)
About this video
For years, working on AI safety usually meant theorising about the "alignment problem" or trying to convince other people to give a damn. If you could find any way to help, the work was frustrating and low-feedback. According to Anthropic's Holden Karnofsky, this situation has now reversed completely.
There are now a large number of useful, concrete, shovel-ready projects with clear goals and deliverables. Holden thinks people haven't appreciated the scale of the shift, and wants everyone to see the wide range of "well-scoped object-level work" they could personally help with, in both technical and non-technical areas.
In today's interview, Holden, previously cofounder and CEO of Open Philanthropy, lists 39 projects he's excited to see happening, including:
• Training deceptive AI models to study deception and how to detect it
• Developing classifiers to block jailbreaking
• Implementing security measures to stop "backdoors" or "secret loyalties" from being added to models in training
• Developing policies on model welfare, AI-human relationships, and what instructions to give models
• Training AIs to work as alignment researchers
And that's all just stuff he's happened to observe directly, which is probably only a small fraction of the options available.
Holden makes a case that, for many people, working at an AI company like Anthropic will be the best way to steer AGI in a positive direction. He notes there are "ways that you can reduce AI risk that you can only do if you're a competitive frontier AI company." At the same time, he believes external groups have their own advantages and can be equally impactful.
Critics worry that Anthropic's efforts to stay at that frontier encourage competitive racing towards AGI, significantly or entirely offsetting any useful research they do. Holden thinks this seriously misunderstands the strategic situation we're in, and explains his case in detail with host Rob Wiblin.
*Full transcript and links to learn more:* https://80k.info/hk25
_This episode was recorded on July 25 and 28, 2025._
Chapters:
• Cold open (00:00:00)
• Holden is back! (00:02:28)
• An AI Chernobyl we never notice (00:02:58)
• Is rogue AI takeover easy or hard? (00:07:39)
• The AGI race isn't a coordination failure (00:18:01)
• What Holden now does at Anthropic (00:28:30)
• The case for working at Anthropic (00:30:38)
• Is Anthropic doing enough? (00:41:30)
• Can we trust Anthropic, or any AI company? (00:44:30)
• How can Anthropic compete while paying the "safety tax"? (00:50:11)
• What, if anything, could prompt Anthropic to halt development of AGI? (00:57:13)
• Holden's retrospective on responsible scaling policies (01:00:04)
• Overrated work (01:15:45)
• Concrete shovel-ready projects Holden is excited about (01:17:58)
• Great things to do in technical AI safety (01:22:12)
• Great things to do on AI welfare and AI relationships (01:29:53)
• Great things to do in biosecurity and pandemic preparedness (01:36:51)
• How to choose where to work (01:37:37)
• Overrated AI risk: Cyberattacks (01:43:38)
• Overrated AI risk: Persuasion (01:53:28)
• Why AI R&D is the main thing to worry about (01:57:31)
• The case that AI-enabled R&D wouldn't speed things up much (02:09:30)
• AI-enabled human power grabs (02:13:26)
• Main benefits of getting AGI right (02:26:04)
• The world is handling AGI about as badly as possible (02:31:44)
• Learning from targeting companies for public criticism in farm animal welfare (02:34:18)
• Will Anthropic actually make any difference? (02:43:43)
• "Misaligned" vs "misaligned and power-seeking" (02:58:23)
• Success without dignity: how we could win despite being stupid (03:04:16)
• Holden sees less dignity but has more hope (03:12:00)
• Should we expect misaligned power-seeking by default? (03:19:43)
• Will reinforcement learning make everything worse? (03:27:36)
• Should we push for marginal improvements or big paradigm shifts? (03:32:54)
• Should safety-focused people cluster or spread out? (03:35:32)
• Is Anthropic vocal enough about strong regulation? (03:39:55)
• Is Holden biased because of his financial stake in Anthropic? (03:43:30)
• Have we learned clever governance structures don't work? (03:47:57)
• Is Holden scared of AI bioweapons? (03:50:20)
• Holden thinks AI companions are bad news (03:53:58)
• Are AI companies too hawkish on China? (04:00:53)
• The frontier of infosec: confidentiality vs integrity (04:05:06)
• How often does AI work backfire? (04:07:55)
• Is AI clearly more impactful to work in than other causes? (04:22:50)
• What's the role of earning to give? (04:29:22)
_Video editing: Simon Monsour, Luke Monsour, Dominic Armstrong, and Milo McGuire_
_Audio engineering: Milo McGuire, Simon Monsour, and Dominic Armstrong_
_Music: CORBIT_
_Coordination, transcripts, and web: Katy Moore_
Video Information
Views: 4.9K
Likes: 138
Duration: 04:34:58
Published: Oct 30, 2025
Quality: HD
Captions: Available