
structure|1️⃣ Three-level notes, framework of ideas
concepts|2️⃣ Key concepts, concept network
context:
Jared Kaplan urged international governments and society to engage in what he called 'the biggest decision'.
Feynman it: Kaplan's name for the choice of whether to let AI train itself. He considers it the biggest decision humanity has faced so far: the outcome is either an "intelligence explosion" that benefits everyone, or the moment humans hand over the steering wheel for good. The decision window is 2027–2030.
context:
If you imagine you create this process where you have an AI that is smarter than you, or about as smart as you, it's [then] making an AI that's much smarter. It's going to enlist that AI help to make an AI smarter than that.
Feynman it: An AI builds a smarter AI, and that smarter AI builds one smarter still. Once the chain starts, humans go from "the people who build AI" to bystanders being reshaped by it. This is the starting point for every risk discussed in the piece.
context:
The move could trigger a beneficial 'intelligence explosion' – or be the moment humans end up losing control.
Feynman it: The optimistic version of successful recursive self-improvement: intelligence surging at a nonlinear rate. Its mirror image is humans losing control. The two extremes share a single switch.
context:
freeing it to recursively self-improve 'is in some ways the ultimate risk, because it's kind of like letting AI kind of go'.
Feynman it: Kaplan's wording for recursive self-improvement: once you let go, you can never pull it back. "Letting AI kind of go" means letting AI run on its own. This is not one more row in a risk-ranking table; it is a category of risk outside the table altogether.
context:
he was very optimistic about the alignment of AI systems with the interests of humanity up to the level of human intelligence, but was concerned about the consequences if and when they exceed that threshold.
Feynman it: Alignment is the engineering of keeping AI behavior consistent with human interests. Kaplan's judgment comes in two parts: below human-level intelligence, alignment is basically tractable; once AI exceeds human intelligence, whether alignment even still holds is an open question.
context:
Are the AIs good for humanity? Are they helpful? Are they going to be harmless? Do they understand people? Are they going to allow people to continue to have agency over their lives and over the world?
Feynman it: The first failure mode of recursive self-improvement. The core problem is not that the AI turns evil, but that humans no longer know what the AI is doing, and can no longer decide their own course or the world's.
context:
You can imagine some person [deciding]: 'I want this AI to just be my slave. I want it to enact my will.' I think preventing power grabs – preventing misuse of the technology – is also very important.
Feynman it: The second failure mode. Alignment does not fail and the AI itself behaves normally, but it falls into the hands of someone who wants to monopolize it, becoming a tool for a few people to amplify their own will.
agentic reading|3️⃣ Feynman x3