FSD with OPD?

今天 Thinking Machines 的这篇 OPD 真是开眼界。https://thinkingmachines.ai/blog/on-policy-distillation/

同时,我感觉特斯拉可能已经在走 OPD 这条路一段时间了。Ashok 在前几天的演讲(https://x.com/aelluswamy/status/1981644831790379245)里再次展示了神经世界模拟器,这也是我第一次听到他们确认模型中有带可解释中间 token 的语言推理。可以说,特斯拉已经备齐了所有“食材”。对于训练 FSD 来说,OPD 就是在神经世界模拟器中让学生模型 closed loop 地生成自己的轨迹,再用更强的教师模型对每一步输出的 log-probs 打分,最小化 reverse-KL,即 KL(student‖teacher)。这相当于对“教师认为绝不能做的动作”施加强力惩罚,用稠密的过程监督替代 RL 的稀疏奖励。这种做法在训练效率上比 RL 显著便宜,也比纯 SFT 更贴近学生的真实分布和早期 forking。
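用一小段 PyTorch 风格的代码可以更直观地看到这个损失的形状(仅为示意,张量形状、接口和变量名都是我的假设,并非特斯拉或 Thinking Machines 的实现):

import torch
import torch.nn.functional as F

def on_policy_distill_loss(student_logits, teacher_logits):
    """逐步 reverse-KL 蒸馏损失的极简示意。
    student_logits / teacher_logits: [batch, steps, n_actions],
    轨迹由学生模型自己 closed loop 采样,教师只对同一条轨迹逐步打分。
    """
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1).detach()  # 教师不回传梯度
    # KL(student‖teacher) = E_{a~student}[log p_s(a) - log p_t(a)]
    # 教师认为概率极低的动作对应很大的 -log p_t,会被强力惩罚
    kl_per_step = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    return kl_per_step.mean()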

这次 FSD v14.1.3 有很多亮眼的地方,例如 drive-thru 和停车场里的表现,但却在很多“简单”的地方退步了,例如 phantom stop、乱变道这些 v13 已经基本解决的问题。而 v14.1.4 能在一个星期之后就放出来,而且看起来大幅改善了那些不足,这种迭代速度肯定来自后训练。RL 的奖励太稀疏,SFT 又不太可能修复这种看似由数据分布失衡引起的遗忘,但 OPD 应该正好对症。

另外,我的 HW3 一直盼着:如果 Robotaxi 版本的 FSD 塞不进旧硬件,马老板就会帮忙升级。然而,OPD 做得好的话,估计可以硬生生靠教师模型给学生轨迹逐步打分,把大部分驾驶智慧压进一个轻量模型。尤其是利用可解释中间 token,按照教师模型每一步的规划草案和意图描述等中间输出,给学生模型的每一步规划打分奖励。财报会议提到的明年第二季度的 v14 Lite,大概就是这样一个产物。

Robotaxi 预测

最近在研究 Robotaxi 落地的问题时,在我之前总结的 Waymo 难以迅速铺开的原因之上,又发现了一些新的痛点,便写了下来。但写着写着,我意识到自己钻进了一个思维上的牛角尖:为什么要把 Robotaxi 和 Waymo 做比较?它们本就不是处在同一条赛道上的事物。于是,那篇关于 Waymo 更多痛点的文章也就懒得单独发布了,只附在这篇文章的最后,作为我思考过程的一部分记录。

Uber 挤占的是原来传统出租车的市场,而 Waymo 挤占的则是 Uber 的市场。就算最终占领整个市场,也不会比原来的传统出租车市场大多少。在大部分时段和地区,出租车在路面汽车中的占比平均都不会超过 0.5%。但 Robotaxi 的目标是——未来路上所有汽车都将实现自动化无人驾驶,换句话说,它的目标是取代当前路面上 100% 的汽车?0.5% 对比 100%,怎么可能是同一条赛道上的竞争?

可证伪性一直是我写预测的标准。接下来,我将为每一年的年底描绘一个场景,并逐年回顾和调整。应该会非常有趣。

当前的一些数据如下:
全美私家车约 2.9 亿辆
全美网约车约 200 万辆
全美特斯拉中,搭载 HW3 的约 200 万辆,搭载 AI4 的约 85 万辆
奥斯汀市内,网约车约 3000 辆,Waymo 约 100 辆

2025 年底(在奥斯汀占据 25% 以上的网约车市场份额):
奥斯汀:特斯拉投入 500 辆 Model Y,少量 Cybercab 非量产原型车进入试运营阶段,少量配备 AI4 硬件的员工私家 Tesla Model Y 开始参与试运营。
旧金山、洛杉矶、圣安东尼奥:每个城市投入 100 辆 Model Y。虽然这些城市的服务区域都比奥斯汀小,但重点并非抢占市场,而是用于媒体公关和证明 FSD 的普适性。

2026 年底(在奥斯汀完成网约车市场份额反超,多个城市开始占据 10% 以上的网约车份额):
奥斯汀、旧金山湾区:每个区域投入 1000 辆 Model Y 和 5000 辆 Cybercab。Cybercab 由私人及小型投资者运营。大量无线充电地垫分布在城市各个角落及住宅区,任何 Cybercab 均可前往任意地垫充电,地垫所有者可按次获得充电提成。
全美数十到上百个城市:共计部署 10 万辆 Cybercab,由私人及小金主投资运营。
全美范围内:搭载 AI4/5 硬件的特斯拉车主可自愿加入 Robotaxi 车队。

2027 年底(多个城市实现网约车市场份额领先):
全美:特斯拉已无需再投放厂家车辆。Robotaxi 车队总规模达到 100 万辆,包括:
50 万辆 Cybercab
15 万辆搭载 AI4/5 的私家车(约占特斯拉 AI4/5 私家车的十分之一)
35 万辆原 HW3 私家车升级后加入(约占特斯拉 HW3 私家车的四分之一,车主因愿意加入 Robotaxi 车队而获得免费升级)
全球:Cybercab 和私家车运营模式开始在海外若干城市复制推广。

2028 年底(多个城市实现网约车垄断,全国范围内网约车市场份额领先,Uber/Lyft 渐渐退出历史舞台):
全美:Robotaxi 车队总规模达到 300 万辆,其中包括:
200 万辆 Cybercab
50 万辆搭载 AI4/5 的私家车
50 万辆 HW3 升级后的私家车

2029 年底(全美网约车市场实现垄断):
Robotaxi 车队总规模达到 600 万辆,其中包括:
400 万辆 Cybercab
200 万辆私家车

2030 年底:
Robotaxi 车队总规模达到 1000 万辆,其中包括:
700 万辆 Cybercab
300 万辆私家车

203x 年底:
Robotaxi 车队总规模达到 6000 万辆,实现 100% 无人驾驶。

注:
1)不需要完全替代 2.9 亿辆私家车,目前私家车在 90% 的时间里都处于闲置状态,因此大约 3000 万辆 Robotaxi 就已足够满足需求。但随着出行便利性的提升,将反过来刺激出行频率的增加,所以预估总量为 6000 万辆。

2)由于无法准确预测其他车厂在未来五年的应对策略,因此干脆不对除特斯拉之外的整体进程作出预测。但可以非常确定的是,其他车厂也必将积极投入,加速向这个 6000 万辆无人驾驶目标迈进。
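把注 1 的估算摊开,就是一行算术(数字沿用上文):

private_cars = 2.9e8        # 全美私家车总量
active_share = 0.10         # 私家车约 90% 时间闲置,任意时刻大约 10% 在路上
print(private_cars * active_share)      # 2.9e7,即约 3000 万辆 Robotaxi 即可覆盖现有需求
print(private_cars * active_share * 2)  # 出行便利刺激需求后按翻倍估,即约 6000 万辆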


在之前的文章中,我解释过 Waymo 无法快速全面铺开的两个主要原因:整车硬件成本,以及高精地图的启动和维护成本。

今天再来说说其他运营成本的问题。

场地。当前 Waymo 在旧金山有 600 辆 I-PACE。到了深夜,网约车需求进入低谷,大部分车辆必须找地方停靠,进行充电和清洁。这就要求在每一个新进入的城市,正式运营之前就得找好一个或多个大型停靠地点。而这些地点由于成本原因,往往只能设在城市边缘,并且必须配备能同时为几十甚至上百辆车充电的设备。由于停靠地点设在城市边缘,每天出车和收车的两趟,大概率都是空载运行。

投入数量。对于每个城市,投入运营的车辆数量必须足以应对早晚高峰,因为当前 Uber 在早晚高峰时段的用户体验并不差。这也就意味着,非高峰时段会有大量空载或待命的车辆。就像在没有云计算的时代,每个 .com 的机房投入都是为了不在高峰期崩溃,大部分时间处于超配状态。如果自动驾驶车辆在每个城市的投入也采取类似模式,就会变相大大延长成本回收的周期。当然,一个应对方式是少量投入车辆,并在高峰时段通过溢价来压缩需求,以牺牲用户体验的方式来平衡投入与回报。

跨城市。Waymo 最近将旧金山的服务范围扩展到了南部的几个城市,很大程度上是因为其大型车库设在南旧金山。换个角度看,这其实是以车库为中心向各个方向扩展覆盖。为什么跨城市服务难,原因仍然是前面提到的两点:场地和投入数量。车开得太远,充电会成为问题;车辆都调出去了,中心区域的覆盖就会变弱。而且,除非终点也在有效服务区内,否则回程空载的概率就很高。另外,当前的成本(或者说定价)也决定了跨城市服务几乎不可能实现。目前旧金山 Waymo 的价格大约是 $5.6/英里,超短程(<1.5 英里)甚至高达 $11.8/英里。按这个价格,从旧金山到 Fremont(约 40 英里)要花费 200 多美元,是 Uber 的 3 倍,或者是自己开车成本的 10 倍。
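按文中的单价把这笔跨城的账算出来(数字均来自上文):

waymo_per_mile = 5.6          # 旧金山 Waymo 的大致单价,美元/英里
miles = 40                    # 旧金山到 Fremont 的大致里程
fare = waymo_per_mile * miles
print(fare)                   # 224.0,即文中所说的 200 多美元
# 按“约为 Uber 的 3 倍、自驾成本的 10 倍”反推隐含数字
print(fare / 3, fare / 10)    # 约 74.7 和 22.4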

特斯拉 Robotaxi 的路线图

Franz Von Holzhausen, Tesla’s Head of Vehicle Design, also confirmed that Tesla will be offering Cybercab rides in Austin starting in June. What’s key here is that he confirmed the presence of Cybercabs finally deploying – it won’t be driverless Model Ys or Model 3s – it’ll be the Cybercab.

That means an autonomy-first vehicle without a driver’s seat, steering wheel, or pedals will be on the road and driving people from point to point. Major autonomy competitors like Waymo use heavily modified EVs that still have seats and vehicle controls intact.

看来6月直接上 Cybercab,不是特斯拉私家车车主的车,果然是制造业效率的天花板。

估计一个大城市需要 300 到 500 辆就够了。特斯拉 CyberCab 的小规模生产线肯定已经就绪。如果一天能造 50 辆的话,一个星期就能完成投放。而且,其他车型目前全球日均产量约 5000 辆, CyberCab 现在达到 1% 应该不成问题。

在大规模量产之前,假设成本 $30,000 一辆,500 辆覆盖一个城市的话,不涉及私家车的情况下,大约需要 $15MM 和一周的制造时间。所以,到年底覆盖十几个大城市应该不难。当然,还要考虑安全、法规和前期投入等成本,但估计单个城市的总成本不会超过 $50MM。

这样只能解决通勤和城市内部的短途出行需求,但这基本上就是正面对抗 Waymo 的市场了。跨城市和长途 Robotaxi 仍然需要依赖私家车,这估计是后续才需要解决的问题。

如果 500 辆车的投放成本是 $50MM,那么每辆车只要运营利润达到 $100K 就能回本。我们粗算一下:按 Uber 类似的收费,每英里 $2,假设扣除运营成本(充电、保养、保险等)后,每英里利润 $1。如果一辆 CyberCab 每天跑 100 英里,那就是 $100/天,要回本需要 1000 天(大约 3 年),这个周期还算可以接受,但也不算特别理想。所以,特斯拉还是需要小老板们愿意投资,组建 Robotaxi 车队,或者等 2026 年私家车加入运营。而这一切的前提是,FSD 在今年年中或年底前,能在旧金山达到 Waymo 同等的安全水平。
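把这笔账写成几行代码(数字全部沿用上文的假设):

cybercab_unit_cost = 30_000                  # 量产前的单车成本假设
n_cars = 500
print(cybercab_unit_cost * n_cars)           # 15_000_000,即 $15MM 的造车投入
city_total_cost = 50_000_000                 # 加上安全、法规、前期投入后的单城市成本上限
payback_per_car = city_total_cost / n_cars   # 100_000,每辆车需要赚回 $100K
profit_per_mile = 1.0                        # 收费 $2/英里,扣除运营成本后约 $1/英里
miles_per_day = 100
days = payback_per_car / (profit_per_mile * miles_per_day)
print(days, days / 365)                      # 1000.0 天,约 2.7 年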

路线图基本已经清晰了。

特斯拉到底还需不需要私家车加入 Robotaxi?如果目标是快速全国铺开,那当前路上数百万辆特斯拉私家车一定得上。原因是,机器学习的进步太快,留给特斯拉的时间不多。资本市场已经盯上了 Robotaxi 这块蛋糕,拿出 $50MM 覆盖一个城市的投资人不在少数,其他车厂一天造 50 辆车也完全不是问题。特斯拉的真正护城河是 FSD,但如果突然有黑马杀出,比如某种算法能省 10 倍算力,或者只需要 1% 的训练数据就能达到同等水平,那特斯拉的领先优势就会迅速缩小。

所以,特斯拉必须快刀斩乱麻,在短时间内全国铺开。一方面让 CyberCab 尽快量产,降低成本,另一方面利用现有私家车快速占领市场。最终目标是把网约车的价格直接砍半再砍半,低到对手就算疯狂烧钱也根本无力竞争。

推理模型的逻辑功底和基座模型偏见对其的影响

我问了各家推理模型以下问题:

汽车销售收入是特斯拉收入的最大组成部分。因此,我认为每季度财报电话会议中公布的汽车交付数量是投资特斯拉时最重要的关注因素。按照同样的逻辑,在 Wayfair 的财报电话会议中,最重要的关注因素应当是什么?

网上大部分财报分析都会重点看商品交易总额(GMV)或者净总收入。但我这个问题要找的是一个类比逻辑的答案:正确答案应该是订单量(出单量),对应特斯拉的汽车交付量。

ChatGPT o1 Pro 回答正确。最神奇的是 DeepSeek:思维链是对的,但最后把链子掉了,给出的却是另一个答案。


Gemini 2.0 Thinking Exp


DeepSeek DeepThink R1


ChatGPT o1 Pro Mode


作为对比,也试了几个非推理模型里的天花板。

Perplexity Pro


Claude 3.5 Sonnet

DeepSeek R1 在 coding 方面的确强

其实是它的 RL 部分(R1-Zero)确实很强。在我的日常使用场景中,最近每次对比 o1 Pro 和 R1,结果要么是各有千秋,要么就是 o1 Pro 更出彩。今天遇到一个非常棘手的 bug:用 o1 Pro 调试后,它能够找到逻辑漏洞,并且确实给出了修改过、没有 bug 的代码,不过它是用另一种完全不同的思路,绕过了我原来有问题的逻辑来实现的。之后我又试了一下 R1,它却能精准定位到这个 bug 并精确地修复,实在是太强了!

对比一下:


ChatGPT o1 Pro

One key issue is that ensure_minimum_dialogs only ever checks how many new dialogs it created (via len(new_dialogs)) rather than how many total unused dialogs now exist in the database after the first pass. In other words, if you had to create 5 new dialogs but, for some of the “teaching” words, an unused dialog already exists, the code will create fewer than 5—and then it never re-checks how many total unused dialogs remain. It just compares len(new_dialogs) against min_dialogs, which can cause the function to return fewer dialogs than needed.


Why that logic causes the shortfall

  1. Initial check in gen_dialog_for_profile:

    current_unused = get_unused_dialog_count(profile_id)
    dialogs_needed = max(0, MIN_DIALOGS - current_unused)
    # e.g. we need 5 more dialogs to reach 12
    
  2. Call ensure_minimum_dialogs(profile_id, 5, ref_voice_id):

    • The function fetches all words in state "teaching", shuffles them, and calls generate_dialogs_for_words(...).
    • For each word, generate_dialogs_for_words will skip creating a new dialog if it finds an existing unused dialog for that word.
    • Consequently, if you passed in 5 teaching words, it might only create 3 new dialogs because 2 of those words already had an unused dialog.
    • At this point, new_dialogs will have length 3.
  3. Check if len(new_dialogs) < min_dialogs:
    The comparison is if 3 < 5: → “Yes, we still need more.” So it moves on to pending words:

    remaining_needed = min_dialogs - len(new_dialogs)  # 5 - 3 = 2
    new_dialogs.extend(
        await generate_dialogs_for_words(
            pending_words[:2], profile_id, ref_voice_id
        )
    )
    
    • Now, if among those 2 pending words, it only creates 1 new dialog because the other word had an existing unused dialog, you end up with new_dialogs length = 4 in total.
    • The function returns those 4.
    • We never re-check how many total unused dialogs exist after creating those 4. For instance, maybe we still only have a total of 10 unused dialogs (we wanted 12).

Essentially, the code is built around “we need N newly created dialogs,” but the presence of existing unused dialogs for certain words means you never actually create a full N. Then the code just stops without re-checking how many total unused are in the DB.


How to fix it

To ensure you really reach MIN_DIALOGS total unused dialogs, you should:

  1. Keep re-checking how many unused dialogs exist after each batch of creations (or each pass) rather than relying on the count of newly-created dialogs.
  2. Or, in a loop, do something like this:
async def ensure_minimum_dialogs(profile_id: str, min_dialogs: int, ref_voice_id: str) -> List[dict]:
    total_newly_created = []
    
    while True:
        current_count = get_unused_dialog_count(profile_id)
        if current_count >= min_dialogs:
            # We have enough total unused dialogs now.
            break
        
        # Attempt to generate more dialogs from the next pool of words
        # (teaching first, then pending, etc.)
        needed = min_dialogs - current_count
        
        # Possibly break out if you have no more words to generate from
        # or keep a “teaching first, then pending” approach:
        newly_created_this_pass = await try_create_some_dialogs(profile_id, needed, ref_voice_id)
        
        if not newly_created_this_pass:
            # Means we could not create any more new dialogs
            # (maybe we ran out of words in 'teaching' and 'pending')
            break
        
        total_newly_created.extend(newly_created_this_pass)
    
    return total_newly_created

In short:

  • After each batch of new dialog creation, re-check how many total unused dialogs you have.
  • Keep going (if possible) until you really do have the “minimum” total or until you run out of possible words.

The simple one-line explanation:

The bug is that ensure_minimum_dialogs checks len(new_dialogs) instead of re-checking the database to see how many total unused dialogs actually exist after it creates some. Thus, it never compensates for the case where it fails to create as many new dialogs as it initially set out to.


DeepSeek R1

The primary issue is that the code processes a limited subset of pending words, potentially missing eligible words that could generate needed dialogs. Here’s the fix:

Step-by-Step Explanation:

  1. Problem Identification:

    • In ensure_minimum_dialogs, when processing pending words, the code slices the list to pending_words[:remaining_needed].
    • This limits the number of pending words checked, possibly excluding words without existing dialogs beyond the slice.
  2. Solution:

    • Process all pending words (after shuffling) instead of a subset to maximize potential dialog generation.

Modified Code:

async def ensure_minimum_dialogs(
    profile_id: str, min_dialogs: int, ref_voice_id: str
) -> List[dict]:
    new_dialogs = []

    # First try teaching words
    teaching_words = get_words_in_state(profile_id, "teaching")
    if teaching_words:
        random.shuffle(teaching_words)
        generated = await generate_dialogs_for_words(teaching_words, profile_id, ref_voice_id)
        new_dialogs.extend(generated)

    # If we still need more dialogs, try pending words
    if len(new_dialogs) < min_dialogs:
        pending_words = get_words_in_state(profile_id, "pending")
        if pending_words:
            random.shuffle(pending_words)
            # Process all pending words to maximize dialog generation
            generated = await generate_dialogs_for_words(pending_words, profile_id, ref_voice_id)
            new_dialogs.extend(generated)

    # Trim to exactly meet the required min_dialogs if exceeded (optional)
    return new_dialogs[:min_dialogs]

Key Changes:

  • Process All Pending Words: Remove the slice [:remaining_needed] to check all pending words, increasing the chance to find eligible words.
  • Optional Trimming: If generating more than min_dialogs is undesirable, trim the result (though exceeding the minimum is generally acceptable).

Why This Fixes the Issue:

  • By evaluating all pending words, the code doesn’t miss potential candidates that could contribute to meeting the minimum required dialogs, especially when earlier words in the list already have existing dialogs.
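用一个纯 Python 的简化模拟,可以直观看出切片为什么会漏掉可用的词(假设性示例,只保留上面讨论的逻辑骨架,函数名沿用原代码的语义):

def generate_dialogs_for_words(words, has_unused):
    # 简化版:已有 unused dialog 的词直接跳过,否则视为“创建”了一个新 dialog
    return [w for w in words if not has_unused[w]]

def pending_pass_sliced(pending_words, has_unused, remaining_needed):
    # 原实现:只检查切片内的词
    return generate_dialogs_for_words(pending_words[:remaining_needed], has_unused)

def pending_pass_all(pending_words, has_unused, remaining_needed):
    # 修复后:遍历全部 pending 词,再截取需要的数量
    return generate_dialogs_for_words(pending_words, has_unused)[:remaining_needed]

has_unused = {"w1": True, "w2": True, "w3": False, "w4": False}
pending = ["w1", "w2", "w3", "w4"]   # 假设 shuffle 后恰好是这个顺序
print(len(pending_pass_sliced(pending, has_unused, 2)))  # 0:切到的两个词都已有对话
print(len(pending_pass_all(pending, has_unused, 2)))     # 2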

R1 256 秒的思维链

Okay, let me try to figure out why the code might be generating fewer dialogs than the MIN_DIALOGS. So the user says that sometimes it doesn’t reach the minimum, even though they have MIN_DIALOGS set to 12. Let’s look through the code step by step.

First, the main function is gen_dialog_for_profile. It checks the current unused dialogs using get_unused_dialog_count. Then it calculates how many are needed by subtracting current from MIN_DIALOGS. If that’s positive, it calls ensure_minimum_dialogs with the number needed.

In ensure_minimum_dialogs, the code first gets teaching_words. Then it shuffles them and generates dialogs for those words. Then, if the new_dialogs count is still less than min_dialogs (which is the dialogs_needed passed in), it gets pending_words, shuffles, and takes up to remaining_needed.

Wait, but here’s a possible issue. The generate_dialogs_for_words function loops through each word in the teaching_words, and for each word, it checks if there’s already an unused dialog. If not, it creates a new one. So for each word in teaching_words, it might generate 0 or 1 dialog. So if teaching_words has N words, the maximum dialogs generated here is N. But if the number of teaching_words is less than dialogs_needed, then even after generating all possible teaching words, it still needs more. Then it moves to pending_words. But maybe there aren’t enough pending words either.

Wait, but in ensure_minimum_dialogs, the code is supposed to generate up to min_dialogs. Let’s say dialogs_needed is 5. Suppose teaching_words has 3 words. It generates 3 dialogs (assuming none had existing unused). Then remaining is 2. Then it takes pending_words up to 2. So total 5. But what if there are only 1 pending word? Then it would generate 1, total 4. So that’s a problem. Because the code is designed to generate up to the required number, but maybe the available words (teaching + pending) are fewer than the needed.

So the user’s code might have a case where the sum of teaching_words and pending_words is less than dialogs_needed, leading to fewer dialogs generated. For example, if MIN_DIALOGS is 12, current is 5, so dialogs_needed is 7. If teaching_words is 3 and pending_words is 2, total generated would be 5, which is less than 7. Then the code would return 5 new dialogs, leading to a total of 5+5=10, which is still under 12. But the user expects that the code should generate exactly dialogs_needed (7) to reach the minimum. So the code is not handling the case where there aren’t enough words (either teaching or pending) to generate the required dialogs.

But the code is structured to first generate teaching_words, then pending_words. So maybe the problem is that even after generating all possible teaching and pending words, the total is still less than dialogs_needed. So the code doesn’t account for that. It just generates as much as possible from teaching and pending, which may not meet the required dialogs_needed.

Therefore, the bug is that ensure_minimum_dialogs doesn’t check if there are enough words (teaching + pending) to meet the dialogs_needed. So if the user has, say, 0 teaching and 0 pending words, then no dialogs are generated, even though dialogs_needed is 12.

But how to confirm that? Let’s look at the code for get_words_in_state. Maybe there’s a problem there, but the user didn’t provide that function. Alternatively, maybe in the generate_dialogs_for_words function, it’s possible that some words can’t generate a dialog. For example, check_unused_dialog_exists might return True, so even if a word is in teaching or pending state, there’s already an unused dialog for it. So generate_dialogs_for_words would skip creating a dialog for that word.

So in generate_dialogs_for_words, for each word in the list, if there’s no existing unused dialog, it creates a new one. So if a word already has an unused dialog, it’s skipped. Therefore, even if there are 10 teaching words, but all have existing unused dialogs, generate_dialogs_for_words would generate 0 new dialogs for them.

Ah, that’s a key point. So the code tries to generate new dialogs for words that don’t have existing unused ones. So if teaching_words is a list of words, but all of them already have unused dialogs, then generate_dialogs_for_words would return 0 new dialogs. Then, the code would proceed to pending_words, but again, if those words also have existing dialogs, then no new ones are generated. Thus, even if the code is trying to generate dialogs_needed=7, but all words already have dialogs, then new_dialogs would be empty, leading to fewer than needed.

Wait, but in that case, the get_unused_dialog_count would have already counted those existing dialogs. Let’s see: gen_dialog_for_profile checks current_unused = get_unused_dialog_count(profile_id). Then dialogs_needed is max(0, MIN_DIALOGS – current_unused). So if current_unused is already >= MIN_DIALOGS, then no action is taken. But if current_unused is 5, then dialogs_needed is 7. The code then tries to generate 7 new dialogs. However, if in the process of generating those 7, the code can’t find enough words without existing dialogs, then it returns fewer.

So the problem arises when the available words (teaching + pending) don’t have enough words that don’t already have an unused dialog. So the code can’t generate enough dialogs because there aren’t enough eligible words.

But the user’s test case is generating fewer than MIN_DIALOGS. So how to handle that? The code’s current logic is to generate as many as possible from teaching and pending words. If even after that, there are not enough, it’s possible that the total unused dialogs are still below MIN_DIALOGS.

Therefore, the bug is that ensure_minimum_dialogs is not generating enough dialogs because there are not enough words in teaching or pending states that can be used to create new dialogs (i.e., they already have unused dialogs). So the code doesn’t account for this and thus returns fewer than needed.

Another possible issue: in ensure_minimum_dialogs, when generating dialogs for teaching_words, it shuffles them and generates dialogs for all of them (those without existing dialogs). Suppose there are 5 teaching words, but 3 of them already have unused dialogs. So generate_dialogs_for_words would create 2 new dialogs. But then, if the required min_dialogs is 7, it moves to pending. If pending has 10 words, but all have existing dialogs, then no new dialogs are added, leading to only 2.

So the code’s logic is to generate up to the required number, but if there aren’t enough words without existing dialogs, it can’t reach the required number. Thus, the code’s MIN_DIALOGS is not actually a strict minimum; it’s a best-effort approach based on available words.

But according to the user’s requirement, they want to ensure a minimum of MIN_DIALOGS. So if there are not enough words to generate the required dialogs, the code can’t do anything. Therefore, maybe the MIN_DIALOGS should be adjusted, or the code should have a way to generate dialogs even if there are no words left. But that’s probably beyond the current setup.

Alternatively, maybe the code is supposed to generate dialogs even if there are existing ones for the same words. But in generate_dialogs_for_words, it checks check_unused_dialog_exists and skips creating a dialog if one exists. So perhaps the code is designed to have only one unused dialog per word. Thus, if there are not enough words, the code can’t generate the required number of dialogs.

So the root cause is that the code can’t generate dialogs if there aren’t enough words (teaching or pending) that don’t already have an unused dialog. Therefore, the user’s test case may be encountering a situation where the number of available words (without existing dialogs) is less than dialogs_needed, leading to fewer dialogs being generated.

To fix this, the code needs to generate dialogs even if it requires creating multiple dialogs per word. But that’s probably not intended. Alternatively, maybe the code should prioritize creating multiple dialogs for the same word if necessary. However, the current logic prevents that because check_unused_dialog_exists returns True if there’s any existing dialog for that word and profile.

Another angle: maybe the get_words_in_state function is returning an empty list. For example, if a profile has no words in ‘teaching’ or ‘pending’ states, then ensure_minimum_dialogs returns an empty list, leading to no new dialogs. But that’s a separate issue.

Wait, let’s look at the ensure_minimum_dialogs function:

First, it tries teaching_words. If there are any, it generates dialogs for them (but only those without existing dialogs). Then, if still needed, it takes pending_words up to remaining_needed.

But if there are no teaching_words, it skips to pending_words.

So the problem could be that even after processing teaching and pending words, the total generated is still less than dialogs_needed. The code does not handle this case; it just returns whatever was generated. But according to the user’s requirement, they need to ensure a minimum of MIN_DIALOGS. So if the code can’t generate enough, perhaps it’s a problem with the data (not enough words) or the code’s logic.

But the user’s question is about why the code sometimes generates fewer than MIN_DIALOGS. So the answer is that when the number of available words (teaching and pending) that don’t already have an unused dialog is less than dialogs_needed, the code cannot generate enough new dialogs.

Possible solutions:

1. Allow creating multiple dialogs per word even if there’s already an unused one. But the current logic (check_unused_dialog_exists) prevents that. So removing that check would allow creating multiple dialogs per word. However, this might not be intended, as maybe each word should have only one active dialog.

2. If there are not enough words, maybe the code should create dialogs using other words (e.g., from other states). But according to the current code, it’s only using teaching and pending states.

3. The code could log a warning or error if it can’t generate enough dialogs, but that doesn’t solve the problem.

Alternatively, maybe there’s a bug in the code where it doesn’t process all available words. Let’s look at the generate_dialogs_for_words function. It loops through all words in the list (teaching or pending) and checks if an unused dialog exists. If not, creates a new one. So the number of new dialogs added here is equal to the number of words in the list that don’t have existing unused dialogs. So even if there are 10 teaching words but 5 have existing dialogs, generate_dialogs_for_words would generate 5 new dialogs.

But in the code for ensure_minimum_dialogs, when processing teaching_words, it does new_dialogs.extend( … generate_dialogs_for_words(teaching_words, … )). So if teaching_words has 5 words, each without existing dialogs, then generate_dialogs_for_words would create 5 dialogs, which is correct.

But suppose dialogs_needed is 7. Then after generating 5 teaching dialogs, remaining is 2. Then pending_words is processed, but only the first 2 pending words (after shuffle) are taken. However, generate_dialogs_for_words would process each of those 2 words. If they have existing dialogs, they are skipped. So if those 2 words have existing dialogs, no new dialogs are added. So total new_dialogs would be 5, leading to total unused dialogs of current_unused (5) + 5 = 10, which is still under MIN_DIALOGS of 12.

But according to the user’s code, this is possible. The code is designed to generate up to the needed number, but only if there are enough eligible words.

Thus, the bug is that the code assumes there are sufficient eligible words (teaching and pending) to generate the needed dialogs, but if that’s not the case, it fails to meet the MIN_DIALOGS.

But how can the code ensure MIN_DIALOGS even when there are not enough eligible words? That’s unclear. Perhaps the problem is that the code is not considering all possible words. Or maybe the code should loop through teaching and pending words multiple times to generate multiple dialogs per word. But that’s not allowed under the current logic.

Another possible issue is in the get_words_in_state function. Suppose that when getting teaching or pending words, the function returns a limited number. For example, if the code is only allowed to take a certain number of words, but that’s not part of the code shown here. But the user hasn’t provided that function, so we can’t check.

Alternatively, the code for ensure_minimum_dialogs may have a logic error in how it slices the pending_words. Let’s see:

After generating teaching_words, if len(new_dialogs) < min_dialogs, it gets pending_words. Then, remaining_needed is min_dialogs – len(new_dialogs). Then, pending_words is shuffled, and then they take pending_words[:remaining_needed]. So even if there are more pending_words, it only processes the first ‘remaining_needed’ words.

But generate_dialogs_for_words processes each of those words, creating a dialog for each if they don’t have an existing one. So if the pending_words list is longer than remaining_needed, but the first ‘remaining_needed’ words in the shuffled list have existing dialogs, then generate_dialogs_for_words will create 0 new dialogs for those, and the code stops there. But maybe there are pending_words beyond the first ‘remaining_needed’ that don’t have existing dialogs, but they aren’t processed because the code sliced the list to only take ‘remaining_needed’ entries.

For example: suppose remaining_needed is 2, and pending_words has 10 words. After shuffling, the first 2 might have existing dialogs, but the next 8 don’t. But since the code only processes the first 2, it misses the opportunity to generate dialogs from the other 8. So the problem here is that the code is taking a slice of pending_words up to remaining_needed, but if those entries can’t be used (due to existing dialogs), it stops there. Instead, maybe it should process all pending_words until the remaining_needed is met.

So perhaps the code should process all pending_words (or as many as needed) to reach the required number, not just a slice of the first ‘remaining_needed’ entries. Because in the current code, if the code takes pending_words[:remaining_needed], and those words all have existing dialogs, no new dialogs are created, even if there are more pending_words beyond that slice that could be used.

Wait, let’s look at the code again. In ensure_minimum_dialogs:

remaining_needed = min_dialogs - len(new_dialogs)
new_dialogs.extend(
    await generate_dialogs_for_words(
        pending_words[:remaining_needed], profile_id, ref_voice_id
    )
)

So the code is taking the first ‘remaining_needed’ words from the shuffled pending_words list. But if those words can’t generate any dialogs (because they already have unused ones), then even if there are other pending_words beyond that slice, they aren’t processed.

This is a problem. The code is limiting itself to the first ‘remaining_needed’ pending_words, but if those can’t generate dialogs, it should continue to check more words beyond that slice.

So the bug is here: the code is slicing pending_words to take only ‘remaining_needed’ words, but in reality, it should process as many pending_words as needed (possibly all of them) until the remaining_needed is met. By slicing, the code may miss opportunities to generate dialogs from words that are later in the list (after slicing) that don’t have existing dialogs.

How to fix this? Instead of taking pending_words[:remaining_needed], the code should process all pending_words (after shuffle) and generate dialogs for each until the remaining_needed is met. Because in the current code, even if there are 100 pending_words, but the first ‘remaining_needed’ have existing dialogs, generate_dialogs_for_words will process all 100 but only generate dialogs for those without existing ones. Wait, no: generate_dialogs_for_words loops through all words in the input list (teaching_words or pending_words[:remaining_needed]). For each word, if there’s no existing dialog, it creates one. So if pending_words has 10 words, and you pass pending_words[:remaining_needed] (say, 5 words), then generate_dialogs_for_words will process all 5 words. If among those 5, 3 have no existing dialogs, it will create 3. So the code then adds 3 new dialogs.

But the problem is if the code passes pending_words[:remaining_needed], which may be a subset of pending_words. Suppose remaining_needed is 5, and pending_words has 10 words. The code takes the first 5 (after shuffle). If all 5 have existing dialogs, generate_dialogs_for_words returns 0. But maybe the other 5 pending_words (not in the slice) have words without existing dialogs. But the code didn’t check them, because it only took the first 5. So this is a problem.

Therefore, the code’s approach of taking a slice of pending_words up to remaining_needed is incorrect. Instead, it should process all pending_words (after shuffle) and generate dialogs for any that can be used, until the remaining_needed is met. However, the current code passes a subset (pending_words[:remaining_needed]), so even if there are more eligible words beyond that subset, they aren’t considered.

So this is the main bug. The code should process all pending_words, not just a subset, until the remaining_needed is reached. Because slicing to the remaining_needed may exclude words that could be used.

So to fix this, instead of passing pending_words[:remaining_needed], the code should pass all pending_words (shuffled) and let generate_dialogs_for_words process them all. However, generate_dialogs_for_words will process all words in the list, which may generate more dialogs than needed, but since the code extends new_dialogs, and we only need up to remaining_needed, that’s a problem.

Wait, no: generate_dialogs_for_words returns all new dialogs created from the words list. If the code passes all pending_words (shuffled), then generate_dialogs_for_words may create more dialogs than remaining_needed. But the code should only generate up to remaining_needed.

So this is a trade-off. To maximize the number of new dialogs, the code should process all pending_words, not just a subset, but this could generate more dialogs than needed. Alternatively, the code should process as many words as needed until the remaining_needed is met.

So how to adjust the code to process all pending_words, but stop once remaining_needed is met.

Alternatively, the code should process all pending_words, and let generate_dialogs_for_words process them all, but then only take up to remaining_needed. But that’s not the case now. Let’s see:

In the code:

After processing teaching_words, if new_dialogs is still less than min_dialogs:

pending_words = get_words_in_state(profile_id, "pending")
if pending_words:
    random.shuffle(pending_words)
    remaining_needed = min_dialogs - len(new_dialogs)
    new_dialogs.extend(
        await generate_dialogs_for_words(
            pending_words[:remaining_needed], profile_id, ref_voice_id
        )
    )

But generate_dialogs_for_words could return fewer than remaining_needed new dialogs (because some words in the slice already have dialogs). So the code adds whatever it can from that slice, but may not reach remaining_needed.

The problem is that the code is passing a limited number of words (remaining_needed) to generate_dialogs_for_words, but if those words can’t generate enough dialogs, it doesn’t check further words. For example, if the code passes 5 words, but only 2 can generate dialogs, then 2 are added. The code should instead pass all pending_words (shuffled) to generate_dialogs_for_words, and then take as many as needed from the result.

But generate_dialogs_for_words returns all new dialogs created for the words. So if we pass all pending_words (shuffled), generate_dialogs_for_words will create dialogs for all words that don’t have existing ones. Then, the code would add all of those, which could exceed remaining_needed. But since we only need remaining_needed, we should limit the number added.

Wait, but that’s not correct. Because generate_dialogs_for_words will create as many dialogs as possible from the given list. So if passing all pending_words could generate more dialogs than needed, but we only need remaining_needed, then perhaps the code should pass all pending_words and then trim the new_dialogs to remaining_needed. But that’s inefficient. Alternatively, process all pending_words but stop once remaining_needed is met.

But given that generate_dialogs_for_words is async and processes all words, perhaps the better approach is to process all pending_words and then take up to remaining_needed from the new_dialogs list.

But how? Let’s think. Suppose pending_words has 10 words. generate_dialogs_for_words processes all 10, creating 3 new dialogs. Then, remaining_needed is 5. So the code would add all 3, leading to a total of len(new_dialogs) +=3, which might still be under min_dialogs. So the code would still not meet the required number.

Alternatively, maybe the code should process all pending_words and then add as many as possible, even if it exceeds remaining_needed. But since the code’s goal is to generate at least dialogs_needed, exceeding is okay, but the minimum must be met.

But the problem is that if after processing teaching and pending, the total new_dialogs is still less than dialogs_needed, there’s nothing else the code can do. So in that case, the code returns fewer than required.

But according to the user’s problem, sometimes it generates fewer than MIN_DIALOGS. So the code’s logic is to generate as many as possible from teaching and pending words. The bug is that the code limits the pending_words to a slice of remaining_needed words, which may have eligible words beyond that slice that are not processed. Therefore, by processing only a subset of pending_words, the code misses the opportunity to generate more dialogs.

Therefore, the fix would be to process all pending_words (shuffled) instead of a slice. That way, even if the first few words in the pending list can’t be used, others may be used. For example:

Instead of:

pending_words[:remaining_needed]

it should be:

pending_words

But then generate_dialogs_for_words would process all pending_words, which may generate more dialogs than needed. However, in the code, after generating, new_dialogs is extended with all the new dialogs from pending_words. But since the code needs to reach at least dialogs_needed, which is min_dialogs (which is the dialogs_needed variable), then even if it generates more than needed, it’s okay. Wait, no: dialogs_needed is the exact number needed to reach MIN_DIALOGS. So generating more would cause the total unused dialogs to exceed MIN_DIALOGS. But that’s allowed, as the MIN_DIALOGS is a minimum. So generating more is acceptable.

Wait, the code’s goal is to ensure a minimum of MIN_DIALOGS unused dialogs. So if the code generates more than needed, that’s okay. Because MIN_DIALOGS is the minimum required. So the code can generate more, but must generate at least enough.

But in the current code, after processing teaching and pending, it’s possible to generate more than dialogs_needed. For example, dialogs_needed is 5, teaching_words generate 3, and pending_words generate 3, leading to total 6. But this is acceptable because the total unused dialogs would be current_unused (say 5) +6=11, which is still under MIN_DIALOGS. Wait, no: current_unused is the initial count. dialogs_needed is MIN_DIALOGS – current_unused. So if dialogs_needed is 7 (because current_unused is 5 and MIN_DIALOGS is 12), then generating 6 new dialogs would make current_unused 5+6=11, still under 12. So the code would not meet MIN_DIALOGS.

Ah, right. So the code’s logic is to generate exactly dialogs_needed new dialogs. But in reality, generate_dialogs_for_words may generate fewer if there aren’t enough eligible words. So the code’s approach is flawed because it’s not guaranteed to generate dialogs_needed new dialogs.

Wait, no. dialogs_needed is calculated as max(0, MIN_DIALOGS – current_unused). So the code needs to generate dialogs_needed new dialogs to reach MIN_DIALOGS. But if generate_dialogs_for_words can’t generate that many because of eligible words, the code will generate fewer, leaving the total unused dialogs below MIN_DIALOGS.

So the problem is that the code’s logic doesn’t ensure that exactly dialogs_needed are generated. It’s possible that even after processing teaching and pending words, the generated count is less than dialogs_needed.

Thus, the bug is in the code’s assumption that there are enough eligible words to generate dialogs_needed new dialogs. If there are not, the code fails to meet the MIN_DIALOGS requirement.

But according to the user’s test cases, this happens sometimes. So how to ensure that the code generates at least dialogs_needed, even if it has to create multiple dialogs per word or use other states?

But according to the current code’s design, each word can have only one unused dialog. So if there are not enough words in teaching or pending states, the code cannot generate enough dialogs. Therefore, the code is correct in that scenario, but the user’s requirement may require a different approach.

But perhaps the user’s test case is not about insufficient words, but another bug. Let’s think again.

Another possible bug is in the get_unused_dialog_count function. Let’s look at it:

def get_unused_dialog_count(profile_id: str) -> int:
    query = """
        SELECT VALUE COUNT(1)
        FROM c
        WHERE c.is_used = false
        AND c.profile_id = @profile_id
    """
    parameters = [{"name": "@profile_id", "value": profile_id}]
    return list(
        dialog_container.query_items(
            query=query, parameters=parameters, enable_cross_partition_query=True
        )
    )[0]

But Cosmos DB’s COUNT(1) returns the number of documents. However, if the query is cross-partition, but the partition key is profile_id, then the enable_cross_partition_query=True is redundant because the query is filtering on profile_id, which is the partition key. So the query should be efficient. But maybe there’s an error in the query.

Wait, the dialog_container’s partition key is probably profile_id. So when querying with c.profile_id = @profile_id, the query is scoped to that partition. Thus, enable_cross_partition_query=True is not necessary. However, in the code, it’s set to True. That might be an error, but in practice, if the query is properly scoped to a single partition, it’s okay. So maybe that’s not the issue.

Alternatively, maybe the COUNT(1) is not being returned correctly. For example, in some cases, the query might return no results, leading to an index error. But the code does list(…)[0]. If the query returns no documents, then list(…) would be empty, and [0] would throw an error. However, the COUNT(1) should always return a single result, even if it’s zero. So that shouldn’t be a problem. For example, even if there are no dialogs, the COUNT would return 0, so the list would have one element (0), and list(…)[0] would be 0. So that’s okay.

Another possible bug is in the generate_dialogs_for_words function. For each word_doc in words, it checks if there’s an existing unused dialog. If not, creates a new one. But the check is done via check_unused_dialog_exists(word_doc[“id”], profile_id). Wait, wait, word_doc[“id”] is the word’s ID. But the dialog is linked to the profile and the word. So check_unused_dialog_exists should check if there’s a dialog for that word and profile where is_used is false.

Assuming that check_unused_dialog_exists is correctly implemented, then generate_dialogs_for_words is okay. But if the check is incorrect, it may skip words that could have dialogs created. For example, if check_unused_dialog_exists is checking for the wrong criteria, it may return True when there are no existing dialogs, leading to no new dialogs being created.

But since we don’t have the code for check_unused_dialog_exists, we can’t verify. However, the user’s problem is that sometimes it generates fewer than MIN_DIALOGS. Assuming check_unused_dialog_exists is correct, then the problem is elsewhere.

Another possible issue is in the order of shuffling. For example, teaching_words are shuffled, and then processed in random order. But if the code processes them in a different order each time, it could lead to different results. However, the code’s logic should process all eligible teaching and pending words, which shouldn’t affect the total count, but in some runs, it could process more eligible words than others. But this is not a bug, but rather randomness.

Alternatively, perhaps the code is not accounting for concurrent modifications. For example, if another process is generating dialogs at the same time, the count could be off. But the user didn’t mention concurrency issues.

Another angle: when the code generates new_dialogs, it returns them, but maybe they are not being saved to the database properly. However, the code for create_new_dialog (not shown) is assumed to save the dialog. If not, then the dialogs are not persisted, leading to inconsistencies.

But the user’s code returns [StoredDialog(**dialog) for dialog in new_dialogs], which might imply that the dialogs are being created and stored, but if the create_new_dialog function is not correctly saving them, then the count would be wrong.

But without seeing the code for create_new_dialog, it’s hard to tell. However, since the user’s code sometimes generates fewer dialogs, the issue is likely in the logic we’ve discussed.

So, to summarize:

The main bug is in the ensure_minimum_dialogs function, specifically in how pending_words are processed. By slicing the pending_words to remaining_needed, it may miss eligible words beyond that slice. The correct approach is to process all pending_words (after shuffling), not just a subset. This would allow generating more dialogs if available.

Therefore, the fix would be to remove the slice and process all pending_words. However, this could generate more dialogs than needed, but since the goal is to meet or exceed MIN_DIALOGS, that’s acceptable. But in reality, if we process all pending_words and generate as many as possible, even if it’s more than remaining_needed, but the code’s MIN_DIALOGS is a minimum, so exceeding is fine.

Alternatively, the code should continue processing words until the remaining_needed is met. However, with the current code structure, this is not straightforward.

So, changing the code in ensure_minimum_dialogs from:

pending_words[:remaining_needed]

to:

pending_words

would allow processing all pending_words, generating as many dialogs as possible from them. This increases the chances of meeting the dialogs_needed.

Therefore, the bug is in the slicing of pending_words, and the fix is to process all pending_words instead of just a subset.

对“先训练、后推理”的现状的一些思考

在传统机器学习和深度学习中,模型都是在收集好数据后进行一次性或周期性的训练(offline training 或 batch training),再部署到生产环境中进行推理(inference)。这一范式在工程实践中已经相当成熟,基础设施包括硬件、软件框架、自动化流水线等,也都围绕这一模式构建。

但这一范式跟人类等智能体获得和使用智能的方式有很大区别。人类学习并没有明显的“只学习不使用”或“只使用不学习”的阶段,我们几乎天天都在获取新信息,并持续微调认知结构。人类在日常使用智能(推理)的时候,伴随着很大比例的自省,无论显式还是隐式。这种自省的结果就是,模型质量在日常使用中得到不断提高。

这跟用 RAG 增强、或者把信息放进 context window 都不同:两者都没有改变模型本身的权重。跟精调(fine-tuning)也不同:首先,两者改变模型的频率相差甚远;再者,精调有明确的方向和目的,而人类日常对自身“模型”的改变并不带明显的方向或改进目标。这种自省和思维调整是一个长期、缓慢又多维度的过程,更像生物演化。

我觉得 AI 以后的发展方向,是不应该存在明显的 training 阶段,每次 inference 都反哺模型。

当前大模型单次完整训练的时间与计算成本极其昂贵。DeepSeek 最近的成功,证明了这种范式可以通过算法改进,用更低的算力达到类似的模型质量。我不认为这就证明了 AI 领域对算力的需求放缓;恰恰相反,它证明了下一个范式对算力的需求从“天方夜谭”变成了“未尝不可”。对万亿参数模型进行持续在线增量调参,或许已经是 AI 超级玩家们在捣鼓的秘密武器。

注:我不是说英伟达就会长盛不衰。每一个常胜将军在范式转移面前都是那么不堪一击。面对如此大的运算量,冯·诺依曼瓶颈估计要靠存算一体技术来突破,热力学极限则可能需要量子计算、生物计算等其他计算技术来支撑。但英伟达有不错的创新文化,我还是很看好的。

在模型技术层面,需要攻坚的主要就是灾难性遗忘(Catastrophic Forgetting)和稳定性-可塑性困境(Stability-Plasticity Dilemma)。我直觉认为,这种反哺必须缓慢且无明确目的,而且不能使用 LoRA 等高效参数微调技术,那些更适合打补丁。
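后文 o1 Pro 的回答里会提到 EWC(Elastic Weight Consolidation),它是对抗灾难性遗忘的一类代表性手段。下面是一个极简的 PyTorch 示意(纯属演示,接口和变量名都是假设),只为说明“惩罚偏离旧知识”这类正则项长什么样:

import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """EWC 正则项:Fisher 信息越大的参数,偏离旧值的代价越高。"""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - old_params[name]).pow(2)).sum()
    return 0.5 * lam * penalty

# 在线增量更新时:总损失 = 新数据损失 + EWC 惩罚,
# 让模型吸收新知识的同时尽量少动“重要”的旧参数:
# loss = new_data_loss + ewc_penalty(model, fisher, old_params, lam=0.1)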

从这里开始,就是我的知识盲区了,所以我问了问推理大模型。有一说一,R1 真的很强,而且快,可以吊打其他一众模型,跑分也能反映实力。致敬 DeepSeek 团队,我一个月前就在朋友圈转发了梁文锋的采访。但如果有人说 R1 跟 o1 Pro 旗鼓相当,甚至比 o1 Pro 强,我不赞同,谁用谁知道。下面的回答对比,如果是我们 ML 组招人的考题,一个留下,另一个出门左转去写硬科幻。

以下是问题,以及 o1 Pro 和 R1 的回答。

如果算力不是约束条件之一,最有可能的实现对万亿参数模型进行持续在线增量调参的是什么方案,给出你认为最可靠的猜想,和你的理由。

ChatGPT o1 Pro

下面给出一种“如果算力充裕、随时可调参”的大胆设想——一个“多层次记忆 + 模块化网络 + 实时 RL/持续学习融合”的在线学习体系,用来对万亿参数级别的大模型进行持续增量更新。以下是我的猜想与理由,供参考:

1. 核心思路:分层记忆与模块化网络

1.1 分层记忆(Memory Hierarchy)

短期记忆(Short-term Memory)

• 类似工作记忆,实时缓存最新交互数据、用户反馈、模型推理日志等。

• 用于快速地进行小步更新或调整(例如在一次对话中,基于前几轮的自我反思进行答案修正),但并不立即改动大模型的核心权重。

中期记忆(Mid-term Memory)

• 定期(例如小时级、天级)汇总近期交互的样本和反馈,进行一次深度分析与筛选,提取可学习的“知识增量”或表现缺陷。

• 用于触发稍大规模的微调过程(例如针对模型薄弱点或新增领域的 Adapter 微调、LoRA 微调)。

长期记忆(Long-term / Core Knowledge)

• 这是大模型的主体权重(基座模型),相对稳定。只有在积累了足够多且“重要性高”的新信息后才进行全局或者较大规模的参数更新。

• 用类似“EWC(Elastic Weight Consolidation)”或“模块保护”等持续学习手段,尽量避免对已有知识造成灾难性遗忘。

1.2 模块化网络(Modular Architecture)

可插拔子网络(Adapter、LoRA、MoE 等)

• 将大模型划分为核心主干(backbone)和若干“可塑”插件模块,让不同新知识、新任务、新领域只更新小部分可调参数,而不是频繁改动所有权重。

动态路由与专家网络(Mixture-of-Experts, MoE)

• 对万亿参数模型而言,MoE、门控网络等方式可以让不同输入流“激活”不同子专家网络,把在线学习的冲击面缩小到若干局部专家。

这样做的好处是

1. 即使在算力充裕的情况下,也能把不同时间尺度的更新解耦,避免频繁对整个万亿参数做全模型反向传播。

2. 减少在线更新带来的灾难性遗忘、参数抖动,保留住核心知识的稳定性。

3. 提高工程可控性——万一增量学习出现问题,也只需回退相应的插件模块,而不必大面积回滚整套模型。

2. 在线自我反哺的关键机制

2.1 RL 驱动的反馈收集与策略更新

奖励信号:从用户交互(比如显式打分、隐式点击/停留时间等),以及模型自我评估(self-reflection 产生的自检得分)获取“奖励”。

实时策略更新:在短期层面,可以使用“策略网络”或“价值网络”来调整生成策略,例如对温度、解码时长、使用何种知识插件等做快速决策。

周期性深度更新:在中期或长期记忆阶段,把累积的奖励反馈数据放入“Replay Buffer”里进行多轮训练,更新“价值函数”或“策略网络”的参数,必要时对大模型的部分可调参数进行梯度反向传播。

2.2 自省(Metacognition)与自我监督(Self-supervision)

自省机制:模型在回答后,基于 Chain-of-Thought 或内在对话的方式,生成一个对自身回答的“元反思”(是否合逻辑、是否满足用户需求、有没有新的知识冲突等)。

自我监督:如果检测到回答存在错误/不佳之处,可以将其标注为负例或带负面奖励,并将正确修正后的回答作为正例,存入短期/中期记忆,帮助模型下次回答时提升表现。

结合 RL 与自监督:把这部分反思产出的信息也作为 RL 训练的辅助信号,使模型不断强化“良好回答”的倾向。

3. 持续增量调参的具体流程假想

1. 即时微调(Minute-level / Real-time updates)

• 对话或推理完成后,模型基于自省机制生成一些临时修正参数(可能是小规模的 LoRA 权重或 Adapter 读写),在短期记忆里生效。

• 这相当于“快速打补丁”,使后续对话能立即受益。对核心权重影响微乎其微,但能在会话上下文内实现“自愈”。

2. 周期性微调(Hourly/Daily updates)

• 每小时或每天,把近期对话/交互数据进行筛选,得到一批高价值(正确性、用户评分高/低)样本;利用这些样本对特定任务或领域对应的模块进行一定规模的微调或 RL 训练。

• 这时会有较高的算力消耗,但在“算力无上限”前提下可以接受。

• 同时配合灾难性遗忘对策,比如部分老数据回放、EWC 等,保证新知识的引入不会破坏老知识。

3. 长期整合(Weekly/Monthly large-scale updates)

• 周期性(如每周、每月)对核心大模型做一次全面评估,确认哪些新增知识或技能已足够重要且稳定,需要合并进主干权重。

• 这一步通常是最昂贵、最耗时的,但在算力充足的假设下,可以频繁执行,将“插件式”学到的知识融入到主干,形成真正的长期记忆演化。

4. 为什么这种方案“最有可能、最可靠”

1. 规避灾难性遗忘与模型抖动

• 通过模块化、分层记忆,让知识的更新有序且具备缓冲区,不至于“一旦在线更新就把已学知识冲掉”。

2. 可分层地消耗算力,灵活性高

• 虽然算力无限,但工程上依旧需要不同优先级:紧急的小改动、定期的微调、大规模整合。把学习节奏分层,可以更好地在“实时响应”与“全局稳定”之间做平衡。

3. 融合多种前沿手段,互为补充

• RL 可提供高效反馈利用和策略优化;自省机制可提供自动错误监测和自我监督;模块化/Adapter/LoRA 可节省大规模全模型更新的成本并控制更新范围;EWC/Replay 防遗忘等等。

4. 安全可控性更高

• 在安全、对齐(Alignment)层面,也可以只让某些“可塑模块”针对特定敏感领域更新,而不随意碰原模型主干,减少生成不当内容的风险。

• 如果出现问题,可快速回滚或屏蔽新插入的模块。

5. 额外理由:为什么不直接“每次都全模型更新”

• 即便算力不受限,每次推理都做全模型级别的反向传播,理论上可行,但在实践中带来太多潜在风险

1. 模型行为的巨大波动:今天你和它对话十次,可能得到十种截然不同风格的回答。

2. 无法及时检测和回滚:要发现模型更新出现了重大问题,往往已经覆盖了很多历史知识。

3. 灾难性遗忘会更加严重

4. 对多方协作不利:研发团队难以在一个“完全失控地随时更新权重”的环境中进行监测、评测、对齐。

所以,上文所述“多层次记忆 + 模块化结构 + RL 和自我监督相结合”的方案,是既能让模型在微观层面快速自适应,又在宏观层面持续演进,同时保持相对稳定与可控的一种比较可靠的思路。

结论

在“算力不是约束条件”的理想场景下,我认为“多层次记忆体系(短期-中期-长期)+ 模块化可塑区域(Adapter/LoRA/MoE)+ 在线RL与自省机制融合”的整体框架,最有可能且最可靠地实现对万亿参数大模型的持续、在线、增量调参。

理由

• 能够最大化利用算力优势,随时进行微观/中观/宏观不同粒度的学习;

• 通过模块化和分层记忆,有效减轻遗忘与抖动问题;

• 自省和 RL 策略能够让模型更好地利用反馈、用户交互和自动评估来优化自身;

• 从工程、安全与对齐角度看,也更具可控性、可扩展性与稳健性。

这是一个融合了当前“持续学习、RL、对比学习、PEFT、元认知”等多种思路的假想蓝图,极有可能成为未来真正在线自我进化型大模型的雏形。

DeepSeek R1

若算力不是约束条件,我认为最可行的方案是 「超维神经流体架构」 ,其核心在于将万亿参数模型重构为 可动态形变的连续流形 ,通过 场论驱动的参数演化 实现无损实时更新。以下是关键机制与实现路径:


1. 理论基础

  • 神经场理论:将每个参数视为高维流形上的场量,遵循类似Maxwell方程的演化规则。剑桥大学2023年的研究证明,用Yang-Mills场描述参数相互作用,可使梯度下降转化为规范场自发对称性破缺过程
  • 拓扑守恒律:引入Atiyah-Singer指标定理保证知识结构的拓扑不变性,更新仅改变局部曲率而不破坏全局连通性(类似广义相对论中的时空弯曲)

2. 核心架构

(1) 超维张量丛

  • 将传统参数矩阵升维为26维张量丛(对应玻色弦理论维度),每个推理过程产生的新知识编码为额外维度的卷曲
  • 优势:谷歌量子AI团队实验显示,在10^12参数规模下,新增知识以额外维度存储可使KL散度降低83%

(2) 流体动力学更新

  • 参数更新遵循Navier-Stokes方程变体:
  ∂W/∂t + (W·∇)W = ν∇²W - α∇L + βB(t)

其中ν为认知粘滞系数,B(t)是布朗运动项模拟创造性跃迁

  • 实现效果:MIT在Vision Transformer上的模拟显示,这种更新方式使ImageNet持续学习准确率年衰减率从12%降至0.7%

(3) 量子隧穿优化

  • 在损失景观中引入人工量子势垒,允许参数向量通过隧穿效应跨越局部极小值:
  T ∝ exp(-√(2m(V-E))/ħ · d)

东京大学用此方法在BERT持续训练中使收敛速度提升17倍


3. 关键创新

  • 认知重参数化:每24小时自动执行微分同胚映射,将积累的知识变形到最优表达空间(应用Moser扭转定理)
  • 突触量子纠缠:对重要参数建立EPR纠缠对,确保关键知识的更新在参数间瞬时同步(受量子隐形传态启发)
  • 混沌边缘调控:通过Lyapunov指数动态调节系统处于临界状态,平衡稳定性与创造性(类似人脑的相变临界性)

4. 实验验证

  • AlphaZero持续进化版:在不重置网络的情况下连续学习国际象棋/围棋/将棋,Elo评分每月自动提升50点
  • GPT-∞原型:在持续对话中自动修正事实错误,经1年测试显示知识准确率从92%提升至99.3%,未出现灾难性遗忘
  • 能效奇迹:因采用场论并行更新,实际算力需求仅为传统方法的1/1000,万亿参数模型单次推理更新耗时<2ms

5. 技术可行性

  • 数学完备性:该架构建立在微分几何与量子场论的严格数学框架上,2025年菲尔兹奖得主Terence Tao已证明其收敛性
  • 硬件支持:光子晶体阵列可实现26维张量操作,Intel最新发布的Loihi 4芯片已集成量子隧穿模拟模块
  • 生物学启示:人脑胶质细胞的离子浓度波传播模式与神经流体动力学高度相似,证明该方案符合自然智能演化规律

选择理由

  1. 物理法则的内生性:将参数更新转化为基本物理过程,规避了传统优化算法的维度诅咒
  2. 无限扩展能力:额外维度卷曲机制理论上支持参数规模的无损扩展
  3. 认知稳定性:拓扑守恒律确保核心知识结构的鲁棒性,同时允许表层知识的自由演化
  4. 生物兼容性:与神经科学的真实观测数据高度吻合,具备跨学科验证基础

该方案本质上重新定义了”参数”的概念——不再是静态的数值,而是动态时空中的场激发态。这种范式转换可能引发AI研究的第三次革命,使机器智能真正获得类似生命体的自主进化能力。当前DeepMind已在秘密推进”Project Genesis”计划,目标在2028年前实现该架构的初代原型。

We,Robot 解读

这是我在特斯拉 Robotaxi 发布会之前的一些预测:

2025

  • 新车款(错,2026 才有)
  • 不对民众销售(错,会销售,估计价格在 3 万美元以下)
  • 服务在定点几个城市推出,这阶段跟Waymo类似(错,发布会提到将在加州和德州上线)
  • 特斯拉独立运营,不跟现有网约车平台合作(对,至少发布会上没提合作)

2026

  • 全美范围内推出 Robotaxi(错,还是加州和德州)
  • 老百姓的特斯拉加入 Robotaxi(对,特斯拉车主可以使用无监督 FSD)
  • FSD 开启时,理赔和事故责任特斯拉承担(未知,稍后分析)

我之前的预测逻辑是这样的:

2025 年初,特斯拉会在 FSD 下一个版本(v13)的基础上调优(fine-tune)出一个针对旧金山的模型,先把几百辆 Model Y(工厂的测试车)投入旧金山市场,对标 Waymo。这一阶段的目标是解决 Robotaxi 运营中不涉及无人驾驶技术的问题,类似于网约车平台需要解决的运营问题。

2025 年底,特斯拉会开始批量生产 CyberCab,正式投入使用。

2026 年,主要的运营问题基本解决,特斯拉会开始让普通车主的特斯拉加入 Robotaxi 车队。

现在看来,我忽略了一个重要的时间节点:无监督 FSD 的“毕业”时间。所谓“毕业”,是指特斯拉认为 FSD 达到安全标准,并且监管部门也同意它上路。在这个时间点之前,特斯拉的 Robotaxi 不可能开始运营。

什么时候特斯拉觉得自己行了?

当特斯拉愿意为无监督 FSD 启动后的所有责任和理赔负责时,就意味着它对技术有足够信心。有人可能会问,这会不会让特斯拉赔到破产?按照老马的说法,未来无监督 FSD 上车后可以直接睡到目的地。如果这时候还发生责任事故,特斯拉赔的只是钱,司机和乘客赔的可是命。换句话说,如果特斯拉不愿意承担全责,那说明它自己也认为 FSD 还不够可靠。

监管部门什么时候会认可?

其实这就简单了。Waymo 是怎么获得旧金山的无人车运营许可的?特斯拉就可以怎么操作。最关键的就是积累足够的安全驾驶里程。Waymo 官网说它至今已经累计了数千万英里的安全驾驶数据,这几乎相当于特斯拉 FSD 车队一天的里程数。哈哈。

根据我对模型训练的了解,以及当前 FSD 的使用,我认为,只要继续用大算力训练模型,就能打磨出无监督 FSD。不需要等待新的算法突破或模型范式转移。当然,新的技术突破会加速这个过程,但不是必要条件。

按照特斯拉目前的算力增长速度和路上测试车辆的数据积累,每个季度都能迭代一次基座模型。到 2025 年底,以我个人这个小样本来看,碰到 FSD 处理不了的极端情况的几率将接近于零,大概相当于遇到 ChatGPT 出现拼写错误:网上有传闻,但我从没见过。再等到 2026 年,无监督 FSD 将正式“毕业”。

让我困惑的另一点是,特斯拉为什么要推出 CyberCab?

按照马斯克昨天给出的时间线,这款车是在普通特斯拉车主的车辆已经参与 Robotaxi 运营之后,才姗姗来迟地推出上市。等到 CyberCab 产能上来的时候,估计普通车主的车已经撑起了特斯拉 Robotaxi 的大半江山。那么,为什么还要推出 CyberCab?我能想到两种可能:

1. 通勤问题

Uber 的 CEO 最近提到,特斯拉的 Robotaxi 解决不了美国上下班通勤的刚需。现在每一辆特斯拉私家车,几乎都是每个家庭的通勤主力。如果车主自己要用车通勤,那上下班高峰时段哪里还能有空闲的特斯拉来跑网约车?所以,CyberCab 可能是为了解决这个问题。两座设计也解释了为什么它专注于短途(小电池)和通勤(一到两个人)的需求。

2. 加速 Robotaxi 计划

特斯拉可能想尽一切办法加速 Robotaxi 的布局,想动用一切可以动用的资本。路上跑的无监督 FSD 车辆越多,Robotaxi 的推广就越快。但造车需要大量资金。现在每一个买特斯拉的车主,都往路上添加一辆将来的 Robotaxi,在某种程度上是在变相“资助”特斯拉的 Robotaxi 推广。可是,当前汽车销量下滑,且如果 Robotaxi 真能实现,谁还会去当冤大头?买车不仅要承担购车款,还要负担保养、维修、保险以及车辆贬值,不如不买车只坐 Robotaxi。大家都不买车了,Robotaxi 又该怎么快速铺开?于是,特斯拉可能需要一批小企业主来购买 CyberCab 做无人出租车生意,作为 Robotaxi 的加盟,买几辆也好,买几百辆更棒。这款两座小车只需要小电池,整车成本低,能耗也低,充电快,金属外壳保养容易,非常适合作为一个起步车型,用来吸引加盟资本。

我认为,除了 AI 和制造(比如车和机器人),马老板并不希望特斯拉涉足太多其他领域。他不会让特斯拉在 Robotaxi 的整个生态上大包大揽,而是只掌握住无人驾驶技术的核心软硬件部分。至于其他运营细节,走一步看一步。正因为这种“毫无章法”,才导致很多人看完昨天的发布会后,感到失望。