Lots of interesting DeepSeek content on Hacker News today. I think the implications of the current paradigm laid out by the author and Gwern in this LessWrong article mostly still hold. The additional implication of R1 is how smoothly reinforcement learning seems to work, which could plausibly push models far beyond human ability by exploiting the gap between verifying solutions and generating them (the P ≠ NP intuition). But since the primary speculation is that this is more or less how o1 and o3 are already trained, I'm not sure R1 actually changes anything. GPU constraints still exist, and closed labs still won't release their CoT steps. (Edit: also Stratechery.)