2025-09-14
still day beneath the sun
Dwarkesh interviews Sergey Levine on robotics[1], an episode that is much better consumed as video than as audio, not just because of the in-person demos and graphs, but also because the gestures they use when describing physical and conceptual phenomena add significantly to the communication. This is in some way the medium matching the message, but I wonder to what extent people suited to working in robotics are more likely to gesture when speaking than pure software engineers. Anyway, of the two major questions left open in this interview, what the best foundation model will look like is beyond my ability to speculate on, but the extent of heterogeneity in robotics is very interesting[2]. It will be interesting to see exactly how far things split: for example, whether there will be a standard octopus or many different ones. Or, even if things end up standardizing on the humanoid, whether the American and Chinese humanoids end up being the same. One final thought is to what extent being humanoid might lead to more natural alignment, since humanoid robots would experience the world in a manner more similar to ours, navigating it and being occupied with tasks similar to ours, compared to other more alien architectures.
Jerry Neumann in Colossus Review (via the Diff) argues that the bulk of downstream value produced by AI will not be captured by AI companies.
Casey Handmer evaluates the evidence for life on Mars and critiques the Mars Sample Return requirements.
Statecraft interviews Dean Ball on his time in the Office of Science and Technology Policy, which overlaps somewhat with his earlier Cognitive Revolution appearance. But there is also a lot of Statecraft-specific material, including a point where they both agree that it is hard to create new ideas while in office. This seems to imply that there should be higher turnover in public office: people would come in with an agenda already prepared, and resign after shepherding it through. It actually seems like this could be a good means to increase the rate of policy reform and prevent stagnation in the public service; it might also be good for the individual, as it still provides a boost to one’s public profile while mitigating the downside of the low salary. Insofar as regular four-year turnovers have downsides that are often complained about, they might also have potential advantages that are not really being exploited.
Whipling on China’s distributed system for creating and disseminating influencer propaganda. I almost called it decentralized by mistake, but of course it is centralized: it’s interesting how their governance, industry, and now media are all converging on a similar model that is under central control yet also to a large extent autonomous.
Regarding the LLM flywheel, where deployment of the model generates data that can be used to further refine the model, I suspect the reason we haven’t really seen it take off yet is that, while pretraining expands knowledge about the world, RL fine-tuning is really more akin to specialization on those already learned concepts. If so, it will be difficult to extract general-purpose gains out of task-specific data, since metaphysics is subjective, and a worldview suited for one task will actually degrade performance in others. One might sort of solve this by making the model way bigger with more context-specific experts, but it’s rather wasteful, exactly like having a bunch of experts in the same room and only letting one speak at a time. A better way to unlock the flywheel seems to be giving everyone the capability to perform RL on their own specific tasks. But if so, then neither the US nor China is currently well suited to support this: in China there isn’t sufficient compute for everyone to do their own RL; in the US every major lab wants its own model to be the one to rule them all. But this might be different in the case of robotics, because within the very high-dimensional world of physical reality, many tasks actually will require running a maximally general model. I suspect this is why you’ll often hear out of Western labs that the vast majority of robots will end up humanoid (e.g. this interview with Demis Hassabis), since that is a single standard combination of physical form and software architecture suitable for a single maximally capable model. Whereas I suspect in China they will be more likely to explore many different physical architectures, each specialized in one particular task, taking full advantage of their greater manufacturing capabilities.

