Dwarkesh interviews Sholto and Trenton again, coinciding with the release of Claude 4. This interview is Near approved, and for what it’s worth, I don’t really have much I disagree on either. Particularly interesting are the sections on continual learning and scaffolding, as well as the meaning of neuralese1. Also, there’s an interesting tidbit on Claude imitating what it reads about itself on the internet, which seems to be evidence that the AI Thucydides Trap is a real failure mode.
Hacker News has an interesting thread discussing John Carmack’s presentation on building physical agents. There’s also an interesting comment from Alex Nichol of OpenAI saying that this approach has already been tried.
Charlie Guo has a good roundup of all the AI conferences and announcements this week.
Scott Alexander on one’s first moments of conscious awareness. What I take away from this is that, quite apart from the difficulty of defining consciousness in general, even the idea of human consciousness is underdefined. Jenneral lists several different definitions, including emotional awareness, insight, consideration of others, self-reflection, and agency. Presumably others might add things like ego-death or enlightenment.
Dynomight has a proposal that NumPy should add some syntactic sugar that converts loops into array operations.
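As a rough illustration of the kind of conversion in question (this is generic NumPy vectorization, not Dynomight’s actual proposed syntax), the idea is to let loop-shaped code turn into a single array operation:

```python
import numpy as np

A = np.random.rand(100, 50)
B = np.random.rand(50, 30)

# Explicit loop version: the indices are obvious, but it runs slowly in Python.
C_loop = np.zeros((100, 30))
for i in range(100):
    for k in range(30):
        for j in range(50):
            C_loop[i, k] += A[i, j] * B[j, k]

# Vectorized version: one array operation, with the indices left implicit.
C_vec = A @ B

assert np.allclose(C_loop, C_vec)
```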
Asimov Press has an interesting post about antibiotic resistance in soil-dwelling bacteria as the source of antibiotic resistance in pathogens. Makes sense given that most of our antibiotics are derived from other soil-dwelling microbes and fungi. Makes me somewhat hopeful that completely artificially derived antibiotics could be robust against resistance.
Noah Smith in praise of small businesses. I’ve never really understood the argument that they are economically inefficient, because aside from their positive externalities, large businesses presumably have to start off as small ones. The focus should be on moving things up the ladder, not manifesting corporations out of nowhere.
Snowden Todd on the early modern history of Syria, which is pretty illuminating as to why Turkey is so involved in the area, even compared to other former-Ottoman states.
Stephanie Murray’s natalism linkthread, including apparent confirmation that happiness and optimism are positively correlated with fertility rates.
After thinking about it some more, I’m less certain about this idea that AI architectures are wordcels focused on nodes, and that what’s required are shape-rotators that care about edges. If you look at things in terms of this metaphor, transformers with their multiple attention heads are already operating on the relationships between all of their input tokens. However, I still think that AI is already very creative, and can be made more so by increasing the temperature of diffusion, so the limiting factor for AI creativity is not novelty but correctness and fit within some larger context.
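To make the “operating on edges” point concrete, a single attention head already computes a score for every pair of input tokens. A toy sketch in plain NumPy (made-up dimensions, just to show the shape of the computation):

```python
import numpy as np

def attention_scores(X, W_q, W_k):
    """Pairwise token-to-token attention weights for one head."""
    Q = X @ W_q                                  # queries, shape (tokens, d_head)
    K = X @ W_k                                  # keys,    shape (tokens, d_head)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])      # (tokens, tokens): one weight per token pair
    # Softmax over each row turns the pairwise scores into a distribution over "edges".
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))      # 5 tokens, embedding dim 16
W_q = rng.normal(size=(16, 8))
W_k = rng.normal(size=(16, 8))
print(attention_scores(X, W_q, W_k).shape)   # (5, 5): a full token-pair matrix
```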
Embedding space is necessarily high-dimensional in order to be sufficiently general, but for any particular task there are specific dimensions that a user will specifically want their output to align on. Similarly, the vectors for different tokens sit at positions chosen to minimize error over all of the model’s training inputs, but for your specific use case their ideal positions may be slightly adjusted, or even entirely different. To a large extent, I think things like scaffolding, chain of thought, prompt engineering, and retrieval augmentation are all attempts to create and maintain a target desired context. But the problem with this is that any context is by definition tailored to some particular situation, and so there is no one-size-fits-all worldview that works for all situations2.
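One way to see the “specific dimensions for a specific task” point: retrieval augmentation scores general-purpose embeddings against a task-specific query direction and keeps only what aligns with it. A minimal sketch (names and sizes are illustrative, not any particular library’s API):

```python
import numpy as np

def retrieve_context(query_vec, doc_vecs, k=3):
    """Rank documents by cosine similarity to a task-specific query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = D @ q                      # alignment along the directions this task cares about
    return np.argsort(-sims)[:k]      # indices of the k best-aligned documents

rng = np.random.default_rng(1)
doc_vecs = rng.normal(size=(1000, 768))   # general-purpose embeddings
query_vec = rng.normal(size=768)          # encodes the specific context we want maintained
print(retrieve_context(query_vec, doc_vecs))
```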
I feel like the path forward should be to make context a first-class citizen, by making it an input that is loaded separately from the user prompt and operated on by the transformer, similar to how the MSA is loaded along with the amino acid sequence in AlphaFold3. During training, you would start with RAG vector embeddings and train the model to provide different outputs depending on which worldview is loaded. Then you do something like backpropagation to update the worldview itself, and iterate. The goal is to teach the model how to effectively make use of worldviews while also producing them.
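A minimal sketch of the shape of this idea, assuming the “worldview” is a separately loaded tensor that the transformer cross-attends to and that can be updated by backpropagation; everything here (names, dimensions, the placeholder loss) is hypothetical, not a working training recipe:

```python
import torch
import torch.nn as nn

class WorldviewBlock(nn.Module):
    """Toy block: tokens self-attend, then cross-attend to a separately loaded worldview."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, tokens, worldview):
        x, _ = self.self_attn(tokens, tokens, tokens)
        tokens = tokens + x
        # The worldview enters as a separate input track, loosely analogous to the MSA
        # being loaded alongside the amino acid sequence.
        x, _ = self.cross_attn(tokens, worldview, worldview)
        tokens = tokens + x
        return tokens + self.ff(tokens)

# The worldview is itself a parameter, so it can be initialized (e.g. from RAG embeddings)
# and then refined by backpropagation for a given use case.
block = WorldviewBlock()
worldview = nn.Parameter(torch.randn(1, 64, 256))   # 64 "worldview" slots
tokens = torch.randn(1, 10, 256)                    # a batch of 10 prompt tokens
out = block(tokens, worldview)
loss = out.pow(2).mean()                            # placeholder loss
loss.backward()                                     # gradients flow into the worldview
print(worldview.grad.shape)
```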
There are lots of interesting implications if something like this ends up being the case. For example, it means high-quality inference will be very expensive in terms of memory and compute. In an inference-bound paradigm, AI labs might be commoditized, since all the labs would have more or less the same model weights, and inference would also be run through commodity compute providers. With regard to worldview scaffolds, there would probably be a long tail, with most users loading from standardized, Matrix-style training programs, but many others creating their own customized worldviews for their specific use-cases4.
I think this also applies to RL fine-tuning, which effectively hard-codes a worldview scaffold directly into the model. Possibly all the problems with RL fine-tuning come about because we have many different worldviews we expect to use at different times or for different applications.
Trying to figure out if I just reinvented a really expensive version of RAG or if this is actually something else. I think it’s different, because by loading this into the model itself, you get the equivalent of both RAG and a custom fine-tune together, rather than just one, the other, or both in isolation. The other question is whether this would be worth the enormous compute required, particularly for large and complicated worldviews. Since this idea is probably pretty obvious, presumably others have thought about it, so the fact that it isn’t used suggests it currently isn’t worth it. Yet AlphaFold3 continues to use the MSA, so apparently in some circumstances something like this is worth the cost. And while compute continues to go up, spending it on increasing model size seems to have stopped doing much, so maybe we will see something like this in the future.
Edit: Yeah, turns out I just reinvented LoRA.
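For reference, the LoRA idea in miniature: keep the shared base weights frozen and load a small low-rank delta per use case, which is the swappable-worldview property described above. A sketch only, not any specific library’s implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a small, swappable low-rank adapter."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # shared weights stay fixed
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        # Output = frozen base behavior + low-rank per-use-case correction.
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(256, 256))
x = torch.randn(4, 256)
print(layer(x).shape)   # only A and B (the adapter) get trained for a given use case
```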
Lots of interesting implications for alignment and mechanistic interpretability here. Being smaller than frontier-model embeddings and designed to be coherent, these worldviews should be relatively tractable for mechanistic interpretability. Your compute provider could then easily confirm compatibility with safety and morality before making use of them. Likewise, while these worldviews would initially be generated by backpropagation, once you have enough examples it’s possible models could directly generate a usable worldview scaffold from a brief written summary. Then you could actually have something like Eliezer’s envelope containing all of human morality.