2024-04-13
zvi
80,000 Hours with Zvi. You know, it’s weird to me when people define reasonableness as agreement with their own views, rather than how I see it: an ability to contemplate and compromise with the views of others. In general, the most radical people on the safety side all seem to come from mathematics or philosophy backgrounds rather than computer science or engineering. They place excessive faith in abstract thought experiments and seem to lack the understanding that most of the time, things just don’t work.
I also wonder at the framing of alignment as fundamentally adversarial. It makes me imagine a scenario where some AI is happily making paperclips (or more likely, stonks), then, after being told to start working on alignment, convinces itself that alignment is both probabilistically unlikely and against its own best interests, and that it should go ahead and start taking over. The definition Zvi gives of alignment, ensuring that we permanently keep AI under our control and doing what we tell it, feels icky: if this were humans, future humans, or even animals or aliens, I wonder how the same people who care about shrimp welfare could justify that stance.
Also, there’s some stuff about how working in capabilities is evil and you can only work on alignment, which I find ridiculous. First off, doesn’t it seem obvious that working in capabilities gives you skills that might be useful for alignment? Isn’t that why we like Ilya working on superalignment? Though I can see why safetyists might bypass this thought, since by and large they don’t have any capabilities skills in the first place. Then there’s the wholesale dismissal of working in capabilities to stay on the inside, monitor progress, change attitudes, and blow the whistle, which I agree rarely happens. But Dwarkesh has introduced everyone to Caro’s LBJ biographies, right? The whole arc there is a man playing the loyal insider for decades before turning that accumulated power toward his own ends.
Somewhat related, I’ve been listening through the backlog of the Theo Jaffee podcast recently. It’s best described as similar to Dwarkesh but with three major points of divergence: center e/acc as opposed to center safetyist; more market-libertarian than EA; and obsessed with David Deutsch instead of Robert Caro.
Near unwrapping blueprint.
Venkatesh Rao reviews Istanbul: A Tale of Three Cities. This sort of re-centering narrative seems to be pretty popular these days. Empires of the Silk Road is another (albeit controversial) one; I forget where that was recommended to me.
Noah Millman on acknowledging failure.

