CAIS Researchers Discover AI's Preferences — What David Shapiro MISSED in this bombshell paper


Feb 20 2025 108 mins   18

The Center for AI Safety just dropped a fascinating paper — they discovered that today’s AIs like GPT-4 and Claude have preferences! As in, coherent utility functions. We knew this was inevitable, but we didn’t know it was already happening.

This episode has two parts:

In Part I (48 minutes), I react to David Shapiro’s coverage of the paper and push back on many of his points.

In Part II (60 minutes), I explain the paper myself.

00:00 Episode Introduction

05:25 PART I: REACTING TO DAVID SHAPIRO

10:06 Critique of David Shapiro's Analysis

19:19 Reproducing the Experiment

35:50 David's Definition of Coherence

37:14 Does AI have “Temporal Urgency”?

40:32 Universal Values and AI Alignment

49:13 PART II: EXPLAINING THE PAPER

51:37 How The Experiment Works

01:11:33 Instrumental Values and Coherence in AI

01:13:04 Exchange Rates and AI Biases

01:17:10 Temporal Discounting in AI Models

01:19:55 Power Seeking, Fitness Maximization, and Corrigibility

01:20:20 Utility Control and Bias Mitigation

01:21:17 Implicit Association Test

01:28:01 Emailing with the Paper’s Authors

01:43:23 My Takeaway

Show Notes

David’s source video: https://www.youtube.com/watch?v=XGu6ejtRz-0

The research paper: http://emergent-values.ai

Watch the Lethal Intelligence Guide, the ultimate introduction to AI x-risk! https://www.youtube.com/@lethal-intelligence

PauseAI, the volunteer organization I’m part of: https://pauseai.info

Join the PauseAI Discord — https://discord.gg/2XXWXvErfA — and say hi to me in the #doom-debates-podcast channel!

Doom Debates’ mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at https://doomdebates.com and to https://youtube.com/@DoomDebates



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit lironshapira.substack.com