Publications

Agents of Chaos

Natalie Shapira, Giordano Rogers, David Bau, et al.

[Under Peer Review]

LLMs Process Lists With General Filter Heads

Arnab Sen Sharma, Giordano Rogers, Natalie Shapira, David Bau

ICLR

Do Natural Language Descriptions of Model Activations Convey Privileged Information?

Millicent Li, Alberto Mario Ceballos Arroyo, Giordano Rogers, Naomi Saphra, Byron C. Wallace