Publications

Agents of Chaos
Natalie Shapira, Giordano Rogers, David Bau, et al.
[Under Peer Review]

LLMs Process Lists With General Filter Heads
Arnab Sen Sharma, Giordano Rogers, Natalie Shapira, David Bau
ICLR

Do Natural Language Descriptions of Model Activations Convey Privileged Information?
Millicent Li, Alberto Mario Ceballos Arroyo, Giordano Rogers, Naomi Saphra, Byron C Wallace