Submissions from transformer-circuits.pub

		Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations (transformer-circuits.pub)
		4 points by Anon84 12 days ago \| past \| discuss
		NL Autoencoders Produce Unsupervised Explanations of LLM Activations (transformer-circuits.pub)
		3 points by rajeevn 16 days ago \| past
		HeadVis: An Interactive Tool for Investigating Attention Heads (transformer-circuits.pub)
		3 points by MrOrelliOReilly 18 days ago \| past
		HeadVis: An Interactive Tool for Investigating Attention Heads (transformer-circuits.pub)
		4 points by rajeevn 19 days ago \| past
		Emotion Concepts and Their Function in a Large Language Model (transformer-circuits.pub)
		63 points by Anon84 49 days ago \| past \| 10 comments
		Emotion Concepts and Their Function in a Large Language Model (transformer-circuits.pub)
		6 points by majkinetor 50 days ago \| past
		Emotion Concepts and Their Function in a Large Language Model (transformer-circuits.pub)
		3 points by stared 51 days ago \| past
		Anthropic's Interpretability Research Blog (transformer-circuits.pub)
		3 points by philipfweiss 4 months ago \| past \| 1 comment
		Emergent introspective awareness in large language models (transformer-circuits.pub)
		1 point by lawrenceyan 6 months ago \| past
		Emergent Introspective Awareness in Large Language Models (transformer-circuits.pub)
		30 points by famouswaffles 6 months ago \| past \| 4 comments
		When models manipulate manifolds: The geometry of a counting task (transformer-circuits.pub)
		98 points by vinhnx 6 months ago \| past \| 17 comments
		When Models Manipulate Manifolds: The Geometry of a Counting Task (transformer-circuits.pub)
		4 points by 1wheel 6 months ago \| past
		Visual Features Across Modalities: SVG and ASCII Art Cross-Modal Understanding (transformer-circuits.pub)
		12 points by vismit2000 7 months ago \| past \| 1 comment
		LLMs extract high-level semantic concepts from SVG and ASCII art (transformer-circuits.pub)
		3 points by neuronerd1 7 months ago \| past \| 1 comment
		When Models Manipulate Manifolds: The Geometry of a Counting Task (transformer-circuits.pub)
		2 points by tanelpoder 7 months ago \| past
		When Models Manipulate Manifolds: The Geometry of a Counting Task (transformer-circuits.pub)
		5 points by e_ameisen 7 months ago \| past
		Transformer Circuits: reverse-engineering transformers into graspable programs (transformer-circuits.pub)
		1 point by dvrp 10 months ago \| past
		So You Want to Work in Mechanistic Interpretability? (transformer-circuits.pub)
		2 points by jxmorris12 11 months ago \| past
		Circuit Tracing: Revealing Computational Graphs in Language Models (Anthropic) (transformer-circuits.pub)
		173 points by ydnyshhh on March 31, 2025 \| past \| 27 comments
		The Biology of a Large Language Model (transformer-circuits.pub)
		117 points by frozenseven on March 28, 2025 \| past \| 19 comments
		Circuit Tracing: Revealing Computational Graphs in Language Models (transformer-circuits.pub)
		8 points by mfiguiere on March 27, 2025 \| past
		The Biology of a Large Language Model (transformer-circuits.pub)
		3 points by mfiguiere on March 27, 2025 \| past
		Insights on Cross-Coder Model Diffing (transformer-circuits.pub)
		1 point by gregorymichael on Feb 24, 2025 \| past
		Transformer Circuits Thread (transformer-circuits.pub)
		1 point by fzliu on Feb 5, 2025 \| past
		Definitions and Motivation: Features, Directions, and Superposition (transformer-circuits.pub)
		4 points by Bluestein on Dec 27, 2024 \| past
		Toy Models of Superposition (2022) (transformer-circuits.pub)
		45 points by tessierashpool9 on Nov 8, 2024 \| past
		Transformer Circuits Thread (transformer-circuits.pub)
		2 points by plurby on Nov 3, 2024 \| past
		Sparse Crosscoders for Cross-Layer Features and Model Diffing (transformer-circuits.pub)
		2 points by benocodes on Oct 25, 2024 \| past
		A collection of small updates from the Anthropic Interpretability team (transformer-circuits.pub)
		2 points by daralthus on July 31, 2024 \| past
		Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub)
		22 points by Anon84 on May 23, 2024 \| past \| 1 comment