| | Emergent introspective awareness in large language models (transformer-circuits.pub) |
| 1 point by lawrenceyan 37 days ago | past |
|
| | Emergent Introspective Awareness in Large Language Models (transformer-circuits.pub) |
| 30 points by famouswaffles 38 days ago | past | 4 comments |
|
| | When models manipulate manifolds: The geometry of a counting task (transformer-circuits.pub) |
| 98 points by vinhnx 39 days ago | past | 17 comments |
|
| | When Models Manipulate Manifolds: The Geometry of a Counting Task (transformer-circuits.pub) |
| 4 points by 1wheel 40 days ago | past |
|
| | Visual Features Across Modalities: SVG and ASCII Art Cross-Modal Understanding (transformer-circuits.pub) |
| 12 points by vismit2000 43 days ago | past | 1 comment |
|
| | LLMs extract high-level semantic concepts from SVG and ASCII art (transformer-circuits.pub) |
| 3 points by neuronerd1 43 days ago | past | 1 comment |
|
| | When Models Manipulate Manifolds: The Geometry of a Counting Task (transformer-circuits.pub) |
| 2 points by tanelpoder 46 days ago | past |
|
| | When Models Manipulate Manifolds: The Geometry of a Counting Task (transformer-circuits.pub) |
| 5 points by e_ameisen 46 days ago | past |
|
| | Transformer Circuits: reverse-engineering transformers into graspable programs (transformer-circuits.pub) |
| 1 point by dvrp 4 months ago | past |
|
| | So You Want to Work in Mechanistic Interpretability? (transformer-circuits.pub) |
| 2 points by jxmorris12 5 months ago | past |
|
| | Circuit Tracing: Revealing Computational Graphs in Language Models (Anthropic) (transformer-circuits.pub) |
| 173 points by ydnyshhh 8 months ago | past | 27 comments |
|
| | The Biology of a Large Language Model (transformer-circuits.pub) |
| 117 points by frozenseven 8 months ago | past | 19 comments |
|
| | Circuit Tracing: Revealing Computational Graphs in Language Models (transformer-circuits.pub) |
| 8 points by mfiguiere 8 months ago | past |
|
| | The Biology of a Large Language Model (transformer-circuits.pub) |
| 3 points by mfiguiere 8 months ago | past |
|
| | Insights on Cross-Coder Model Diffing (transformer-circuits.pub) |
| 1 point by gregorymichael 9 months ago | past |
|
| | Transformer Circuits Thread (transformer-circuits.pub) |
| 1 point by fzliu 10 months ago | past |
|
| | Definitions and Motivation: Features, Directions, and Superposition (transformer-circuits.pub) |
| 4 points by Bluestein 11 months ago | past |
|
| | Toy Models of Superposition (2022) (transformer-circuits.pub) |
| 45 points by tessierashpool9 on Nov 8, 2024 | past |
|
| | Transformer Circuits Thread (transformer-circuits.pub) |
| 2 points by plurby on Nov 3, 2024 | past |
|
| | Sparse Crosscoders for Cross-Layer Features and Model Diffing (transformer-circuits.pub) |
| 2 points by benocodes on Oct 25, 2024 | past |
|
| | A collection of small updates from the Anthropic Interpretability team (transformer-circuits.pub) |
| 2 points by daralthus on July 31, 2024 | past |
|
| | Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub) |
| 22 points by Anon84 on May 23, 2024 | past | 1 comment |
|
| | Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub) |
| 2 points by wrycoder on May 22, 2024 | past | 1 comment |
|
| | Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub) |
| 1 point by smaddox on May 22, 2024 | past | 1 comment |
|
| | Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub) |
| 2 points by veryluckyxyz on May 22, 2024 | past |
|
| | Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub) |
| 10 points by tosh on May 21, 2024 | past |
|
| | Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub) |
| 168 points by 1wheel on May 21, 2024 | past | 124 comments |
|
| | Reflections on Qualitative Research (transformer-circuits.pub) |
| 54 points by martingalex2 on April 26, 2024 | past | 3 comments |
|
| | In-Context Learning and Induction Heads (transformer-circuits.pub) |
| 2 points by throwup238 on Jan 28, 2024 | past |
|
| | Circuits Updates – January 2024 (transformer-circuits.pub) |
| 1 point by 1wheel on Jan 24, 2024 | past |