https://www.lesswrong.com/posts/XGHf7EY3CK4KorBpw/understanding-llms-insights-from-mechanistic
2 minute summary * At a high level, a transformer-based LLM is an autoregressive, next-token predictor. It takes a sequence of...
https://www.anthropic.com/research/team/interpretability
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
https://blog.mozilla.ai/smarter-prompts-for-better-responses-exploring-prompt-optimization-and-interpretability-for-llms/
Generative AI models are highly sensitive to input phrasing. Even small changes to a prompt or switching between models can lead to different results. Adding...
https://www.apolloresearch.ai/research/interpretability-in-parameter-space-minimizing-mechanistic-description-length-with-attribution-based-parameter-decomposition/
Nov 18, 2025 - We introduce Attribution-based Parameter Decomposition (APD), a method that directly decomposes a neural network's parameters into components that (i) are...
https://arize.com/blog/llm-interpretability-and-sparse-autoencoders-openai-anthropic/
Sep 17, 2024 - Breaking down two papers that focus on the sparse autoencoder--an unsupervised approach for extracting interpretable features from an LLM.
https://distill.pub/2018/building-blocks/
Interpretability techniques are normally studied in isolation. We explore the powerful interfaces that arise when you combine them -- and the rich structure of...
https://towardsdatascience.com/mechanistic-interpretability-peeking-inside-an-llm/
Feb 5, 2026 - Are the human-like cognitive abilities of LLMs real or fake? How does information travel through the neural network? Is there hidden knowledge inside an LLM?