Monday, December 15, 2025

Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language


Automated interpretability aims to translate large language model (LLM) features into human-understandable descriptions. However, these natural language feature descriptions are often vague, inconsistent, and require manual relabeling. In response, we introduce semantic regexes, structured language descriptions of LLM features. By combining primitives that capture linguistic and semantic feature patterns with modifiers for contextualization, composition, and quantification, semantic regexes produce precise and expressive feature descriptions. Across quantitative benchmarks and qualitative analyses, we find that semantic regexes match the accuracy of natural language while yielding more concise and consistent feature descriptions. Moreover, their inherent structure affords new types of analyses, including quantifying feature complexity across layers, scaling automated interpretability from insights into individual features to model-wide patterns. Finally, in user studies, we find that semantic regex descriptions help people build accurate mental models of LLM feature activations.
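
To make the idea of a structured description more concrete, here is a minimal Python sketch of how primitives and modifiers might be represented as a small tree, with a component count standing in as a crude proxy for feature complexity. This is not the paper's syntax or implementation: the class names, the `kind` labels, the `complexity` function, and the example description are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Primitive:
    """A leaf pattern (hypothetical), e.g. a literal token or a semantic category."""
    kind: str   # illustrative labels such as "literal" or "category"
    value: str


@dataclass(frozen=True)
class Modifier:
    """Wraps sub-expressions; `kind` names one of the three roles in the abstract."""
    kind: str                    # "context", "composition", or "quantification"
    args: Tuple["Node", ...]     # the wrapped sub-expressions
    label: str = ""              # free-text detail, e.g. the context name


Node = Union[Primitive, Modifier]


def complexity(node: Node) -> int:
    """Count every primitive and modifier in the description (assumed metric)."""
    if isinstance(node, Primitive):
        return 1
    return 1 + sum(complexity(child) for child in node.args)


# A made-up description: "one or more color words, in a fashion context".
desc = Modifier(
    kind="context",
    label="fashion",
    args=(
        Modifier(
            kind="quantification",
            label="one or more",
            args=(Primitive("category", "color"),),
        ),
    ),
)

print(complexity(desc))  # 3: one context, one quantifier, one primitive
```

Because each description is a tree rather than free text, a count like this can be compared across features or layers, which is the kind of analysis that plain natural language descriptions do not directly support.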
