Blog

2026

Hidden Evaluation Awareness
llm evals, eval awareness, comparing cot to activations [Gemma 3+4, contrastive & logistic probes, SAD dataset]
Investigating Relational Composition
mech interp, relational composition, sae recovery [toy transformer, probes, causal interventions, saes]
Upskilling for AI Safety Research
transformers, rl, mech interp, evals [papers, videos, reimplementations, ARENA]