Hidden Evaluation Awareness
llm evals, eval awareness, comparing cot to activations
[Gemma 3+4, contrastive & logistic probes, sad dataset]
mech interp, relational composition, sae recovery
[toy transformer, probes, causal interventions, saes]
transformers, rl, mech interp, evals
[papers, videos, reimplementations, arena]