From 0.50 to 0.88 AUROC with activation normalization

Most of my probe gains didn’t come from a fancier classifier. They came from normalization.

A raw residual stream has wildly different magnitudes across tokens and layers. A linear probe trained on that mostly learns magnitude, not direction — and direction is where behavior lives.

X = acts / acts.norm(dim=-1, keepdim=True)   # unit-norm per token

That one line took a probe from chance to genuinely useful. The takeaway: before you reach for a bigger model, make sure your features are pointing the way you think they are.

(Placeholder post — replace with your own writing.)

← all transmissions