From 0.50 to 0.88 AUROC with activation normalization
Most of my probe gains didn’t come from a fancier classifier. They came from normalization.
In a raw residual stream, activation norms vary widely across tokens and layers. A linear probe trained on those activations mostly learns magnitude, not direction, and direction is where behavior lives.
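To make the magnitude problem concrete, here is a minimal sketch on synthetic data. It assumes a hypothetical probe weight vector `w` and a made-up `d_model` of 16; none of these values come from a real model. A linear probe's logit factors as w·x = ‖x‖ (w·x̂), so three activations pointing the same way but with different norms get three different raw logits.

```python
import torch

torch.manual_seed(0)

d_model = 16                       # illustrative size, not from a real model
w = torch.randn(d_model)           # hypothetical probe weight vector

direction = torch.randn(d_model)
direction = direction / direction.norm()   # one fixed unit direction

# Same direction, three very different magnitudes.
scales = torch.tensor([1.0, 10.0, 100.0])
acts = scales[:, None] * direction[None, :]         # shape [3, d_model]

# Raw logits grow linearly with the activation norm.
raw_logits = acts @ w

# After unit-normalizing, only the direction is left,
# so all three rows produce the same logit.
unit_acts = acts / acts.norm(dim=-1, keepdim=True)
unit_logits = unit_acts @ w

print("raw: ", raw_logits)
print("unit:", unit_logits)
```

On raw activations, the probe has to untangle norm from direction before it can use direction at all. Normalizing removes the norm factor up front.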
X = acts / acts.norm(dim=-1, keepdim=True) # unit-norm per token
That one line took a probe from chance (0.50 AUROC) to genuinely useful (0.88 AUROC). The takeaway: before you reach for a bigger model, make sure your features are pointing the way you think they are.
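One caveat with the one-liner: a token whose activation is exactly zero (a masked or padded position, say) divides by zero and turns into NaN. Below is a hedged sketch of a guarded version. The helper name `unit_normalize`, the `eps` value, and the `[batch, seq, d_model]` shape are assumptions for illustration, not part of the original setup.

```python
import torch


def unit_normalize(acts: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale each vector along the last dim to unit L2 norm.

    Norms are clamped to at least `eps`, so an all-zero vector
    stays zero instead of becoming NaN.
    """
    norms = acts.norm(dim=-1, keepdim=True)
    return acts / norms.clamp_min(eps)


torch.manual_seed(0)

# Synthetic activations, [batch, seq, d_model], with per-token scale variation.
acts = torch.randn(4, 7, 32) * (torch.rand(4, 7, 1) * 50.0)
acts[0, 0] = 0.0                  # simulate a zeroed-out (e.g. padded) position

X = unit_normalize(acts)

print(X.norm(dim=-1))             # ~1.0 everywhere except the zeroed position
```

`torch.nn.functional.normalize(acts, dim=-1)` applies the same clamp-then-divide, with a default `eps` of 1e-12, if you would rather not carry a helper around.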