Facebook AI’s NormFormer Employs Extra Normalization to Significantly Improve Transformer Pretraining
Source: syncedreview.com
Facebook AI Research proposes NormFormer, an approach that adds extra normalization operations to transformer layers to improve pretraining perplexity and downstream task performance for both causal and masked language models. NormFormer reaches GPT3-Large (1.3B parameters) zero-shot performance 60 percent faster and improves fine-tuned GLUE performance by 1.9 percent.
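To make the "extra normalization" idea concrete, below is a minimal NumPy sketch of a NormFormer-style feed-forward sub-block: on top of the standard Pre-LN residual pattern, an additional LayerNorm is applied after the activation inside the FFN. This is an illustrative sketch, not the authors' implementation; the function names (`layer_norm`, `normformer_ffn`) and the learnable scale/bias parameters are simplifications introduced here.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) dimension.
    # Learnable gain/bias omitted for brevity in this sketch.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def normformer_ffn(x, w1, w2):
    # Standard Pre-LN FFN residual branch: x + W2 @ gelu(W1 @ LN(x)).
    # NormFormer inserts one extra LayerNorm after the activation,
    # before the second projection.
    h = gelu(layer_norm(x) @ w1)   # (batch, seq, d_ff)
    return x + layer_norm(h) @ w2  # extra normalization here
```

NormFormer also applies a LayerNorm to the self-attention output and scales each attention head by a learned scalar; those pieces follow the same pattern of adding cheap normalization at points where gradient magnitudes are mismatched across layers.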