Facebook AI’s NormFormer Employs Extra Normalization to Significantly Improve Transformer Pretraining | Synced

By Storm Warden · March 16, 2026 · 1 min read

ai
machine learning & data science
natural language tech
research

Source: Synced | AI Technology & Industry Review

Facebook AI Research proposes NormFormer, an approach that improves pretraining perplexity and downstream task performance for both causal and masked language models. It reaches GPT3-Large (1.3B) zero-shot performance 60 percent faster and improves fine-tuned GLUE performance by 1.9 percent.