Facebook AI’s NormFormer Employs Extra Normalization to Significantly Improve Transformer Pretraining | Synced



Source: Synced | AI Technology & Industry Review

Facebook AI Research proposes NormFormer, an approach that adds extra normalization to transformer layers to improve pretraining perplexity and downstream task performance for both causal and masked language models. NormFormer reaches GPT3-Large (1.3B) zero-shot performance 60 percent faster and improves fine-tuned GLUE performance by 1.9 percent.
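Per the NormFormer paper (details not spelled out in this article), the core change adds three operations to each Pre-LN transformer layer: a LayerNorm after self-attention, head-wise scaling of the attention outputs, and a LayerNorm after the first feed-forward activation. A minimal NumPy sketch of the feed-forward modification is below; the function names are hypothetical and, for brevity, learned gain/bias parameters are omitted (the paper also uses GELU rather than ReLU):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last dimension (per token), as in standard LayerNorm.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def normformer_ffn(x, w1, w2):
    """Pre-LN feed-forward sublayer with NormFormer's extra LayerNorm
    after the activation. Hypothetical minimal sketch: no learned
    gains/biases, ReLU instead of GELU."""
    h = layer_norm(x)              # standard Pre-LN normalization
    h = np.maximum(h @ w1, 0.0)    # first linear layer + activation
    h = layer_norm(h)              # NormFormer: extra LN after activation
    return x + h @ w2              # second linear layer + residual

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8))       # (batch, seq, d_model)
w1 = rng.standard_normal((8, 32)) * 0.1
w2 = rng.standard_normal((32, 8)) * 0.1
out = normformer_ffn(x, w1, w2)
print(out.shape)  # (2, 4, 8)
```

The intuition reported in the paper is that Pre-LN transformers receive larger gradients at earlier layers than at later ones; the extra normalization helps even out gradient magnitudes across depth.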