Facebook AI’s NormFormer Employs Extra Normalization to Significantly Improve Transformer Pretraining | Synced
Facebook AI Research proposes NormFormer, an approach that improves pretraining perplexity and downstream task performance for both causal and masked language models, achieving GPT3-Large (1.3B) ze...
Source: Synced | AI Technology & Industry Review
Facebook AI Research proposes NormFormer, an approach that improves pretraining perplexity and downstream task performance for both causal and masked language models, achieving GPT3-Large (1.3B) zero-shot performance 60 percent faster and improving fine-tuned GLUE performance by 1.9 percent.