Facebook AI’s NormFormer Employs Extra Normalization to Significantly Improve Transformer Pretraining
Source: syncedreview.com
Facebook AI Research proposes NormFormer, an approach that adds extra normalization operations to transformer layers to improve pretraining perplexity and downstream task performance for both causal and masked language models. NormFormer reaches GPT3-Large (1.3B parameters) zero-shot performance 60 percent faster and improves fine-tuned GLUE performance by 1.9 percent.
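To make the "extra normalization" idea concrete, below is a minimal NumPy sketch of a NormFormer-style feed-forward sub-block: on top of the standard Pre-LN residual pattern, an additional LayerNorm is applied after the activation inside the FFN. This is an illustrative sketch, not the authors' implementation; the function names (`layer_norm`, `normformer_ffn`) and the learnable scale/bias parameters are simplifications introduced here.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) dimension.
    # Learnable gain/bias omitted for brevity in this sketch.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def normformer_ffn(x, w1, w2):
    # Standard Pre-LN FFN residual branch: x + W2 @ gelu(W1 @ LN(x)).
    # NormFormer inserts one extra LayerNorm after the activation,
    # before the second projection.
    h = gelu(layer_norm(x) @ w1)   # (batch, seq, d_ff)
    return x + layer_norm(h) @ w2  # extra normalization here
```

NormFormer also applies a LayerNorm to the self-attention output and scales each attention head by a learned scalar; those pieces follow the same pattern of adding cheap normalization at points where gradient magnitudes are mismatched across layers.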