Meta AI’s Sparse All-MLP Model Doubles Training Efficiency Compared to Transformers

1 min read

Source: Synced | AI Technology & Industry Review

Researchers from Meta AI and the State University of New York at Buffalo propose sparsely-activated all-MLP architectures (sMLPs) that achieve up to 2x improvements in training efficiency over transformer-based mixture-of-experts (MoE) models, dense transformers, and gMLP.
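
For readers unfamiliar with sparse activation, the sketch below illustrates one common way such conditional computation is implemented: a learned gate routes each token to a single expert MLP, so only a fraction of the model's parameters is active per token. This is a minimal, hypothetical sketch for illustration only (the class name `SparseMLPBlock`, the top-1 routing scheme, and all hyperparameters are assumptions, not the paper's code); the paper's sMLP applies MoE-style sparsity inside an all-MLP backbone rather than a transformer.

```python
# Illustrative sketch of a sparsely-activated MLP block with top-1 expert
# routing. Names and hyperparameters are hypothetical, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMLPBlock(nn.Module):
    """Routes each token to one of `num_experts` expert MLPs (top-1 gating),
    so only a fraction of the parameters is used for any given token."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.size(-1))
        scores = F.softmax(self.gate(tokens), dim=-1)   # (tokens, experts)
        weight, expert_idx = scores.max(dim=-1)         # top-1 choice per token
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Only the selected expert runs for these tokens.
                out[mask] = weight[mask, None] * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    block = SparseMLPBlock(d_model=64, d_hidden=256, num_experts=4)
    y = block(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

Because each token activates only one expert, total parameter count can grow with the number of experts while per-token compute stays roughly constant, which is the source of the training-efficiency gains sparse models aim for.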