Llama 3 Meets MoE: Pioneering Low-Cost High-Performance AI


Source: Synced | AI Technology & Industry Review

Researchers from the University of Texas at Austin and NVIDIA propose an upcycling approach: an innovative training recipe that enables the development of an 8-Expert Top-2 MoE model from Llama 3-8B with less than 1% of the compute typically required for pre-training.
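To make the idea concrete, here is a minimal sketch of what dense-to-MoE upcycling can look like, assuming PyTorch: each of the 8 experts starts as a copy of the dense model's feed-forward block, and a freshly initialized router picks the top-2 experts per token. The module names, dimensions, and routing loop below are illustrative assumptions, not the authors' exact recipe.

```python
# Sketch of dense-to-MoE "upcycling" (illustrative, not the paper's exact recipe).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """Llama-style feed-forward block (gate, up, down projections)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class UpcycledTop2MoE(nn.Module):
    """8 experts, each initialized as a clone of the dense FFN, with top-2 routing."""
    def __init__(self, dense_ffn: SwiGLU, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Upcycling step: every expert begins as an exact copy of the dense FFN.
        self.experts = nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(n_experts))
        # The router is new and trained from scratch.
        self.router = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])             # (n_tokens, d_model)
        probs = F.softmax(self.router(tokens), dim=-1)  # routing probabilities
        weights, idx = probs.topk(self.top_k, dim=-1)   # top-2 experts per token
        out = torch.zeros_like(tokens)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(tokens[mask])
        return out.reshape(x.shape)


# Usage: replace a dense FFN with its upcycled MoE counterpart and continue training.
dense = SwiGLU(d_model=4096, d_ff=14336)  # Llama 3-8B's FFN dimensions
moe = UpcycledTop2MoE(dense, d_model=4096)
y = moe(torch.randn(2, 16, 4096))         # (batch, seq, d_model)
print(y.shape)                            # torch.Size([2, 16, 4096])
```

Because the experts inherit the dense checkpoint's weights, the model starts from a strong initialization rather than from scratch, which is what allows continued training at a small fraction of the original pre-training compute.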