Microsoft’s DeepSpeed-MoE Makes Massive MoE Model Inference up to 4.5x Faster and 9x Cheaper | Synced


Source: Synced | AI Technology & Industry Review

A Microsoft research team proposes DeepSpeed-MoE, which comprises a novel MoE architecture design, a model compression technique that reduces MoE model size by up to 3.7x, and a highly optimized inference system that delivers 7.3x better latency and cost than existing MoE inference solutions.