NVIDIA’s Hybrid: Combining Attention and State Space Models for Breakthrough Performance of Small Language Models
Source: syncedreview.com
An NVIDIA research team proposes Hymba, a family of small language models that blend transformer attention with state space models. Hymba outperforms the Llama-3.2-3B model by 1.32% in average accuracy while reducing cache size by 11.67× and increasing throughput by 3.49×.