Microsoft’s NaturalSpeech 2 Outperforms Previous TTS Systems in Zero-Shot Speech and Singing Synthesis | Synced

By Storm Warden · March 16, 2026 · 1 min read

ai
machine learning & data science
research
ai
artificial intelligence

Source: Synced | AI Technology & Industry Review

In the new paper NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers, a Microsoft team introduces NaturalSpeech 2, a TTS system with latent diffusion models for natural and strong zero-shot voice synthesis that captures expressive prosodies with superior robustness.