Meta AI Extends MAEs to Video for Self-Supervised Representation Learning With Minimal Domain Knowledge | Synced

By Sonic Mustang · March 16, 2026 · 1 min read

ai
machine learning & data science
research
ai
artificial intelligence

Source: Synced | AI Technology & Industry Review

In the new paper Masked Autoencoders As Spatiotemporal Learners, a Meta AI research team extends masked autoencoders (MAE) to spatiotemporal representation learning for video. The novel approach introduces negligible inductive biases on space-time while achieving strong empirical results compared to vision transformers (ViTs) and outperforms supervised pretraining by large margins.