Intel’s Prune Once for All Compression Method Achieves SOTA Compression-to-Accuracy Results on BERT | Synced


Source: Synced | AI Technology & Industry Review

An Intel research team presents Prune Once for All (Prune OFA), a training method that leverages weight pruning and model distillation to produce pretrained transformer-based language models with high sparsity ratios. Applied to BERT, the approach achieves state-of-the-art results in compression-to-accuracy ratio.
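To make the pruning side of the method concrete: Prune OFA applies gradual magnitude pruning during pretraining, combined with knowledge distillation from a dense teacher. The sketch below is not Intel's implementation; it only illustrates the core magnitude-pruning step on a single weight matrix, with the function name and the one-shot setting chosen for illustration.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights.

    Illustrative sketch only: the actual method prunes gradually over
    many training steps and distills from a dense teacher model.
    """
    k = int(weights.size * sparsity)  # number of weights to zero out
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    # Keep only weights whose magnitude exceeds the threshold.
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune a random 64x64 matrix to roughly 90% sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = magnitude_prune(w, 0.9)
achieved = 1.0 - np.count_nonzero(pruned) / pruned.size
```

In the paper's setting, this kind of sparsification is applied while pretraining the language model, so the resulting sparse model can later be fine-tuned on downstream tasks without pruning again.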