Intel’s Prune Once for All Compression Method Achieves SOTA Compression-to-Accuracy Results on BERT
Source: Synced | AI Technology & Industry Review
An Intel research team presents Prune Once for All (Prune OFA), a training method that combines weight pruning with model distillation to produce pretrained transformer-based language models with high sparsity ratios. Applied to BERT, the approach achieves state-of-the-art compression-to-accuracy results.
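To make the two core ingredients concrete, here is a minimal, hypothetical PyTorch sketch of magnitude-based weight pruning paired with a soft-target distillation loss. This is an illustration of the general techniques only, not Intel's Prune OFA recipe; the function names and the toy linear layers are assumptions introduced for the example.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch: magnitude pruning plus knowledge distillation,
# the two ingredients Prune OFA builds on. Not Intel's actual code.

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask that zeroes the smallest-magnitude weights."""
    k = int(weight.numel() * sparsity)  # number of weights to prune
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """Soft-target KL divergence between teacher and student outputs."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T

# Toy usage: prune a student layer to 90% sparsity, then train it to
# match a frozen teacher's predictions on random inputs.
teacher = torch.nn.Linear(16, 4)
student = torch.nn.Linear(16, 4)
mask = magnitude_prune(student.weight.data, sparsity=0.9)

opt = torch.optim.SGD(student.parameters(), lr=0.1)
x = torch.randn(8, 16)
with torch.no_grad():
    t_logits = teacher(x)

for _ in range(10):
    student.weight.data *= mask  # keep pruned weights at zero
    loss = distillation_loss(student(x), t_logits)
    opt.zero_grad()
    loss.backward()
    opt.step()
student.weight.data *= mask      # re-apply mask after the last step
print(f"final sparsity: {(student.weight == 0).float().mean():.2f}")
```

In this toy setup the sparsity mask is computed once and simply re-applied after each optimizer step, so pruned weights stay at zero while the surviving weights learn to mimic the teacher's soft predictions.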