Applying Linearly Scalable Transformers to Model Longer Protein Sequences

Researchers have proposed "Performer," a new Transformer architecture that scales linearly with sequence length, based on what they call fast attention via orthogonal random features (FAVOR).
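The core idea behind FAVOR is to approximate the softmax attention kernel with random feature maps, so attention can be computed without ever materializing the quadratic L x L attention matrix. The NumPy sketch below illustrates that general idea under stated assumptions: it uses a positive exponential feature map with plain Gaussian (not orthogonalized) random projections, and all function names and parameters here are illustrative, not the paper's exact construction.

```python
import numpy as np

def positive_random_features(x, w):
    # phi(x) = exp(w @ x - ||x||^2 / 2) / sqrt(m): positive random
    # features whose inner products approximate the softmax kernel
    # exp(q . k) in expectation. (Illustrative; the paper additionally
    # orthogonalizes the projections w to reduce variance.)
    m = w.shape[0]
    return np.exp(x @ w.T - 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)) / np.sqrt(m)

def linear_attention(Q, K, V, num_features=256, seed=0):
    """Approximate softmax(Q K^T / sqrt(d)) V in O(L * m * d) time
    instead of O(L^2 * d). Q, K, V have shape (L, d)."""
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((num_features, d))
    # Absorb the 1/sqrt(d) temperature into the inputs.
    q_p = positive_random_features(Q / d ** 0.25, w)   # (L, m)
    k_p = positive_random_features(K / d ** 0.25, w)   # (L, m)
    # Key trick: associativity. Compute phi(K)^T V first (m x d),
    # so the L x L attention matrix is never formed.
    kv = k_p.T @ V                      # (m, d)
    z = q_p @ k_p.sum(axis=0)           # (L,) softmax normalizer
    return (q_p @ kv) / z[:, None]      # (L, d)

if __name__ == "__main__":
    L, d = 1024, 64
    rng = np.random.default_rng(1)
    Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
    print(linear_attention(Q, K, V).shape)  # (1024, 64)
```

Because phi(K)^T V is computed before multiplying by phi(Q), the cost grows linearly rather than quadratically with sequence length, which is what makes this family of models attractive for long protein sequences.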

Source: syncedreview.com
