Applying Linearly Scalable Transformers to Model Longer Protein Sequences
Researchers proposed a new transformer architecture called “Performer” — based on what they call fast attention via orthogonal random features (FAVOR).
Source: syncedreview.com
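The core idea behind FAVOR is to replace the quadratic softmax attention matrix with a product of random-feature maps, so attention can be computed in time linear in sequence length. The sketch below is a minimal, simplified illustration of that random-feature trick in NumPy; the function names (`positive_features`, `favor_attention`), the scaling choices, and the unidirectional-free (non-causal) setup are assumptions for illustration, not the paper's exact FAVOR estimator.

```python
import numpy as np

def positive_features(x, proj):
    """Positive random features phi(x) with E[phi(x) @ phi(y)] approximating exp(x @ y).

    x: (n, d) inputs, proj: (d, m) Gaussian random projection.
    (Simplified form of the random-feature idea behind FAVOR.)
    """
    m = proj.shape[1]
    return np.exp(x @ proj - np.sum(x ** 2, axis=-1, keepdims=True) / 2.0) / np.sqrt(m)

def favor_attention(Q, K, V, num_features=256, seed=0):
    """Linear-time attention approximation: O(n*m*d) instead of O(n^2*d)."""
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((d, num_features))
    # Scale by d**0.25 so phi(q) @ phi(k) approximates exp(q @ k / sqrt(d)),
    # matching the usual softmax-attention temperature.
    Qp = positive_features(Q / d ** 0.25, proj)   # (n, m)
    Kp = positive_features(K / d ** 0.25, proj)   # (n, m)
    kv = Kp.T @ V                                  # (m, d): the n x n matrix is never formed
    normalizer = Qp @ Kp.sum(axis=0)               # (n,) row normalization
    return (Qp @ kv) / normalizer[:, None]
```

Because the keys are aggregated into an `m x d` summary before being combined with the queries, memory and compute grow linearly with sequence length, which is what makes this style of attention attractive for long protein sequences.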