Multi-head Attention is a Fancy Addition Machine | Towards Data Science

“Attention Is All You Need” presented attention as a sequence of multiplicative and concatenation operations, but… what if I told you they are additive?

Source: Towards Data Science

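One way to read the "additive" claim, offered as a sketch rather than the article's own derivation: concatenating the head outputs and multiplying by the output projection gives the same result as multiplying each head by its own row block of that projection and summing. The snippet below checks this numerically in plain Python. The head outputs are random stand-ins (no attention scores are computed), and the names `heads`, `w_o`, `concat_then_project`, and `project_then_add` are hypothetical, chosen only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def concat_then_project(heads, w_o):
    """Concatenate head outputs along the feature axis, then apply w_o."""
    seq_len = len(heads[0])
    concatenated = [sum((h[t] for h in heads), []) for t in range(seq_len)]
    return matmul(concatenated, w_o)


def project_then_add(heads, w_o):
    """Project each head with its own row block of w_o, then sum the results."""
    d_head = len(heads[0][0])
    total = None
    for i, h in enumerate(heads):
        block = w_o[i * d_head:(i + 1) * d_head]  # rows of w_o that head i touches
        part = matmul(h, block)
        if total is None:
            total = part
        else:
            total = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(total, part)]
    return total


# Assumed toy sizes; any sizes with w_o having n_heads * d_head rows would work.
random.seed(0)
n_heads, seq_len, d_head, d_model = 4, 3, 2, 8
heads = [
    [[random.gauss(0, 1) for _ in range(d_head)] for _ in range(seq_len)]
    for _ in range(n_heads)
]
w_o = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_heads * d_head)]

a = concat_then_project(heads, w_o)
b = project_then_add(heads, w_o)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print("largest elementwise difference:", max_diff)
```

The two paths agree up to floating-point rounding, because a block-partitioned matrix product is a sum over its blocks. Note that this covers only the concatenation and output-projection step; the softmax and score computation inside each head are untouched by it.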