Data Jam EP 6 - Rotary Position Embedding (RoPE)
Rotary Position Embedding, or RoPE, is a type of position embedding that encodes absolute positional information with a rotation matrix and naturally incorporates explicit relative position dependency into the self-attention formulation.
Notably, RoPE comes with valuable properties: it extends flexibly to any sequence length, inter-token dependency decays as the relative distance grows, and it can equip linear self-attention with relative position encoding.
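To make the core idea concrete, here is a minimal worked equation for the two-dimensional case, following the paper's notation (the angle mθ grows with the absolute position m; in higher dimensions the same rotation is applied to each pair of features with its own frequency θ_i):

```latex
% 2D case: rotate the query/key vector by an angle proportional to its position
R_m =
\begin{pmatrix}
\cos m\theta & -\sin m\theta \\
\sin m\theta & \cos m\theta
\end{pmatrix},
\qquad
f(\mathbf{q}, m) = R_m \mathbf{q},
\quad
f(\mathbf{k}, n) = R_n \mathbf{k}

% The attention score then depends only on the relative offset n - m:
\langle R_m \mathbf{q},\; R_n \mathbf{k} \rangle
= \mathbf{q}^\top R_m^\top R_n \mathbf{k}
= \mathbf{q}^\top R_{n-m} \mathbf{k}
```

In other words, each token is rotated according to its absolute position, yet the dot products that drive attention only see the difference between positions.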
Rotary embeddings work by rotating each token's query and key vectors by an angle proportional to the token's absolute position. Because the dot product between two rotated vectors depends only on the angle between them, the attention scores end up reflecting where each token sits relative to the others.
By using rotations instead of traditional additive position embeddings, RoPE captures relative position dependency directly in the self-attention formulation.
This allows the model to better understand how tokens relate to each other and use that information to make more accurate predictions.
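As an illustrative sketch only (the helper names `rope_angles` and `apply_rope` and the demo values are assumptions for this post; the base of 10000 follows the paper's convention), the following NumPy snippet rotates query/key vectors by position-dependent angles and checks that the resulting score depends only on the relative offset:

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Per-position rotation angles for each feature pair (theta_i = base^(-2i/dim))."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # shape: (dim/2,)
    return np.outer(positions, inv_freq)               # shape: (len(positions), dim/2)

def apply_rope(x, positions, base=10000.0):
    """Rotate each consecutive pair of features in x by its position-dependent angle."""
    seq_len, dim = x.shape
    angles = rope_angles(positions, dim, base)          # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                     # split features into pairs
    rotated = np.empty_like(x)
    rotated[:, 0::2] = x1 * cos - x2 * sin              # 2D rotation applied pairwise
    rotated[:, 1::2] = x1 * sin + x2 * cos
    return rotated

# Tiny demo: the q.k score depends only on the relative offset, not absolute positions.
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 8))
k = rng.standard_normal((1, 8))

score_a = apply_rope(q, np.array([3]))[0] @ apply_rope(k, np.array([7]))[0]    # offset 4
score_b = apply_rope(q, np.array([10]))[0] @ apply_rope(k, np.array([14]))[0]  # offset 4
print(np.isclose(score_a, score_b))  # True: same relative distance -> same score
```

The demo mirrors the equation above: shifting both positions by the same amount leaves the attention score unchanged, which is exactly the relative-position behaviour RoPE is designed to provide.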