Rotary Position Embeddings for Long Context Length
Rick W

This article is divided into two parts; they are:

• Simple RoPE
• RoPE for Long Context Length

Compared to the sinusoidal position embeddings in the original Transformer paper, RoPE mutates the input tensor using a rotation matrix:

$$
\begin{aligned}
X'_{n,i} &= X_{n,i} \cos(n\theta_i) - X_{n,\frac{d}{2}+i} \sin(n\theta_i) \\
X'_{n,\frac{d}{2}+i} &= X_{n,i} \sin(n\theta_i) + X_{n,\frac{d}{2}+i} \cos(n\theta_i)
\end{aligned}
$$

where $X_{n,i}$ is the $i$-th element of the vector at the $n$-th position of the sequence in the tensor $X$, and $X'$ is the rotated result.
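
To make the rotation concrete, here is a minimal sketch (not the article's own implementation) that applies these two equations to a tensor of shape (seq_len, d). It assumes the common choice of frequencies $\theta_i = 10000^{-2i/d}$, which is not stated in the excerpt above.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate a (seq_len, d) tensor with RoPE, pairing element i with element d/2 + i.

    A minimal sketch; base=10000.0 for the frequencies theta_i is an
    assumption, not something taken from the article.
    """
    seq_len, d = x.shape
    assert d % 2 == 0, "embedding dimension must be even"

    # theta_i = base^(-2i/d) for i = 0, ..., d/2 - 1
    theta = base ** (-2 * torch.arange(d // 2, dtype=x.dtype) / d)
    # angles n * theta_i for every position n and frequency i: shape (seq_len, d/2)
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * theta[None, :]

    # split into the two halves that get paired, then apply the rotation
    x1, x2 = x[:, : d // 2], x[:, d // 2 :]
    rotated_first = x1 * torch.cos(angles) - x2 * torch.sin(angles)
    rotated_second = x1 * torch.sin(angles) + x2 * torch.cos(angles)
    return torch.cat([rotated_first, rotated_second], dim=-1)

# Example: rotate a random sequence of 8 positions with embedding dimension 16
x = torch.randn(8, 16)
print(apply_rope(x).shape)  # torch.Size([8, 16])
```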