Rick W / Wednesday, December 24, 2025 / Categories: Artificial Intelligence

Rotary Position Embeddings for Long Context Length

This article is divided into two parts; they are:

• Simple RoPE
• RoPE for Long Context Length

Compared to the sinusoidal position embeddings in the original Transformer paper, RoPE modifies the input tensor by applying a rotation to each pair of elements:

$$
\begin{aligned}
X'_{n,i} &= X_{n,i} \cos(n\theta_i) - X_{n,\frac{d}{2}+i} \sin(n\theta_i) \\
X'_{n,\frac{d}{2}+i} &= X_{n,i} \sin(n\theta_i) + X_{n,\frac{d}{2}+i} \cos(n\theta_i)
\end{aligned}
$$

where $X_{n,i}$ is the $i$-th element of the vector at the $n$-th position of the sequence in tensor $X$, $d$ is the embedding dimension, and the primes denote the rotated output. Each element $i$ in the first half of the vector is paired with element $\frac{d}{2}+i$ in the second half, and the pair is rotated by the angle $n\theta_i$. A common choice of frequency is $\theta_i = 10000^{-2i/d}$ for $i = 0, \dots, \frac{d}{2}-1$, so the lower dimensions rotate faster with position than the higher ones.
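For concreteness, below is a minimal NumPy sketch of this rotation in the half-split form of the equations above. The function name `apply_rope` and the default base of 10000 are illustrative assumptions, not code from a particular library:

```python
import numpy as np

def apply_rope(x, base=10000.0):
    """Rotate a sequence of vectors as in the RoPE equations above.

    x: array of shape (seq_len, d), where d is even.
    Returns an array of the same shape. Element i is paired with
    element d/2 + i (the half-split form of the rotation).
    """
    seq_len, d = x.shape
    assert d % 2 == 0, "embedding dimension must be even"

    # theta_i = base^(-2i/d) for i = 0 .. d/2 - 1
    theta = base ** (-2.0 * np.arange(d // 2) / d)          # (d/2,)
    # angle n * theta_i for every position n and frequency i
    angles = np.arange(seq_len)[:, None] * theta[None, :]   # (seq_len, d/2)

    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, : d // 2], x[:, d // 2 :]                  # the two halves

    # X'_{n,i}       = X_{n,i} cos - X_{n,d/2+i} sin
    # X'_{n,d/2+i}   = X_{n,i} sin + X_{n,d/2+i} cos
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# Example: rotate a random sequence of 8 vectors of dimension 16
x = np.random.randn(8, 16)
y = apply_rope(x)
print(y.shape)  # (8, 16)
```

In practice the same rotation is applied to the query and key tensors inside attention, so their dot product depends only on the relative position between tokens.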