Understanding 2D rotation matrices

published: 2019-06-20

categories: misc

When I first learned about rotation matrices they seemed rather “magic”. If you squinted a bit the formula sort of made sense, and if you did the math you could prove that the matrix does indeed perform the rotation and that all the group properties are met. None of that, however, explains where the formula comes from or why it works. In this blog post I will derive the formula for rotation matrices step by step. To follow along you need only basic knowledge of linear algebra and trigonometry.


Points on the unit circle

We start our journey with the simple case of the unit circle. The unit circle in the Euclidean plane $E^{2}$ is the circle centered at the origin with radius $1$. Each point in the plane is given by a pair $(x, y)$ of coordinates. If we restrict ourselves to the unit circle, each point is uniquely identified by an angle $α \in [0, 2π)$ around the center. For convenience we adopt two long-established conventions: the point $(1, 0)$ corresponds to the angle $α = 0$, and angles increase counter-clockwise.

Figure: Illustration of Cartesian coordinates based on the angle of rotation

Using basic trigonometry we can see that for a given angle $α$ the coordinates of the point are $(\cos α, \sin α)$. This holds because we can draw a right-angled triangle whose hypotenuse is the radius of the circle and whose legs (catheti) correspond to the coordinates of the point. Since the hypotenuse has length $1$, the cosine and sine of $α$ are exactly the $x$ and $y$ coordinates.
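As a minimal Python sketch of this step (the function name `point_on_unit_circle` is a hypothetical choice for illustration), the coordinates follow directly from the angle, and every point produced this way lies at distance $1$ from the origin:

```python
import math

def point_on_unit_circle(alpha):
    """Return the Cartesian coordinates (x, y) of the point at angle alpha
    (in radians, counter-clockwise from (1, 0)) on the unit circle."""
    return (math.cos(alpha), math.sin(alpha))

# A quarter turn lands on (0, 1), up to floating-point rounding.
x, y = point_on_unit_circle(math.pi / 2)
print(round(x, 12), round(y, 12))  # prints: 0.0 1.0

# Every such point satisfies x^2 + y^2 = 1.
for alpha in (0.0, 0.5, 2.0, 4.0, 6.0):
    px, py = point_on_unit_circle(alpha)
    assert math.isclose(px * px + py * py, 1.0)
```

The rounding in the `print` call only hides the tiny floating-point residue that `math.cos(math.pi / 2)` leaves behind.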

Rotations along the unit circle

We can rotate the point $(\cos α, \sin α)$ around the origin by adding an angle $φ$ to $α$. We are thus looking for a matrix $R(φ)$ that satisfies the equation $\begin{pmatrix} \cos(φ + α) \\ \sin(φ + α) \end{pmatrix} = R(φ) \begin{pmatrix} \cos α \\ \sin α \end{pmatrix}$ for every angle $α$.

We are going to use two trigonometric identities, the angle-sum and angle-difference formulas; their proofs are left as an exercise to the reader.

\begin{aligned} \sin(x \pm y) &= \sin x \cos y \pm \cos x \sin y \\ \cos(x \pm y) &= \cos x \cos y \mp \sin x \sin y \end{aligned}
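The proofs are not given here, but a quick numerical spot check (an illustration, not a proof) can build confidence that the signs are right. The helper `check_angle_sum` and the random sample angles are hypothetical, chosen only for this sketch:

```python
import math
import random

def check_angle_sum(x, y, tol=1e-9):
    """Compare both sides of the angle-sum and angle-difference identities
    for sine and cosine at angles x and y (in radians)."""
    for sign in (+1, -1):
        # sin(x ± y) = sin x cos y ± cos x sin y
        lhs_sin = math.sin(x + sign * y)
        rhs_sin = math.sin(x) * math.cos(y) + sign * math.cos(x) * math.sin(y)
        # cos(x ± y) = cos x cos y ∓ sin x sin y
        lhs_cos = math.cos(x + sign * y)
        rhs_cos = math.cos(x) * math.cos(y) - sign * math.sin(x) * math.sin(y)
        if abs(lhs_sin - rhs_sin) > tol or abs(lhs_cos - rhs_cos) > tol:
            return False
    return True

random.seed(0)
samples = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(1000)]
print(all(check_angle_sum(x, y) for x, y in samples))  # prints: True
```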

With these identities we can find the rotation matrix by expanding the rotated vector and then factoring the original vector $(\cos α, \sin α)$ back out.

\begin{aligned} \begin{pmatrix} \cos(φ + α) \\ \sin(φ + α) \end{pmatrix} &= \begin{pmatrix} \cos φ \cos α - \sin φ \sin α \\ \sin φ \cos α + \cos φ \sin α \end{pmatrix} \\ &= \begin{pmatrix} \cos φ & -\sin φ \\ \sin φ & \cos φ \end{pmatrix} \begin{pmatrix} \cos α \\ \sin α \end{pmatrix} \end{aligned}
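The derived matrix can be sketched in a few lines of Python. Here matrices are assumed to be plain nested tuples in row-major order, and `rotation_matrix` and `mat_vec` are hypothetical helper names. The check confirms that $R(φ)$ moves the unit-circle point at angle $α$ to the point at angle $φ + α$:

```python
import math

def rotation_matrix(phi):
    """Return R(phi) as a row-major 2x2 nested tuple:
    first row (cos phi, -sin phi), second row (sin phi, cos phi)."""
    c, s = math.cos(phi), math.sin(phi)
    return ((c, -s),
            (s, c))

def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

# Rotating the point at angle alpha by phi should land at angle phi + alpha.
alpha, phi = 0.4, 1.1
rotated = mat_vec(rotation_matrix(phi), (math.cos(alpha), math.sin(alpha)))
expected = (math.cos(phi + alpha), math.sin(phi + alpha))
print(all(math.isclose(a, b) for a, b in zip(rotated, expected)))  # prints: True
```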

This is indeed the familiar rotation matrix $R(φ)$, and we found it using nothing more than basic trigonometry.

Rotation of arbitrary points

Let us now widen our scope to all points in the plane. Apart from the origin, which every rotation leaves fixed, a point is uniquely identified by its angle $α$ of rotation and its distance $d$ from the origin. Using the same arguments as above, but taking into account that the hypotenuse now has length $d$, we get the coordinates $(d \cos α, d \sin α)$.
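Going the other way, from Cartesian coordinates to the pair $(d, α)$, can be done with Python's standard `math.hypot` and `math.atan2`. In this sketch the helper names `to_polar` and `to_cartesian` are hypothetical. Because `math.atan2` returns angles in $[-π, π]$, the angle is shifted into $[0, 2π)$ (up to floating-point rounding), and the origin is rejected since its angle is undefined:

```python
import math

def to_polar(x, y):
    """Return (d, alpha): the distance from the origin and the counter-clockwise
    angle from the positive x-axis, shifted into [0, 2*pi)."""
    if x == 0 and y == 0:
        raise ValueError("the angle of the origin is undefined")
    d = math.hypot(x, y)
    alpha = math.atan2(y, x) % (2 * math.pi)  # atan2 returns values in [-pi, pi]
    return d, alpha

def to_cartesian(d, alpha):
    """Inverse of to_polar: return (d cos alpha, d sin alpha)."""
    return d * math.cos(alpha), d * math.sin(alpha)

d, alpha = to_polar(-3.0, 4.0)
print(d)  # prints: 5.0
x, y = to_cartesian(d, alpha)
print(math.isclose(x, -3.0), math.isclose(y, 4.0))  # prints: True True
```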

It is easy to confirm that the formula we found also works for points outside the unit circle: because matrix multiplication is linear, the factor $d$ can be pulled out of the vector on both sides.

\begin{pmatrix} d \cos(φ + α) \\ d \sin(φ + α) \end{pmatrix} = \begin{pmatrix} \cos φ & -\sin φ \\ \sin φ & \cos φ \end{pmatrix} \begin{pmatrix} d \cos α \\ d \sin α \end{pmatrix}
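A small Python sketch (the `rotate` helper and the sample values of $d$, $α$ and $φ$ are hypothetical) applies $R(φ)$ to an arbitrary point $(x, y)$ and checks both that the point ends up at angle $φ + α$ and that its distance $d$ from the origin is unchanged:

```python
import math

def rotate(point, phi):
    """Rotate a point (x, y) counter-clockwise about the origin by phi radians
    by multiplying it with the rotation matrix R(phi)."""
    x, y = point
    c, s = math.cos(phi), math.sin(phi)
    return (c * x - s * y, s * x + c * y)

d, alpha, phi = 2.5, 0.7, 1.9
original = (d * math.cos(alpha), d * math.sin(alpha))
rotated = rotate(original, phi)
expected = (d * math.cos(phi + alpha), d * math.sin(phi + alpha))
print(all(math.isclose(a, b) for a, b in zip(rotated, expected)))  # prints: True

# The distance from the origin is preserved by the rotation.
print(math.isclose(math.hypot(*rotated), d))  # prints: True
```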

Rotating and scaling points

As far as rotations go we are done, but we can take it a step further and add a scaling factor $r$ to the formula as well. Scaling one coordinate of the result means scaling the corresponding row of the matrix, so to scale the entire vector uniformly we have to scale the entire matrix uniformly.

\begin{pmatrix} r d \cos(φ + α) \\ r d \sin(φ + α) \end{pmatrix} = \begin{pmatrix} r \cos φ & -r \sin φ \\ r \sin φ & r \cos φ \end{pmatrix} \begin{pmatrix} d \cos α \\ d \sin α \end{pmatrix}
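To make the effect of $r$ concrete, here is a Python sketch using hypothetical helpers `rotate_scale_matrix` and `mat_vec` (matrices as nested tuples) with arbitrary sample values. The transformed point ends up at angle $φ + α$ and at distance $r d$ from the origin:

```python
import math

def rotate_scale_matrix(r, phi):
    """Return the 2x2 matrix r * R(phi): rotation by phi radians combined
    with uniform scaling by r."""
    c, s = math.cos(phi), math.sin(phi)
    return ((r * c, -r * s),
            (r * s, r * c))

def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

r, phi, d, alpha = 3.0, 0.8, 2.0, 0.3
result = mat_vec(rotate_scale_matrix(r, phi), (d * math.cos(alpha), d * math.sin(alpha)))
expected = (r * d * math.cos(phi + alpha), r * d * math.sin(phi + alpha))
print(all(math.isclose(a, b) for a, b in zip(result, expected)))  # prints: True
```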

Consequences

A number of operations can be expressed as special cases of our rotate-scale matrix.

Identity: The identity transformation $\mathrm{id}$ is represented by the identity matrix, which corresponds to a scale factor of $r = 1$ and a rotation angle of $φ = 0$.
$\mathrm{id} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
Scaling: A pure scaling has a variable scaling factor $r$ and a fixed rotation angle of $φ = 0$. A scaling matrix is thus just a uniformly scaled identity matrix.
$r \cdot \mathrm{id} = \begin{pmatrix} r & 0 \\ 0 & r \end{pmatrix}$
Inversion or point reflection: Reflecting a point through the origin can be interpreted either as a rotation by $φ = π$ without scaling, or as a scaling by $r = -1$ without rotation. Both yield the same matrix.
$\begin{pmatrix} \cos π & -\sin π \\ \sin π & \cos π \end{pmatrix} = \begin{pmatrix} -1 \cdot \cos 0 & -1 \cdot (-\sin 0) \\ -1 \cdot \sin 0 & -1 \cdot \cos 0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$
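The special cases above can be spot-checked in Python. In this sketch `rotate_scale_matrix` and `matrices_close` are hypothetical helpers, and an absolute tolerance is used because entries such as $\sin π$ come out as tiny nonzero numbers in floating point rather than exactly $0$:

```python
import math

def rotate_scale_matrix(r, phi):
    """Return the 2x2 matrix r * R(phi) as a nested tuple."""
    c, s = math.cos(phi), math.sin(phi)
    return ((r * c, -r * s),
            (r * s, r * c))

def matrices_close(a, b, tol=1e-12):
    """Entry-wise comparison with an absolute tolerance (entries may be ~0)."""
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(2) for j in range(2))

identity = ((1.0, 0.0), (0.0, 1.0))
# Identity: r = 1, phi = 0.
print(matrices_close(rotate_scale_matrix(1.0, 0.0), identity))
# Pure scaling: phi = 0, arbitrary r.
print(matrices_close(rotate_scale_matrix(2.5, 0.0), ((2.5, 0.0), (0.0, 2.5))))
# Point reflection: rotation by pi equals scaling by -1.
print(matrices_close(rotate_scale_matrix(1.0, math.pi), rotate_scale_matrix(-1.0, 0.0)))
```

Each of the three lines should print `True`.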

The group of rotation and scaling matrices

The rotate-scale matrices with a nonzero scale factor $r$ form a group under matrix multiplication. Applying one transformation to a point and then another transformation to the result is equivalent to applying a single combined transformation to the original point, and we combine transformations by multiplying their matrices. Multiplying out and applying the angle-sum identities to each entry gives:

\begin{pmatrix} r_2 \cos φ_2 & -r_2 \sin φ_2 \\ r_2 \sin φ_2 & r_2 \cos φ_2 \end{pmatrix} \begin{pmatrix} r_1 \cos φ_1 & -r_1 \sin φ_1 \\ r_1 \sin φ_1 & r_1 \cos φ_1 \end{pmatrix} = \begin{pmatrix} r_2 r_1 \cos(φ_2 + φ_1) & -r_2 r_1 \sin(φ_2 + φ_1) \\ r_2 r_1 \sin(φ_2 + φ_1) & r_2 r_1 \cos(φ_2 + φ_1) \end{pmatrix}
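The closure property can be spot-checked numerically. In this Python sketch the helpers `rotate_scale_matrix`, `mat_mul` and `matrices_close` are hypothetical, matrices are plain nested tuples, and the sample values of $r_1, φ_1, r_2, φ_2$ are arbitrary:

```python
import math

def rotate_scale_matrix(r, phi):
    """Return the 2x2 matrix r * R(phi) as a nested tuple."""
    c, s = math.cos(phi), math.sin(phi)
    return ((r * c, -r * s),
            (r * s, r * c))

def mat_mul(a, b):
    """Product a @ b of two 2x2 matrices (b is applied first, then a)."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def matrices_close(a, b, tol=1e-12):
    """Entry-wise comparison with an absolute tolerance."""
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(2) for j in range(2))

r1, phi1 = 1.5, 0.4
r2, phi2 = 0.5, 2.2
m1 = rotate_scale_matrix(r1, phi1)
m2 = rotate_scale_matrix(r2, phi2)

# The product is again a rotate-scale matrix: scales multiply, angles add.
combined = rotate_scale_matrix(r2 * r1, phi2 + phi1)
print(matrices_close(mat_mul(m2, m1), combined))  # prints: True
```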

Not only is the product again a rotate-scale matrix, it is also independent of the order of the operands, because $r_2 r_1 = r_1 r_2$ and $φ_2 + φ_1 = φ_1 + φ_2$. This is generally not true for matrix multiplication. Since the set is closed under a commutative operation, we are dealing with a commutative magma. This magma is in fact an Abelian group:

The neutral element is the identity transformation ($r = 1$, $φ = 0$).
The inverse of a transformation with scale $r$ and angle $φ$ is the transformation with scale $\frac{1}{r}$ and angle $-φ$; this is why the scale factor $r = 0$ has to be excluded.
Since matrix multiplication is associative in general, the composition of transformations is associative as well.
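The group axioms listed above, together with commutativity, can be spot-checked with a short Python sketch. The helpers are hypothetical, the sample matrices are arbitrary, and the shear matrix at the end is only an illustrative counterexample showing that commutativity fails for matrices in general:

```python
import math

def rotate_scale_matrix(r, phi):
    """Return the 2x2 matrix r * R(phi) as a nested tuple."""
    c, s = math.cos(phi), math.sin(phi)
    return ((r * c, -r * s),
            (r * s, r * c))

def mat_mul(a, b):
    """Product a @ b of two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def matrices_close(a, b, tol=1e-12):
    """Entry-wise comparison with an absolute tolerance."""
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(2) for j in range(2))

identity = ((1.0, 0.0), (0.0, 1.0))
a = rotate_scale_matrix(2.0, 0.9)
b = rotate_scale_matrix(0.7, -1.3)
c = rotate_scale_matrix(1.2, 2.4)

# Neutral element: multiplying by the identity changes nothing.
assert matrices_close(mat_mul(identity, a), a) and matrices_close(mat_mul(a, identity), a)
# Inverse: scale 1/r and angle -phi (requires r != 0).
assert matrices_close(mat_mul(a, rotate_scale_matrix(1 / 2.0, -0.9)), identity)
# Associativity, inherited from matrix multiplication.
assert matrices_close(mat_mul(mat_mul(a, b), c), mat_mul(a, mat_mul(b, c)))
# Commutativity holds for rotate-scale matrices...
assert matrices_close(mat_mul(a, b), mat_mul(b, a))
# ...but not for matrices in general, e.g. a shear.
shear = ((1.0, 1.0), (0.0, 1.0))
print(matrices_close(mat_mul(a, shear), mat_mul(shear, a)))  # prints: False
```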

Conclusion

We have derived the formula for rotation matrices without prior knowledge of what result to work towards. Instead we restricted our research to a very basic case, that of points on a unit circle, and used our knowledge of trigonometry to find a solution. Once we had our simple solution we extended our problem domain to that of arbitrary points and the scaling of vectors, and looked for ways to extend our simple solution to that new domain.

We then investigated some of its properties and concluded that what we have is a group structure, which allows us to apply all the results of group theory as well. There is much more to rotation matrices, but that would be beyond the scope of this post. I mainly wanted to show how one can come up with this formula, which usually appears like “magic”, by starting with a simple base case and generalising from there.