
Section 6.3 Orthogonal matrices

Subsection 6.3.1 Definition and examples

Definition 6.3.1.

A square matrix \(Q\) is called an orthogonal matrix if the columns of \(Q\) are an orthonormal set.

Note 6.3.2.

Be careful: Despite the name, a matrix that has orthogonal columns is not necessarily an orthogonal matrix. The definition of "orthogonal matrix" requires that the columns be orthonormal.

The matrix \(\begin{bmatrix}1 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ 1 \amp -1 \amp 0\end{bmatrix}\) is not an orthogonal matrix. It has orthogonal, but not orthonormal, columns.
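For instance, the first and second columns are orthogonal, but the first column has length \(\norm{\begin{bmatrix}1\\0\\1\end{bmatrix}} = \sqrt{1^2+0^2+1^2} = \sqrt{2} \neq 1\text{.}\)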

The matrix \(\begin{bmatrix}2/\sqrt{6} \amp 0 \amp 1/\sqrt{3} \\ 1/\sqrt{6} \amp 1/\sqrt{2} \amp -1/\sqrt{3} \\ -1/\sqrt{6} \amp 1/\sqrt{2} \amp 1/\sqrt{3}\end{bmatrix}\) is an orthogonal matrix. To verify this, calculate the dot product of each pair of columns and see that the answer is \(0\text{,}\) and then calculate the dot product of each column with itself and see that the answer is \(1\text{.}\)
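For instance, for the first two columns, and for the first column with itself, these calculations are

\begin{equation*} \begin{bmatrix}2/\sqrt{6} \\ 1/\sqrt{6} \\ -1/\sqrt{6}\end{bmatrix}\cdot\begin{bmatrix}0 \\ 1/\sqrt{2} \\ 1/\sqrt{2}\end{bmatrix} = 0 + \frac{1}{\sqrt{12}} - \frac{1}{\sqrt{12}} = 0 \qquad \text{and} \qquad \begin{bmatrix}2/\sqrt{6} \\ 1/\sqrt{6} \\ -1/\sqrt{6}\end{bmatrix}\cdot\begin{bmatrix}2/\sqrt{6} \\ 1/\sqrt{6} \\ -1/\sqrt{6}\end{bmatrix} = \frac{4}{6} + \frac{1}{6} + \frac{1}{6} = 1\text{.} \end{equation*}

The remaining pairs of columns can be checked in the same way.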

Subsection 6.3.2 Properties of orthogonal matrices

Theorem 6.3.5 states that a square matrix \(Q\) is an orthogonal matrix if and only if \(Q\) is invertible and \(Q^{-1} = Q^t\text{.}\) We prove this now.

Suppose first that \(Q\) is an orthogonal matrix. By Lemma 6.2.5 and Definition 4.3.8, the \((i,j)\) entry of \(Q^tQ\) is the dot product of row \(i\) of \(Q^t\) and column \(j\) of \(Q\text{.}\) Since row \(i\) of \(Q^t\) is column \(i\) of \(Q\text{,}\) and the columns of \(Q\) form an orthonormal set, the \((i,j)\) entry of \(Q^tQ\) is \(1\) when \(i=j\) and \(0\) when \(i\neq j\text{.}\) That is, \(Q^tQ = I_n\text{.}\) By Theorem 4.4.14 this is enough to conclude that \(Q\) is invertible and \(Q^{-1} = Q^t\text{.}\)

Now suppose that \(Q^t = Q^{-1}\text{.}\) Then \(Q^tQ = I_n\text{,}\) so the \((i,j)\) entry of \(Q^tQ\) is \(1\) when \(i=j\) and \(0\) when \(i \neq j\text{.}\) As in the previous part of this proof, the \((i,j)\) entry of \(Q^tQ\) is the dot product of column \(i\) of \(Q\) with column \(j\) of \(Q\text{.}\) Thus the columns of \(Q\) are orthonormal, so \(Q\) is an orthogonal matrix.

Consider the matrix \(A = \begin{bmatrix}2/\sqrt{6} \amp 0 \amp 1/\sqrt{3} \\ 1/\sqrt{6} \amp 1/\sqrt{2} \amp -1/\sqrt{3} \\ -1/\sqrt{6} \amp 1/\sqrt{2} \amp 1/\sqrt{3}\end{bmatrix}\text{,}\) which we have already seen is an orthogonal matrix. Theorem 6.3.5 tells us that

\begin{equation*} A^{-1} = A^t = \begin{bmatrix}2/\sqrt{6} \amp 1/\sqrt{6} \amp -1/\sqrt{6} \\ 0 \amp 1/\sqrt{2} \amp 1/\sqrt{2} \\ 1/\sqrt{3} \amp -1/\sqrt{3} \amp 1/\sqrt{3}\end{bmatrix}\text{.} \end{equation*}

That was a lot faster than our usual methods of finding an inverse; it makes you wish that all matrices could be inverted this easily!
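As a quick check that \(A^t\) really is the inverse, we can multiply:

\begin{equation*} AA^t = \begin{bmatrix}2/\sqrt{6} \amp 0 \amp 1/\sqrt{3} \\ 1/\sqrt{6} \amp 1/\sqrt{2} \amp -1/\sqrt{3} \\ -1/\sqrt{6} \amp 1/\sqrt{2} \amp 1/\sqrt{3}\end{bmatrix}\begin{bmatrix}2/\sqrt{6} \amp 1/\sqrt{6} \amp -1/\sqrt{6} \\ 0 \amp 1/\sqrt{2} \amp 1/\sqrt{2} \\ 1/\sqrt{3} \amp -1/\sqrt{3} \amp 1/\sqrt{3}\end{bmatrix} = \begin{bmatrix}1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1\end{bmatrix}\text{.} \end{equation*}

For example, the \((1,1)\) entry of the product is \(\frac{4}{6} + 0 + \frac{1}{3} = 1\) and the \((1,2)\) entry is \(\frac{2}{6} + 0 - \frac{1}{3} = 0\text{.}\)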

A similar argument shows that the rows of an orthogonal matrix also form an orthonormal set (Corollary 6.3.7). If \(Q\) is orthogonal then \(Q^{-1} = Q^t\) by Theorem 6.3.5, so by Theorem 4.4.6,

\begin{equation*} (Q^t)^{-1} = (Q^{-1})^t = (Q^t)^t\text{.} \end{equation*}

By Theorem 6.3.5 again this implies that \(Q^t\) is an orthogonal matrix. Hence by definition the columns of \(Q^t\) form an orthonormal set. Since the columns of \(Q^t\) are the rows of \(Q\text{,}\) we have that the rows of \(Q\) form an orthonormal set.

Recall that when we introduced linear transformations we asked that a linear transformation should interact nicely with vector addition and scalar multiplication, but we did not require that it interact well with the more "geometric" notions of dot products or lengths. It turns out that orthogonal matrices are precisely the matrices of linear transformations that do preserve these geometric features of \(\mathbb{R}^n\text{.}\)

More precisely, Theorem 6.3.8 tells us that if \(Q\) is an orthogonal matrix then \(Q\vec{v}\cdot Q\vec{w} = \vec{v}\cdot\vec{w}\) and \(\norm{Q\vec{v}} = \norm{\vec{v}}\) for all vectors \(\vec{v}\) and \(\vec{w}\) in \(\mathbb{R}^n\text{.}\) It follows that multiplication by \(Q\) also preserves angles (Corollary 6.3.9). Let \(\theta\) be the angle between \(Q\vec{v}\) and \(Q\vec{w}\text{,}\) and let \(\phi\) be the angle between \(\vec{v}\) and \(\vec{w}\text{.}\) Then, using Definition 2.2.16 and Theorem 6.3.8, we calculate:

\begin{align*} \theta \amp= \arccos\left(\frac{Q\vec{v}\cdot Q\vec{w}}{\norm{Q\vec{v}}\norm{Q\vec{w}}}\right) \\ \amp= \arccos\left(\frac{\vec{v}\cdot\vec{w}}{\norm{\vec{v}}\norm{\vec{w}}}\right) \\ \amp= \phi \end{align*}
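As a concrete illustration, take for instance the orthogonal matrix \(Q = \begin{bmatrix}0 \amp -1 \\ 1 \amp 0\end{bmatrix}\) (rotation by \(\pi/2\)) together with \(\vec{v} = \begin{bmatrix}1\\0\end{bmatrix}\) and \(\vec{w} = \begin{bmatrix}1\\1\end{bmatrix}\text{.}\) Then \(Q\vec{v} = \begin{bmatrix}0\\1\end{bmatrix}\) and \(Q\vec{w} = \begin{bmatrix}-1\\1\end{bmatrix}\text{,}\) so

\begin{equation*} Q\vec{v}\cdot Q\vec{w} = 1 = \vec{v}\cdot\vec{w}\text{,} \qquad \norm{Q\vec{v}} = 1 = \norm{\vec{v}}\text{,} \qquad \norm{Q\vec{w}} = \sqrt{2} = \norm{\vec{w}}\text{,} \end{equation*}

and both angles equal \(\arccos\left(\frac{1}{\sqrt{2}}\right) = \pi/4\text{.}\)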

Note 6.3.10.

It is possible for a matrix to preserve all angles (in the sense of Corollary 6.3.9) without being an orthogonal matrix. One easy example of such a matrix is \(\begin{bmatrix}2 \amp 0 \\ 0 \amp 2\end{bmatrix}\text{.}\)
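Indeed, if \(A = \begin{bmatrix}2 \amp 0 \\ 0 \amp 2\end{bmatrix}\) then \(A\vec{v} = 2\vec{v}\) for every \(\vec{v}\text{,}\) so

\begin{equation*} \frac{A\vec{v}\cdot A\vec{w}}{\norm{A\vec{v}}\norm{A\vec{w}}} = \frac{4(\vec{v}\cdot\vec{w})}{(2\norm{\vec{v}})(2\norm{\vec{w}})} = \frac{\vec{v}\cdot\vec{w}}{\norm{\vec{v}}\norm{\vec{w}}}\text{,} \end{equation*}

and hence all angles are preserved, even though each column of \(A\) has length \(2\) and so the columns are not orthonormal.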

Finally, here are some additional properties that orthogonal matrices have. In each item below, \(A\) and \(B\) denote orthogonal matrices of the same size; a concrete illustration follows the list.

  1. We have \((AB)^t = B^tA^t = B^{-1}A^{-1} = (AB)^{-1}\text{,}\) so \(AB\) is orthogonal by Theorem 6.3.5.

  2. Calculate \((A^{-1})^t = (A^t)^t = A = (A^{-1})^{-1}\text{,}\) so \(A^{-1}\) is orthogonal by Theorem 6.3.5.

  3. By Corollary 6.3.7 the rows of \(A\) are orthonormal, so the columns of \(A^t\) are orthonormal.

  4. Recall from Theorem 4.5.15 that \(\det(A^t) = \det(A)\) and \(\det(A^{-1}) = 1/\det(A)\text{.}\) Since \(A^t = A^{-1}\) we get \(\det(A) = 1/\det(A)\text{,}\) so \(\det(A)^2 = 1\text{,}\) and thus \(\det(A) = \pm 1\text{.}\)

  5. If \(A\vec{v} = \lambda\vec{v}\) with \(\vec{v} \neq \vec{0}\text{,}\) then \(\norm{A\vec{v}} = \norm{\lambda\vec{v}}\text{,}\) so by Theorem 6.3.8 we have \(\norm{\vec{v}} = \abs{\lambda}\norm{\vec{v}}\text{.}\) Since \(\norm{\vec{v}} \neq 0\) we conclude that \(\abs{\lambda} = 1\text{.}\)
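To see several of these properties in a single concrete example, consider the orthogonal matrix \(A = \begin{bmatrix}0 \amp 1 \\ 1 \amp 0\end{bmatrix}\text{.}\) Here \(A^{-1} = A^t = A\text{,}\) the determinant is \(\det(A) = 0\cdot 0 - 1\cdot 1 = -1\) (so the value \(-1\) from property 4 really does occur), and since

\begin{equation*} A\begin{bmatrix}1\\1\end{bmatrix} = 1\begin{bmatrix}1\\1\end{bmatrix} \qquad \text{and} \qquad A\begin{bmatrix}1\\-1\end{bmatrix} = (-1)\begin{bmatrix}1\\-1\end{bmatrix}\text{,} \end{equation*}

the eigenvalues of \(A\) are \(1\) and \(-1\text{,}\) both of absolute value \(1\) as property 5 predicts.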

Subsection 6.3.3 Application: Characterizing rotations of \(\mathbb{R}^2\)

We have seen already that rotations of \(\mathbb{R}^2\) are linear transformations, and we have seen what their matrices look like. As an application of the ideas of this chapter we give an abstract way of detecting whether or not a given \(2 \times 2\) matrix is the matrix of a rotation. For any angle \(\theta\text{,}\) let \(T_\theta\) denote the linear transformation of \(\mathbb{R}^2\) that rotates vectors counter-clockwise by \(\theta\) radians. The result we prove is that the following two statements about a \(2 \times 2\) matrix \(A\) are equivalent: (1) there is an angle \(\theta\) such that \(A = [T_\theta]\text{;}\) (2) \(A\) is an orthogonal matrix and \(\det(A) = 1\text{.}\)

First, suppose that (1) holds, so \(A = [T_\theta] = \begin{bmatrix}\cos(\theta) \amp -\sin(\theta) \\ \sin(\theta) \amp \cos(\theta)\end{bmatrix}\text{.}\) Then the dot product of the columns of \(A\) is

\begin{equation*} \begin{bmatrix}\cos(\theta) \\ \sin(\theta)\end{bmatrix}\cdot \begin{bmatrix}-\sin(\theta)\\ \cos(\theta)\end{bmatrix} = \cos(\theta)(-\sin(\theta))+\sin(\theta)\cos(\theta) = 0\text{,} \end{equation*}

and the lengths are

\begin{equation*} \norm{\begin{bmatrix}\cos(\theta) \\ \sin(\theta)\end{bmatrix}} = \sqrt{\cos^2(\theta) + \sin^2(\theta)} = 1 \end{equation*}

and

\begin{equation*} \norm{\begin{bmatrix}-\sin(\theta) \\ \cos(\theta)\end{bmatrix}} = \sqrt{(-\sin(\theta))^2 + \cos^2(\theta)} = 1\text{.} \end{equation*}

Thus \(A\) is an orthogonal matrix. The determinant is

\begin{equation*} \det(A) = \det\begin{bmatrix}\cos(\theta) \amp -\sin(\theta) \\ \sin(\theta) \amp \cos(\theta)\end{bmatrix} = \cos^2(\theta) + \sin^2(\theta) = 1\text{.} \end{equation*}

We've thus shown that (2) holds.

Now suppose that (2) holds, and that \(A = \begin{bmatrix}a \amp b \\ c \amp d\end{bmatrix}\text{.}\) Since \(A\) is an orthogonal matrix, the vectors \(\begin{bmatrix}a\\c\end{bmatrix}\) and \(\begin{bmatrix}b\\d\end{bmatrix}\) are unit vectors in \(\mathbb{R}^2\text{,}\) which means that (in standard position) they end at points on the unit circle. There are therefore angles \(\phi\) and \(\psi\) such that \(\begin{bmatrix}a\\c\end{bmatrix} = \begin{bmatrix}\cos(\phi) \\ \sin(\phi)\end{bmatrix}\) and \(\begin{bmatrix}b\\d\end{bmatrix} = \begin{bmatrix}\cos(\psi) \\ \sin(\psi)\end{bmatrix}\text{.}\) More than this, the vectors \(\begin{bmatrix}a\\c\end{bmatrix}\) and \(\begin{bmatrix}b\\d\end{bmatrix}\) must be orthogonal, so \(\psi = \phi \pm \pi/2\text{.}\)
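To see why, note that the dot product of these two unit vectors is

\begin{equation*} \begin{bmatrix}\cos(\phi) \\ \sin(\phi)\end{bmatrix}\cdot\begin{bmatrix}\cos(\psi) \\ \sin(\psi)\end{bmatrix} = \cos(\phi)\cos(\psi) + \sin(\phi)\sin(\psi) = \cos(\psi-\phi)\text{.} \end{equation*}

Orthogonality forces \(\cos(\psi-\phi) = 0\text{,}\) so \(\psi - \phi\) is an odd multiple of \(\pi/2\text{,}\) and we may choose the angles so that \(\psi = \phi \pm \pi/2\text{.}\)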

If \(\psi = \phi+\pi/2\) then \(\cos(\psi) = \cos(\phi+\pi/2) = -\sin(\phi)\) and \(\sin(\psi) = \sin(\phi+\pi/2) = \cos(\phi)\text{.}\) In this case we get

\begin{equation*} A = \begin{bmatrix}\cos(\phi) \amp \cos(\psi) \\ \sin(\phi) \amp \sin(\psi)\end{bmatrix} = \begin{bmatrix}\cos(\phi) \amp -\sin(\phi) \\ \sin(\phi) \amp \cos(\phi)\end{bmatrix} = [T_\phi]\text{.} \end{equation*}

The other case is \(\psi = \phi-\pi/2\text{.}\) In this case \(\cos(\psi) = \cos(\phi-\pi/2) = \sin(\phi)\) and \(\sin(\psi) = \sin(\phi - \pi/2) = -\cos(\phi)\text{.}\) Then we have

\begin{equation*} A = \begin{bmatrix}\cos(\phi) \amp \cos(\psi) \\ \sin(\phi) \amp \sin(\psi) \end{bmatrix} = \begin{bmatrix}\cos(\phi) \amp \sin(\phi) \\ \sin(\phi) \amp -\cos(\phi)\end{bmatrix}\text{.} \end{equation*}

This does not look like the matrix of a rotation, but there's also a part of the hypothesis (2) that we haven't used yet: \(\det(A) = 1\text{.}\) If we calculate the determinant of the matrix we obtained in this case we get

\begin{equation*} \det\begin{bmatrix}\cos(\phi) \amp \sin(\phi) \\ \sin(\phi) \amp -\cos(\phi)\end{bmatrix} = -\cos^2(\phi) - \sin^2(\phi) = -1\text{.} \end{equation*}

Since this contradicts the assumption (2), this case (\(\psi = \phi-\pi/2\)) is impossible. We must therefore be in the previous case, and so \(A = [T_\phi]\text{.}\)

Let \(A = \frac{1}{2\sqrt{2}}\begin{bmatrix}1+\sqrt{3} \amp 1-\sqrt{3} \\ \sqrt{3} - 1 \amp 1+\sqrt{3}\end{bmatrix}\text{.}\) By calculating the dot products of the columns we find that \(A\) is an orthogonal matrix, and another direct calculation shows that \(\det(A) = 1\text{.}\) Therefore there is some angle \(\theta\) such that \(A = [T_\theta]\text{.}\) In fact, \(\theta = \pi/12\text{,}\) which we can find from the equations \(\cos(\theta) = \frac{1+\sqrt{3}}{2\sqrt{2}}\) and \(\sin(\theta) = \frac{\sqrt{3}-1}{2\sqrt{2}}\text{.}\)
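The value \(\theta = \pi/12\) can be verified with the angle subtraction formulas, since \(\pi/12 = \pi/3 - \pi/4\text{:}\)

\begin{equation*} \cos\left(\frac{\pi}{12}\right) = \cos\left(\frac{\pi}{3}\right)\cos\left(\frac{\pi}{4}\right) + \sin\left(\frac{\pi}{3}\right)\sin\left(\frac{\pi}{4}\right) = \frac{1}{2}\cdot\frac{1}{\sqrt{2}} + \frac{\sqrt{3}}{2}\cdot\frac{1}{\sqrt{2}} = \frac{1+\sqrt{3}}{2\sqrt{2}}\text{,} \end{equation*}

and similarly \(\sin\left(\frac{\pi}{12}\right) = \frac{\sqrt{3}}{2}\cdot\frac{1}{\sqrt{2}} - \frac{1}{2}\cdot\frac{1}{\sqrt{2}} = \frac{\sqrt{3}-1}{2\sqrt{2}}\text{.}\)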

Exercises 6.3.4 Exercises

1.

Fill in the missing entries to make the matrix orthogonal.
\begin{equation*} \begin{bmatrix} \frac{-1}{\sqrt{2}} \amp \frac{-1}{\sqrt{6}} \amp \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} \amp -- \amp -- \\ -- \amp \frac{\sqrt{6}}{3} \amp -- \end{bmatrix} . \end{equation*}

2.

Fill in the missing entries to make the matrix orthogonal.
\begin{equation*} \begin{bmatrix} \frac{-1}{3} \amp \frac{-2}{\sqrt{5}} \amp -- \\ \frac{2}{3} \amp 0 \amp -- \\ -- \amp -- \amp \frac{4}{15}\sqrt{5} \end{bmatrix} . \end{equation*}

3.

If \(P\) is a triangular orthogonal matrix, show that \(P\) is diagonal and that all diagonal entries are \(1\) or \(-1\text{.}\)

4.

If \(P\) is an orthogonal matrix, show that \(kP\) is orthogonal if and only if \(k = 1\) or \(k = -1\text{.}\)