An upper triangular matrix is invertible if and only if all of its diagonal elements are non-zero.

This is a fundamental proposition in linear algebra, and I expect it appears in the problem sets of most introductory courses. I’ve given a more formal proof of this before, but here I’d like to present a more intuitive one.

First of all, note that calling a matrix upper triangular presumes that it is {\em square}. Similar, though slightly more involved, reasoning can be applied to rectangular upper-trapezoidal matrices, but this need not concern us here.

A matrix $${\bf A}$$ is said to be invertible if there exists a matrix $${\bf A^{-1}}$$ such that $${\bf A} {\bf A^{-1}} = {\bf A^{-1}} {\bf A} = {\bf I}$$, where $${\bf I}$$ is the identity matrix. The existence of $${\bf A^{-1}}$$ allows equations of the form $${\bf A} {\bf x} = {\bf b}$$ to be solved for the solution vector $${\bf x}$$ for any target vector $${\bf b}$$, by observing that:

$$ \begin{array}{ll} {\bf A^{-1}} {\bf A} {\bf x} & = {\bf A^{-1}} {\bf b} \\ {\bf I} {\bf x} & = {\bf A^{-1}} {\bf b} \\ {\bf x} & = {\bf A^{-1}} {\bf b} \end{array} $$
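
As a quick numerical sanity check (a minimal sketch assuming NumPy is available; the matrix and target here are arbitrary illustrations), solving via the explicit inverse agrees with solving the system directly:

```python
import numpy as np

# An arbitrary invertible matrix and target, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
b = np.array([5.0, 6.0])

x_via_inverse = np.linalg.inv(A) @ b   # x = A^{-1} b, as derived above
x_direct = np.linalg.solve(A, b)       # solve A x = b without forming the inverse

assert np.allclose(x_via_inverse, x_direct)
print(x_direct)  # [1.5 2. ]
```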

It turns out that the invertibility of $${\bf A}$$ is equivalent to saying that a solution vector $${\bf x}$$ exists for {\em any} target vector $${\bf b}$$ in the equation $${\bf A} {\bf x} = {\bf b}$$. Thus the object of this proof is to demonstrate, given an $$n \times n$$ upper triangular matrix $${\bf A}$$, that there exists a solution vector $${\bf x}$$ for all target vectors $${\bf b}$$ {\bf {\em if}} and {\bf {\em only if}} all the diagonal elements of $${\bf A}$$ are non-zero. The condition to be satisfied is:

$$ \forall {\bf b},\exists {\bf x} : {\bf A} {\bf x} = {\bf b} $$

Let the elements of $${\bf A}$$, $${\bf x}$$, and $${\bf b}$$ be referenced as follows:

$$ \left[ \begin{array}{ccc} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{n,1} & \cdots & a_{n,n} \end{array} \right] \left[ \begin{array}{c} x_1 \\ \vdots \\ x_n \end{array} \right] = \left[ \begin{array}{c} b_1 \\ \vdots \\ b_n \end{array} \right] $$

And let $${\bf a_1}$$, $$…$$, $${\bf a_n}$$ represent the columns of the matrix $${\bf A}$$.

The {\bf {\em if}} part of the proposition requires demonstrating that a solution vector $${\bf x}$$ can be constructed for any target vector $${\bf b}$$. The {\bf {\em only if}} part requires demonstrating that this task is impossible if {\em any} of the diagonal elements are zero. I intend to prove both parts simultaneously, and directly.

The equation $${\bf A} {\bf x} = {\bf b}$$ can be written as

$$ x_1 {\bf a_1} + … + x_n {\bf a_n} = {\bf b} $$

This says that $${\bf b}$$ is a linear combination of the columns of $${\bf A}$$, with coefficients taken from $${\bf x}$$. Now, let us proceed to construct the solution vector $${\bf x}$$ for some given $${\bf b}$$.
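
For concreteness, here is an arbitrary $$3 \times 3$$ upper triangular example (entries chosen purely for illustration; we will return to it as the construction unfolds):

$$ x_1 \left[ \begin{array}{c} 2 \\ 0 \\ 0 \end{array} \right] + x_2 \left[ \begin{array}{c} 1 \\ 3 \\ 0 \end{array} \right] + x_3 \left[ \begin{array}{c} 4 \\ 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} b_1 \\ b_2 \\ b_3 \end{array} \right] $$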

Let us start with the target $$b_n$$, the last element of $${\bf b}$$. This must arise from some linear combination of the $$n$$th rows of the columns $${\bf a_1}$$ to $${\bf a_n}$$. But since $${\bf A}$$ is upper triangular, the columns $${\bf a_1} … {\bf a_{n-1}}$$ all contain a zero in the $$n$$th row; that is, the elements $$a_{n,1} … a_{n,n-1}$$ are all zero. This means that only the $$n$$th column has any influence in determining the element $$b_n$$; in particular, the product $$b_n = a_{n,n} x_n$$ determines the result. From this it follows that $$a_{n,n}$$ cannot be zero, for if it were, then $$b_n$$ could only be zero, and it would be impossible to find an $${\bf x}$$ for {\em any} $${\bf b}$$ whenever $$b_n \neq 0$$. Thus $$a_{n,n}$$ is not zero.

Since $$a_{n,n}$$ is not zero, it follows immediately that $$x_n = b_n / a_{n,n}$$ for any target $$b_n$$. Thus the element $$b_n$$ of the vector $${\bf b}$$ has been correctly produced. Furthermore, this is the {\em only} way the result can be obtained, since no other column holds any influence over it; therefore the coefficient $$x_n$$ is also uniquely determined.
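
In the $$3 \times 3$$ example above, the last row reads $$0 \cdot x_1 + 0 \cdot x_2 + 6 x_3 = b_3$$, forcing $$x_3 = b_3 / 6$$; had the diagonal entry $$6$$ been $$0$$, no target with $$b_3 \neq 0$$ could ever be reached.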

Now consider the next element up in the target vector, $$b_{n-1}$$. This must arise from some linear combination of the $$(n-1)$$th rows of the columns $${\bf a_1}$$ to $${\bf a_n}$$. From the previous step, the coefficient of the last column $${\bf a_n}$$ has already been established as $$x_n$$, so the $$n$$th column contributes a known quantity, $$a_{n-1,n} x_{n}$$, to the linear combination. Furthermore, since the matrix $${\bf A}$$ is upper triangular, every column to the left of $${\bf a_{n-1}}$$ contains a zero in the $$(n-1)$$th row; that is, the elements $$a_{n-1,1}$$ to $$a_{n-1,n-2}$$ are all zero. Thus the only column that can change the result $$b_{n-1}$$ is the $$(n-1)$$th column, and it can only do so via the coefficient $$x_{n-1}$$. The result is thus determined by $$b_{n-1} = a_{n-1,n} x_n + a_{n-1,n-1} x_{n-1}$$. Now, since $$x_n$$ is already fixed, $$a_{n-1,n-1}$$ cannot be zero, otherwise the only possible result is $$b_{n-1} = a_{n-1,n} x_n$$, which is contrary to the requirement that an $${\bf x}$$ can be found for {\em any} $${\bf b}$$. Therefore $$a_{n-1,n-1}$$ is not zero, and $$x_{n-1} = (b_{n-1} - a_{n-1,n} x_n) / a_{n-1,n-1}$$. Thus the target $$b_{n-1}$$ has been correctly produced.
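
In the running example, the second row reads $$3 x_2 + 5 x_3 = b_2$$, with $$x_3$$ already fixed by the previous step, so $$x_2 = (b_2 - 5 x_3) / 3$$.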

The construction proceeds in this manner with the next target, $$b_{n-2}$$, and the reasoning is the same. The coefficients for the columns to the right of $$a_{n-2,n-2}$$ have already been uniquely determined, and the columns to the left all contain a zero in the $$(n-2)$$th row, so the product of the diagonal element $$a_{n-2,n-2}$$ and the coefficient $$x_{n-2}$$ is the only way to influence the target $$b_{n-2}$$. This means that $$a_{n-2,n-2}$$ cannot be zero, and the coefficient is determined as $$x_{n-2} = (b_{n-2} - a_{n-2,n} x_n - a_{n-2,n-1} x_{n-1}) / a_{n-2,n-2}$$.
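
In the running example this is the first row, $$2 x_1 + 1 \cdot x_2 + 4 x_3 = b_1$$, giving $$x_1 = (b_1 - x_2 - 4 x_3) / 2$$ and completing the solution.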

This process continues with the remaining elements of the vector $${\bf b}$$, in order of decreasing row index, until all the coefficients $$x_n, x_{n-1}, …, x_1$$ have been established.


Note that, in general, the columns $${\bf a_1} … {\bf a_{q-1}}$$ to the left of $$a_{q,q}$$ all contain zeros in the $$q$$th row, and the contributions of the columns to its right, $${\bf a_{q+1}} … {\bf a_n}$$, are already fixed by the coefficients $$x_{q+1} … x_n$$. Therefore $$a_{q,q}$$ cannot be zero, otherwise the result $$b_q$$ would be completely determined when it is required that {\em any} result can be produced. The coefficient is computed as $$x_q = (b_q - \sum_{i=q+1}^n x_i a_{q,i}) / a_{q,q}$$. This always takes the same form: the target $$b_q$$, minus the known contributions that the columns to the right of $$a_{q,q}$$ will make, divided by $$a_{q,q}$$ itself. To see why this works, note that when the linear combination of the columns is taken, the $$q$$th column contributes $$a_{q,q} (b_q - \sum_{i=q+1}^n x_i a_{q,i}) / a_{q,q}$$, which evaluates to $$b_q - \sum_{i=q+1}^n x_i a_{q,i}$$. The subtracted sum is exactly what the columns to the right of $$a_{q,q}$$ contribute, so the $$q$$th row of the product totals $$(b_q - \sum_{i=q+1}^n x_i a_{q,i}) + \sum_{i=q+1}^n x_i a_{q,i}$$, which evaluates to $$b_q$$, giving the required result.
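
The construction just described is exactly classical back substitution. Here is a minimal sketch in Python with NumPy (the function name `back_substitute` and the test matrix are mine, chosen for illustration only):

```python
import numpy as np

def back_substitute(A, b):
    # Solve A x = b for upper triangular A by the construction above:
    # x_q = (b_q - sum_{i>q} a_{q,i} x_i) / a_{q,q}, working up from the last row.
    n = len(b)
    x = np.zeros(n)
    for q in range(n - 1, -1, -1):  # row indices n, n-1, ..., 1 (0-based here)
        if A[q, q] == 0:
            raise ValueError("zero diagonal element: no solution for a general b")
        # Known contributions of the columns to the right of the diagonal.
        known = A[q, q + 1:] @ x[q + 1:]
        x[q] = (b[q] - known) / A[q, q]
    return x

# Sanity check on the 3 x 3 example used earlier.
A = np.array([[2.0, 1.0, 4.0],
              [0.0, 3.0, 5.0],
              [0.0, 0.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x = back_substitute(A, b)
assert np.allclose(A @ x, b)
```

Running the loop from the last row upward mirrors the proof: each step divides by a diagonal element, which is exactly where a zero diagonal makes the construction break down.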

Therefore we have shown how to construct the solution vector $${\bf x}$$ for any target vector $${\bf b}$$ given an upper triangular matrix $${\bf A}$$ with non-zero diagonal elements, and have shown that this construction is possible only if all the diagonal elements are non-zero. Q.E.D.