This is an fundamental proposition in linear algebra, and I expect it appears in the problem sets of most introductionary courses. I've provided a more formal proof of this before, but I'd like here to present a more intuitive proof.

First of all, note that by being upper triangular, an assumption is that the matrix is {\em square}. Slightly more involved, but similar reasoning can be applied to rectangular upper-trapezoidal matrices, but this need not concern us now.

A matrix \({\bf A}\) is said to be invertible if there exists a matrix \({\bf A^{-1}}\) such that \({\bf A} {\bf A^{-1}} = {\bf I}\) where \({\bf I}\) is the identity matrix. The existence of \({\bf A^{-1}}\), allows equations of the form \({\bf A} {\bf x} = {\bf b}\) to be solved for the solution vector \({\bf x}\) for any target vector \({\bf b}\), by observing that:

\[ \begin{array}{ll} {\bf A^{-1}} {\bf A} {\bf x} & = {\bf A^{-1}} {\bf b} \\ {\bf I} {\bf x} & = {\bf A^{-1}} {\bf b} \\ {\bf x} & = {\bf A^{-1}} {\bf b} \\ \end{array} \]

It turns out that the invertibility of \({\bf A}\) is actually equivalent to saying that a solution vector \({\bf x}\) exists for \({\it any}\) target vector \({\bf b}\) in the equation \({\bf A} {\bf x} = {\bf b}\). Thus the object of this proof is to demonstrate, given an \(n \times n\) square matrix \({\bf A}\), that there exists a solution vector \({\bf x}\) for all target vectors \({\bf b}\), {\bf {\em if}} and {\bf {\em only if}} all the diagonal elements of the matrix \({\bf A}\) are non-zero. The condition to be satisfied is:

\[ \forall {\bf b},\exists {\bf x} : {\bf A} {\bf x} = {\bf b} \]

Let the elements of \({\bf A}\), \({\bf x}\), and \({\bf b}\) be referenced as follows:

\[ \left[ \begin{array}{c} a_{1,1},...,a_{1,n} \\ \vdots \\ a_{n,1},...,a_{n,n} \\ \end{array} \right] \left[ \begin{array}{c} x_1 \\ \vdots \\ x_n \\ \end{array} \right] = \left[ \begin{array}{c} b_1 \\ \vdots \\ b_n \\ \end{array} \right] \]

And let \({\bf a_1}\), \(...\), \({\bf a_n}\) represent the columns of the matrix \({\bf A}\).

The {\bf {\em if}} part of the proposition requires demonstrating that a solution vector \({\bf x}\) can be constructed for any target vector \({\bf b}\). The {\bf {\em only if}} part requires demonstrating that this task is impossible if {\em any} of the diagonal elements are zero. I intend to prove both parts symultaneously, and directly.

The term \({\bf A} {\bf x} = {\bf b}\) can be written

\[ x_1 {\bf a_1} + ... + x_n {\bf a_m} = {\bf b} \]

This says that \({\bf b}\) is a linear combination of the columns of \({\bf A}\), with coefficients taken from \({\bf x}\). Now, let us proceed to construct the solution vector \({\bf x}\) for some given \({\bf b}\).

Let us start with the target \(b_n\), the last element of \({\bf b}\). This must arise from some linear combination of the \(n\)th rows of columns \({\bf a_1}\) to \({\bf a_n}\), but since \({\bf A}\) is upper-triangular, the columns \({\bf a_1} ... {\bf a_{n-1}}\) all contain a zero in the \(n\)th row, that is the elements \(a_{n,1} ... a_{n,n-1}\) are all zero. This means that only the \(n\)th column has any influence in determining the element \(b_n\), in particular the product \(b_n = a_{n,n} x_n\) determines the result. From this it follows that \(a_{n,n}\) cannot be zero, for if it is zero, then \(b_n\) must also be zero. And if \(b_n\) is zero, then it is impossible to find an \({\bf x}\) for {\em any} \({\bf b}\) whenever \(b_n \not = 0\). Thus \(a_{n,n}\) is not zero.

Since \(a_{n,n}\) is not zero, it analytically follows that \(x_n = b_n / a_{n,n}\) for any target \(b_n\). Thus the element \(b_n\) in the vector \({\bf b}\) has been correctly established. Furthermore, this is the {\em only} way the result can be obtained, since no other column holds any influence over the result, therefore the coefficient \(x_n\) is also uniquely determined.

Now consider the next element up in the target vector \(b_{n-1}\), this must arise from some linear combination of the \((n-1)\)th rows of the columns \({\bf a_1}\) to \({\bf a_n}\), but from the previous analysis, we observe that the coefficient of the last column \({\bf a_n}\) has already been established as \(x_n\), therefore the \(n\)th column contributes a known quantity, \(a_{n-1,n} x_{n}\), to the linear combination. Furthermore, since the matrix \({\bf A}\) is upper triangular, every column to the left of \({\bf a_{n-1}}\), contains a zero in the \((n-1)\)th row. That is, the elements \(a_{n-1,1}\) to \(a_{n-1,n-2}\) are all zero. Thus the only column that can change the result \(b_{n-1}\) is the \((n-1)\)th column, and it can only do so via the coefficient \(x_{n-1}\). The result is thus determined as follows: \(b_{n-1} = a_{n-1,n} x_n + a_{n-1,n-1} x_{n-1}\). Now, since \(x_n\) is known, it follows that \(a_{n-1,n-1}\) cannot be zero otherwise the only possible result is \(b_{n-1} = a_{n-1,n} x_n\) which is contrary to the requirement that \({\bf x}\) can be found for {\em any} \({\bf b}\). Therefore \(a_{n-1,n-1}\) is not zero and \(x_{n-1} = (b_{n-1} - a_{n-1,n} x_n) / a_{n-1,n-1}\). Thus the value \(b_{n-1}\) has been correctly established.

The construction proceeds in this manner with the next target \(b_{n-2}\), and the reasoning is the same. Since the coefficients for the columns to the right of \(a_{n-2,n-2}\) have already been uniquely determined, and since the columns to the left all contain zero in the \((n-2)\)th row, it follows that the product of the diagonal element \(a_{n-2,n-2}\) and the coefficient \(x_{n-2}\) is the only way to change the target \(b_{n-2}\). Which means that \(a_{n-2,n-2}\) cannot be zero, and the coefficient \(x_{n-2}\) is analytically determined as follows: \(x_{n-2} = (b_{n-2} - a_{n-2,n} x_n - a_{n-2,n-1} x_{n-1}) / a_{n-2,n-2}\).

This process continues with the remaining elements of the vector \({\bf b}\), in order of decreasing row index, until all the coefficients \(x_n, x_n-1, ... x_1\) have been established.

Note, in general, the target \(b_{q}\) has zeros in the \(q\)th row of the columns \({\bf a_1} ... {\bf a_{q-1}} \) to the left of \(a_{q,q}\), and the contributions of the columns to its right \({\bf a_{q+1}} ... {\bf a_n}\) are uniquely determined by the coefficients \(x_{q+1} ... x_n\). Therefore \(a_{q,q}\) can't be zero, otherwise the result is uniquely determined when it is required that any result can be produced. \(x_q\) is computed as \((b_q - \sum_{i=q+1}^n x_i a_{q,i}) / a_{q,q}\). The latter always takes the same form, and is simply the target \(b_q\) minus the known contributions that the columns to the right of \(a_{q,q}\) will make, divided by \(a_{q,q}\) itself. To see why this works, when the linear combination of the columns are taken, the \(q\)th column will contribute \(a_{q,q} (b - \sum_{i=q+1}^n x_i a_{q,i}) / a_{q,q}\) which evaluates to \(b - \sum_{i=q+1}^n x_i a_{q,i}\) but this right hand sum, is simply the contributions that the columns to the right of \(a_{q,q}\) will contribute, and so we have \(b - \sum_{i=q+1}^n x_i a_{q,i} + \sum_{i=q+1}^n x_i a_{q,i}\) which obviously evaluates to \(b\), giving the required result.

Therefore we have shown a way to construct the solution vector \({\bf x}\) for any target vector \({\bf b}\) given an upper triangular matrix \({\bf A}\) with non-zero diagonal elements, and have shown that this construction is only possible if all the diagonal elements are non-zero. **Q.E.D**