Most high school students in the United States learn about matrices and matrix multiplication, but they are often not taught why matrix multiplication works the way it does. Adding matrices is easy: you just add the corresponding entries. Matrix multiplication does not work this way, and to someone who doesn’t understand the theory behind matrices, the rule for multiplying them may seem contrived and strange. To truly understand matrices, we view them as representations of part of a bigger picture. Matrices represent functions between spaces called vector spaces, and not just any functions, but linear functions. This is in fact why linear algebra focuses on matrices. The two fundamental facts about matrices are that every matrix represents some linear function, and every linear function is represented by a matrix. Therefore, once we have fixed a coordinate system for each space, there is a one-to-one correspondence between matrices and linear functions. We’ll show that multiplying matrices corresponds to composing the functions that they represent. Along the way, we’ll examine what matrices are good for and why linear algebra sprang up in the first place.

Most likely, if you’ve taken algebra in high school, you’ve seen something like the following:

$\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}.$

Your high school algebra teacher probably told you this thing was a “matrix.” You then learned how to do things with matrices. For example, you can add two matrices, and the operation is fairly intuitive:

$\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} + \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 3 \\ 5 & 3 \end{pmatrix}.$

You can also subtract matrices, which works similarly. You can multiply a matrix by a number:

$2 \times \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} = \begin{pmatrix} 4 & 2 \\ 8 & 6 \end{pmatrix}.$

Then, when you were taught how to multiply matrices, everything seemed wrong:

$\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 7 & 8 \end{pmatrix}.$

That is, to find the entry in the $i$-th row, $j$-th column of the product, you take the $i$-th row of the first matrix and the $j$-th column of the second matrix, multiply their corresponding entries together, and add up the results. In the above example, the 1st row, 2nd column entry is a $4$ because the 1st row of the first matrix is $(2, 1)$, the 2nd column of the second matrix is $(2, 0)$, and we have $4 = 2 \times 2 + 1 \times 0$. Moreover, with this rule matrix multiplication isn’t even commutative! If we switch the order of multiplication above, we get

$\begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} = \begin{pmatrix} 10 & 7 \\ 2 & 1 \end{pmatrix}.$
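To make the row-by-column rule concrete, here is a minimal Python sketch. The helper name `matmul` and the choice to store a matrix as a list of rows are assumptions made for illustration; they are not part of the original discussion.

```python
def matmul(A, B):
    """Multiply two matrices (lists of rows) using the row-by-column rule."""
    inner = len(B)
    assert all(len(row) == inner for row in A), "columns of A must match rows of B"
    # Entry (i, j) is the sum over k of A[i][k] * B[k][j]:
    # the i-th row of A paired with the j-th column of B.
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[2, 1], [4, 3]]
B = [[1, 2], [1, 0]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)        # [[3, 4], [7, 8]]
print(BA)        # [[10, 7], [2, 1]]
print(AB == BA)  # False: the order of multiplication matters
```

Both products match the ones computed by hand above, and they differ from each other.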

How come matrix multiplication doesn’t work like addition and subtraction? And if multiplication works this way, how the heck does division work? The goal of this post is to answer these questions.

To understand why matrix multiplication works this way, it’s necessary to understand what matrices actually are. But before we get to that, let’s briefly take a look at why we care about matrices in the first place. The most basic application of matrices is solving systems of linear equations. A linear equation is one in which each variable appears only to the first power, multiplied by a constant: the variables don’t get multiplied with each other or themselves, and no funny functions are applied to them either. An example of a system of linear equations is

$2x +y = 3 \\ 4x + 3y = 7$

The solution to this system is $x = 1, y = 1$. Such equations seem simple, but they easily arise in life. For example, let’s say I have two friends Alice and Bob who went shopping for candy. Alice bought 2 chocolate bars and 1 bag of Skittles and spent \$3, whereas Bob bought 4 chocolate bars and 3 bags of Skittles and spent \$7. If we want to figure out how much a chocolate bar and a bag of Skittles cost, we can let $x$ be the price of a chocolate bar and $y$ be the price of a bag of Skittles, and the variables will satisfy the above system of linear equations. Therefore we can deduce that a chocolate bar costs \$1 and so does a bag of Skittles. This system was particularly easy to solve because one can guess and check the solution, but in general, with $n$ variables and equations instead of 2, it’s much harder. That’s where matrices come in! Note that, by matrix multiplication, the above system of linear equations can be re-written as

$\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix}.$

If we could find a matrix $A$ that is the inverse of the matrix $\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}$, then multiplying both sides of the equation (on the left) by $A$ would give

$\begin{pmatrix} x \\ y \end{pmatrix} = A \begin{pmatrix} 3 \\ 7 \end{pmatrix}.$
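As a rough sketch of this idea for the candy example, the Python below uses the standard closed-form inverse of a 2-by-2 matrix (swap the diagonal entries, negate the off-diagonal ones, divide by the determinant $ad - bc$). That formula is brought in only for illustration and is not derived in this post; the helper names `inverse_2x2` and `apply`, and the name `A_inv` for the matrix called $A$ above, are likewise hypothetical.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2-by-2 matrix via the closed-form formula (assumed, not derived here)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix has no inverse")
    det = Fraction(det)  # exact arithmetic, avoids floating-point rounding
    return [[d / det, -b / det], [-c / det, a / det]]


def apply(M, v):
    """Multiply the matrix M (list of rows) by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


A_inv = inverse_2x2([[2, 1], [4, 3]])
solution = apply(A_inv, [3, 7])
print(solution)  # [Fraction(1, 1), Fraction(1, 1)]
```

The result is $x = 1, y = 1$, in agreement with the prices found by guessing.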

The applications of matrices reach far beyond this simple problem, but for now we’ll use this as our motivation. Let’s get back to understanding what matrices are. To understand matrices, we have to know what vectors are. A vector space is a set whose elements can be added together and scaled by numbers in a well-behaved way, and a vector is simply an element of a vector space. For now, for technical simplicity, we’ll stick with vector spaces over the real numbers, also known as real vector spaces. A real vector space is basically what you think of when you think of space. The number line is a 1-dimensional real vector space, the x-y plane is a 2-dimensional real vector space, 3-dimensional space is a 3-dimensional real vector space, and so on. If you learned about vectors in school, then you are probably familiar with thinking about them as arrows which you can add together, multiply by a real number, and so on, but multiplying vectors together works differently. Does this sound familiar? It should. That’s how matrices work, and it’s no coincidence.

The most important fact about vector spaces is that they always have a basis. A basis of a vector space is a set of vectors such that any vector in the space can be written in exactly one way as a linear combination of those basis vectors. If $v_1, v_2, v_3$ are your basis vectors, then $av_1 + bv_2 + cv_3$ is a linear combination for any real numbers $a,b,c$. A concrete example is the following: a basis for the x-y plane is the pair of vectors $(1,0), (0,1)$. Any vector is of the form $(a,b)$, which can be written as

$\begin{pmatrix} a \\ b \end{pmatrix} = a \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 0 \\ 1 \end{pmatrix}$

so we indeed have a basis! This is not the only possible basis. In fact, the vectors in our basis don’t even have to be perpendicular! For example, the vectors $(1,0), (1,1)$ form a basis since we can write

$\begin{pmatrix} a \\ b \end{pmatrix} = (a-b) \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
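The two decompositions above can be checked numerically. In this small sketch the helper `combine` and the sample values of `a` and `b` are hypothetical choices made for illustration.

```python
def combine(coeffs, basis):
    """Return the linear combination sum(c * v) of the given basis vectors."""
    dim = len(basis[0])
    return tuple(sum(c * v[i] for c, v in zip(coeffs, basis)) for i in range(dim))


standard = [(1, 0), (0, 1)]
skewed = [(1, 0), (1, 1)]  # not perpendicular, but still a basis

a, b = 5, 2
from_standard = combine((a, b), standard)
from_skewed = combine((a - b, b), skewed)
print(from_standard)  # (5, 2)
print(from_skewed)    # (5, 2)
```

The same vector $(5, 2)$ has coefficients $(5, 2)$ in the first basis and $(3, 2)$ in the second, so the coefficients depend on the basis even though the vector does not.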

Now, a linear transformation is simply a function between two vector spaces that happens to be linear. Being linear is an extremely nice property. A function $f$ is linear if the following two properties hold for all vectors $x, y$ and all real numbers $a$:

$f(x+y) = f(x) + f(y) \\ f(ax) = af(x)$

For example, the function $f(x) = x^2$ defined on the real line is not linear, since $f(x+y) = (x+y)^2 = x^2 + y^2 + 2xy$ whereas $f(x) + f(y) = x^2 + y^2$, and these differ whenever $xy \neq 0$. Now, we connect together all the ideas we’ve talked about so far: matrices, bases, and linear transformations. The connection is that matrices are representations of linear transformations, and you can figure out how to write the matrix down by seeing how the transformation acts on a basis. To understand the first statement, we need to see why the second is true. The idea is that any vector is a linear combination of basis vectors, so you only need to know how the linear transformation affects each basis vector. Indeed, since the function is linear, if we have an arbitrary vector $v$ which can be written as a linear combination $v = av_1 + bv_2 + cv_3$, then

$f(v) = f(av_1 + bv_2 + cv_3) = af(v_1) + bf(v_2) + cf(v_3).$

Notice that the value of $f(v)$ is completely determined by the values $f(v_1), f(v_2), f(v_3)$, so that’s all the information we need to completely define the linear transformation. Where does the matrix come in? Once we choose a basis for both the domain and the target of the linear transformation, the columns of the matrix represent the images of the basis vectors under the function. For example, suppose we have a linear transformation $f$ which maps $\mathbb{R}^3$ to $\mathbb{R}^2$, meaning it takes in 3-dimensional vectors and spits out 2-dimensional vectors. Right now $f$ is just some abstract function that we have no way of writing down on paper. Let’s pick a basis for both our domain (3-space) and our target (2-space, or the plane). A nice choice would be $v_1 = (1,0,0), v_2 = (0,1,0), v_3 = (0,0,1)$ for the former and $w_1 = (1,0), w_2 = (0,1)$ for the latter. All we need to know is how $f$ affects $v_1, v_2, v_3$; the basis for the target lets us write down the values $f(v_1), f(v_2), f(v_3)$ concretely. The matrix $M$ for our function will be a 2-by-3 matrix, where the 3 columns are indexed by $v_1, v_2, v_3$ and the 2 rows are indexed by $w_1, w_2$. For concreteness, let’s say

$f(v_1) = 2w_1 + 4w_2 \\ f(v_2) = w_1 - w_2 \\ f(v_3) = w_2.$

Then the corresponding matrix will be

$\begin{pmatrix} 2 & 1 & 0 \\ 4 & -1 & 1 \end{pmatrix}.$

The reason why this works is that matrix multiplication was designed so that if you multiply a matrix by the vector with all zeroes except a 1 in the $i$-th entry, then the result is just the $i$-th column of the matrix. You can check this for yourself. So we know that the matrix $M$ works correctly when applied to (multiplied by) basis vectors. But matrices also satisfy the same properties as linear transformations, namely $M(x + y) = Mx + My$ and $M(ax) = aMx$, where $x,y$ are vectors and $a$ is a real number. Therefore $M$ works for all vectors, so it’s the correct representation of $f$. Note that if we had chosen different basis vectors, the matrix would look different. Therefore, matrices are not natural, in the sense that they depend on which bases we choose.
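Here is a small sketch of that recipe for the example $f$ from $\mathbb{R}^3$ to $\mathbb{R}^2$. The list `images`, the helper `apply`, and the sample coefficients are illustrative assumptions; only the values of $f(v_1), f(v_2), f(v_3)$ come from the text.

```python
# Images f(v1), f(v2), f(v3), written in coordinates with respect to w1, w2.
images = [(2, 4), (1, -1), (0, 1)]

# The images become the columns of M, so M has 2 rows and 3 columns.
M = [[img[r] for img in images] for r in range(2)]
print(M)  # [[2, 1, 0], [4, -1, 1]]


def apply(M, v):
    """Multiply the matrix M (list of rows) by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Multiplying by the i-th standard basis vector picks out the i-th column.
second_column = apply(M, [0, 1, 0])
print(second_column)  # [1, -1]

# For v = a*v1 + b*v2 + c*v3, M v agrees with a*f(v1) + b*f(v2) + c*f(v3).
a, b, c = 3, -2, 5
by_matrix = apply(M, [a, b, c])
by_linearity = [
    a * images[0][r] + b * images[1][r] + c * images[2][r] for r in range(2)
]
print(by_matrix, by_linearity)  # [4, 19] [4, 19]
```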

Now we can finally answer the question posed at the beginning: why does matrix multiplication work the way it does? Let’s take a look at the two matrices we had in the beginning: $A = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}$. We know that these correspond to linear functions on the plane; let’s call them $f$ and $g$, respectively. Multiplying matrices corresponds to composing their functions, so computing $ABx$ is the same as computing $f(g(x))$ for any vector $x$. To determine what the matrix $AB$ should look like, we can see how it affects the basis vectors $w_1 = (1,0), w_2 = (0,1)$. We have

$f(g(w_1)) = f(w_1 + w_2) = f(w_1) + f(w_2) \\ = (2w_1 + 4w_2) + (w_1 + 3w_2) = 3w_1 + 7w_2$

so the first column of $AB$ should be $(3,7)$, and

$f(g(w_2)) = f(2w_1) = 2f(w_1) = 2(2w_1 + 4w_2) = 4w_1 + 8w_2$

so the second column of $AB$ should be $(4,8)$. Indeed, this agrees with the answer we got in the beginning by matrix multiplication! Although this is not at all a rigorous proof, since it’s just an example, it captures why matrix multiplication is defined the way it is.
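The same check can be run in code: compose the two functions, record where the basis vectors land, and compare with the row-by-column product. The helpers `apply` and `matmul` and the list-of-rows representation are assumptions made for this sketch.

```python
def apply(M, v):
    """Multiply the matrix M (list of rows) by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Row-by-column matrix product of A and B."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[2, 1], [4, 3]]
B = [[1, 2], [1, 0]]


def f(v):
    return apply(A, v)


def g(v):
    return apply(B, v)


# Images of w1 and w2 under the composition: apply g first, then f.
columns = [f(g(w)) for w in ([1, 0], [0, 1])]
print(columns)  # [[3, 7], [4, 8]]

# Place those images as columns and compare with the matrix product.
composed = [[col[r] for col in columns] for r in range(2)]
print(composed == matmul(A, B))  # True
```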

Now that we understand how and why matrix multiplication works the way it does, how does matrix division work? You are probably familiar with functional inverses. The inverse of a function $f$ is a function $g$ such that $f(g(x)) = x = g(f(x))$ for all $x$. Since multiplication of matrices corresponds to composition of functions, it only makes sense that the multiplicative inverse of a matrix is the compositional inverse of the corresponding function. That’s why not all matrices have multiplicative inverses. Some functions don’t have compositional inverses! For example, the linear function $f$ mapping $\mathbb{R}^2$ to $\mathbb{R}$ defined by $f(x,y) = x+y$ has no inverse, since many vectors get mapped to the same value (what would $f^{-1}(0)$ be? $(0,0)$? $(1,-1)$?). This corresponds to the fact that the 1×2 matrix $\begin{pmatrix} 1 & 1 \end{pmatrix}$ has no multiplicative inverse. So dividing by a matrix $B$ is just multiplication by $B^{-1}$, if it exists. There are algorithms for computing inverses of matrices, but we’ll save that for another post.
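Both halves of this argument can be illustrated with a short sketch. The entries of `A_inv` below were computed by hand for this example and are not taken from the post; the helper `matmul` is likewise hypothetical.

```python
from fractions import Fraction


def matmul(A, B):
    """Row-by-column matrix product of A and B."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[2, 1], [4, 3]]
# Hand-computed candidate for the inverse of A (an assumption checked below).
A_inv = [[Fraction(3, 2), Fraction(-1, 2)], [-2, 1]]
identity = [[1, 0], [0, 1]]

# Composing in either order gives the identity, so A_inv undoes A.
print(matmul(A, A_inv) == identity)  # True
print(matmul(A_inv, A) == identity)  # True


def f(x, y):
    """The linear map from the plane to the line discussed above."""
    return x + y


# Two different inputs share the same output, so f cannot be undone.
print(f(0, 0), f(1, -1))  # 0 0
```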