Most high school students in the United States learn about matrices and matrix multiplication, but they often are not taught *why* matrix multiplication works the way it does. Adding matrices is easy: you just add the corresponding entries. However, matrix multiplication does not work this way, and for someone who doesn’t understand the theory behind matrices, this way of multiplying matrices may seem extremely contrived and strange. To truly understand matrices, we view them as representations of part of a bigger picture. Matrices represent *functions* between spaces, called vector spaces, and not just any functions either, but **linear** functions. This is in fact why **linear algebra** focuses on matrices. The two fundamental facts about matrices are that *every matrix represents some linear function*, and *every linear function is represented by a matrix*. Therefore, once bases are chosen, there is in fact a one-to-one correspondence between matrices and linear functions. We’ll show that multiplying matrices corresponds to composing the functions that they represent. Along the way, we’ll examine what matrices are good for and why linear algebra sprang up in the first place.

Most likely, if you’ve taken algebra in high school, you’ve seen something like the following:

Your high school algebra teacher probably told you this thing was a “matrix.” You then learned how to do things with matrices. For example, you can add two matrices, and the operation is fairly intuitive:

You can also subtract matrices, which works similarly. You can multiply a matrix by a number:

Then, when you were taught how to multiply matrices, everything seemed wrong:

That is, to find the entry in the $i$-th row, $j$-th column of the product, you look at the $i$-th row of the first matrix and the $j$-th column of the second matrix, you multiply together their corresponding numbers, and then you add up the results to get the entry in that position. In the above example, the 1st row, 2nd column entry is a because the 1st row of the first matrix is , the 2nd column of the second matrix is , and we have . Moreover, this implies that matrix multiplication isn’t even commutative! If we switch the order of multiplication above, we get

How come matrix multiplication doesn’t work like addition and subtraction? And if multiplication works this way, how the heck does division work? The goal of this post is to answer these questions.
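The row-times-column rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `matmul` and the two small matrices `P` and `Q` are made up here and are not the matrices from the post's example. Entry $(i, j)$ of the product is the sum over $k$ of `A[i][k] * B[k][j]`.

```python
def matmul(A, B):
    """Multiply matrices (lists of rows) with the row-times-column rule."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    width = len(B[0])
    # Entry (i, j): pair up row i of A with column j of B, multiply, then add.
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(width)]
            for i in range(len(A))]


# Hypothetical matrices, chosen only to show that order matters.
P = [[1, 1],
     [0, 1]]
Q = [[1, 0],
     [1, 1]]

PQ = matmul(P, Q)
QP = matmul(Q, P)
print(PQ)  # [[2, 1], [1, 1]]
print(QP)  # [[1, 1], [1, 2]]
```

The two products differ, which is the non-commutativity the paragraph above complains about.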

To understand why matrix multiplication works this way, it’s necessary to understand what matrices actually are. But before we get to that, let’s briefly take a look at why we care about matrices in the first place. The most basic application of matrices is solving systems of linear equations. A linear equation is one in which all the variables appear by themselves with no powers; they don’t get multiplied with each other or themselves, and no funny functions either. An example of a system of linear equations is

$$\begin{aligned} 2x + y &= 3 \\ 4x + 3y &= 7 \end{aligned}$$

The solution to this system is $x = 1, y = 1$. Such equations seem simple, but they easily arise in life. For example, let’s say I have two friends Alice and Bob who went shopping for candy. Alice bought 2 chocolate bars and 1 bag of skittles and spent $3, whereas Bob bought 4 chocolate bars and 3 bags of skittles and spent $7. If we want to figure out how much chocolate bars and skittles cost, we can let $x$ be the price of a chocolate bar and $y$ be the price of a bag of skittles, and the variables would satisfy the above system of linear equations. Therefore we can deduce that a chocolate bar costs $1 and so does a bag of skittles. This system was particularly easy to solve because one can guess and check the solution, but in general, with $n$ variables and $n$ equations instead of 2, it’s much harder. That’s where matrices come in! Note that, by matrix multiplication, the above system of linear equations can be re-written as

$$\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix}$$

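If only we could find a matrix $M$, which is the inverse of the matrix $\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}$, so that if we multiplied both sides of the equation (on the left) by $M$ we’d get

$$\begin{pmatrix} x \\ y \end{pmatrix} = M \begin{pmatrix} 3 \\ 7 \end{pmatrix}$$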

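As a rough sketch of this idea, the candy problem (2 chocolate bars and 1 bag of skittles for $3, 4 chocolate bars and 3 bags for $7) can be solved by inverting the coefficient matrix. The post saves inversion algorithms for later, so the code below simply assumes the standard closed-form inverse of a 2×2 matrix; the helper names `inverse_2x2` and `apply` are invented for this example.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the usual determinant formula (assumed here)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    det = Fraction(det)  # exact arithmetic, no rounding
    return [[d / det, -b / det],
            [-c / det, a / det]]


def apply(M, v):
    """Multiply a matrix (list of rows) by a column vector (list of numbers)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


coefficients = [[2, 1],   # Alice: 2 chocolate bars + 1 bag of skittles
                [4, 3]]   # Bob:   4 chocolate bars + 3 bags of skittles
totals = [3, 7]           # dollars spent by Alice and by Bob

prices = apply(inverse_2x2(coefficients), totals)
print(prices)  # [Fraction(1, 1), Fraction(1, 1)]
```

Both prices come out to $1, matching the guess-and-check answer.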
The applications of matrices reach far beyond this simple problem, but for now we’ll use this as our motivation. Let’s get back to understanding what matrices are. To understand matrices, we have to know what vectors are. A **vector space** is a set with a specific structure, and a **vector** is simply an element of the vector space. For now, for technical simplicity, we’ll stick with vector spaces over the real numbers, also known as **real vector spaces**. A real vector space is basically what you think of when you think of space. The number line is a 1-dimensional real vector space, the x-y plane is a 2-dimensional real vector space, 3-dimensional space is a 3-dimensional real vector space, and so on. If you learned about vectors in school, then you are probably familiar with thinking about them as arrows which you can add together, multiply by a real number, and so on, but multiplying vectors together works differently. Does this sound familiar? It should. That’s how matrices work, and it’s no coincidence.

The most important fact about vector spaces is that they always have a basis. A **basis** of a vector space is a set of vectors such that any vector in the space can be written in exactly one way as a linear combination of those basis vectors. If $v_1, \dots, v_n$ are your basis vectors, then $a_1 v_1 + \cdots + a_n v_n$ is a linear combination if $a_1, \dots, a_n$ are real numbers. A concrete example is the following: a basis for the x-y plane is the vectors $(1, 0), (0, 1)$. Any vector is of the form $(a, b)$ which can be written as

$$(a, b) = a\,(1, 0) + b\,(0, 1)$$

so we indeed have a basis! This is not the only possible basis. In fact, the vectors in our basis don’t even have to be perpendicular! For example, the vectors form a basis since we can write

.
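To make the point about non-perpendicular bases concrete, here is a small sketch that finds the coefficients of a vector with respect to any two basis vectors in the plane. The pair $(1, 0), (1, 1)$ used below is picked purely for illustration and is not necessarily the pair the post had in mind; the function name `coordinates` and the use of Cramer's rule are likewise choices made for this example.

```python
from fractions import Fraction


def coordinates(b1, b2, v):
    """Return (s, t) such that s*b1 + t*b2 == v, using Cramer's rule."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 are parallel, so they do not form a basis")
    s = Fraction(v[0] * b2[1] - b2[0] * v[1], det)
    t = Fraction(b1[0] * v[1] - v[0] * b1[1], det)
    return s, t


# Illustrative, non-perpendicular basis of the plane.
b1, b2 = (1, 0), (1, 1)
s, t = coordinates(b1, b2, (5, 2))
print(s, t)  # 3 2, since 3*(1, 0) + 2*(1, 1) == (5, 2)
```

With the standard basis $(1, 0), (0, 1)$ the same function just returns the entries of the vector itself.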

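Now, a **linear transformation** is simply a function between two vector spaces that happens to be **linear**. Being linear is an extremely nice property. A function $f$ is linear if the following two properties hold:

- $f(x + y) = f(x) + f(y)$ for all vectors $x, y$;
- $f(cx) = c\,f(x)$ for every vector $x$ and every real number $c$.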

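The two defining properties of linearity (sums go to sums, scalar multiples go to scalar multiples) can be spot-checked numerically. The sketch below tests them on a handful of sample points for functions on the real line. Passing such a check is evidence, not a proof, and the two test functions `triple` and `square` are examples chosen here rather than ones taken from the post.

```python
def looks_linear(f, samples, scalars):
    """Spot-check additivity and homogeneity of f on the given sample inputs."""
    additive = all(f(x + y) == f(x) + f(y) for x in samples for y in samples)
    homogeneous = all(f(c * x) == c * f(x) for c in scalars for x in samples)
    return additive and homogeneous


def triple(x):
    return 3 * x      # linear: scaling by a constant


def square(x):
    return x * x      # not linear: square(1 + 1) == 4 but square(1) + square(1) == 2


samples = [-2, 0, 1, 3]
scalars = [-1, 0, 2]

print(looks_linear(triple, samples, scalars))  # True
print(looks_linear(square, samples, scalars))  # False
```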
For example, the function defined on the real line is not linear, since whereas . Now, we connect together all the ideas we’ve talked about so far: matrices, bases, and linear transformations. The connection is that **matrices are representations of linear transformations**, and you can figure out how to write the matrix down by seeing how it acts on a basis. To understand the first statement, we need to see why the second is true. The idea is that any vector is a linear combination of basis vectors, so you only need to know how the linear transformation affects each basis vector. This is because, since the function is linear, if we have an arbitrary vector $v$ which can be written as a linear combination $v = a_1 v_1 + \cdots + a_n v_n$, then

$$f(v) = f(a_1 v_1 + \cdots + a_n v_n) = a_1 f(v_1) + \cdots + a_n f(v_n)$$

Notice that the value of $f(v)$ is completely determined by the values $f(v_1), \dots, f(v_n)$, and so that’s all the information we need to completely define the linear transformation. Where does the matrix come in? Well, once we choose a basis for both the domain and the target of the linear transformation, the columns of the matrix will represent the images of the basis vectors under the function. For example, suppose we have a linear transformation $f$ which maps $\mathbb{R}^3$ to $\mathbb{R}^2$, meaning it takes in 3-dimensional vectors and spits out 2-dimensional vectors. Right now $f$ is just some abstract function which we have no way of writing down on paper. Let’s pick a basis for both our domain (3-space) and our target (2-space, or the plane). A nice choice would be $v_1 = (1, 0, 0), v_2 = (0, 1, 0), v_3 = (0, 0, 1)$ for the former and $w_1 = (1, 0), w_2 = (0, 1)$ for the latter. All we need to know is how $f$ affects $v_1, v_2, v_3$, and the basis for the target is for writing down the values $f(v_1), f(v_2), f(v_3)$ concretely. The matrix for our function will be a 2-by-3 matrix, where the 3 columns are indexed by $v_1, v_2, v_3$ and the 2 rows are indexed by $w_1, w_2$. All we need to write down are the values $f(v_1), f(v_2), f(v_3)$. For concreteness, let’s say

Then the corresponding matrix will be

The reason why this works is that matrix multiplication was designed so that if you multiply a matrix by the vector with all zeroes except a 1 in the $j$-th entry, then the result is just the $j$-th column of the matrix. You can check this for yourself. So we know that the matrix works correctly when applied to (multiplied to) basis vectors. But also matrices satisfy the same properties as linear transformations, namely $A(x + y) = Ax + Ay$ and $A(cx) = c\,Ax$, where $A$ is the matrix, $x, y$ are vectors and $c$ is a real number. Therefore $A$ works for all vectors, so it’s the correct representation of $f$. Note that if we had chosen different vectors for the basis vectors, the matrix would look different. Therefore, matrices are not natural in the sense that they depend on what bases we choose.
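Here is a minimal sketch of that recipe for a map from 3-space to the plane. The three images of the basis vectors below are made-up values (the post's own numbers are not reproduced here), and the helper names `matrix_from_images` and `apply` are hypothetical. The code places the images in the columns, checks that multiplying by a unit vector returns the matching column, and checks that the matrix agrees with plain linearity on another vector.

```python
def matrix_from_images(images):
    """Build a matrix (list of rows) whose j-th column is images[j]."""
    height = len(images[0])
    return [[image[r] for image in images] for r in range(height)]


def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list of numbers)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Hypothetical images of the three basis vectors of the domain,
# written in the basis of the target plane.
images = [(1, 2), (0, 1), (3, -1)]
M = matrix_from_images(images)          # [[1, 0, 3], [2, 1, -1]]

# Multiplying by the j-th unit vector picks out the j-th column.
unit_vectors = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
columns_recovered = [apply(M, e) for e in unit_vectors]

# For any other vector, the matrix agrees with extending by linearity.
v = [2, -1, 4]
via_matrix = apply(M, v)
via_linearity = [sum(c * image[r] for c, image in zip(v, images)) for r in range(2)]
print(via_matrix, via_linearity)  # [14, -1] [14, -1]
```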

Now, finally to answer the question posed at the beginning. Why does matrix multiplication work the way it does? Let’s take a look at the two matrices we had in the beginning, $A$ and $B = \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}$. We know that these correspond to linear functions on the plane, let’s call them $f$ and $g$, respectively. Multiplying matrices corresponds to **composing** their functions. Therefore, doing $ABx$ is the same as doing $f(g(x))$ for any vector $x$. To determine what the matrix $AB$ should look like, we can see how it affects the basis vectors $w_1 = (1, 0), w_2 = (0, 1)$. We have

$$f(g(w_1)) = f(w_1 + w_2) = f(w_1) + f(w_2)$$

so the first column of $AB$ should be the sum of the two columns of $A$, and

$$f(g(w_2)) = f(2w_1) = 2 f(w_1)$$

so the second column of $AB$ should be twice the first column of $A$. Indeed, this agrees with the answer we got in the beginning by matrix multiplication! Although this is not at all a rigorous proof, since it’s just an example, it captures the idea of the reason matrix multiplication is the way it is.
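The claim that multiplying matrices is the same as composing functions can be checked directly. In the sketch below, `B` is the second matrix as the author describes it in the discussion further down (first column (1, 1), second column (2, 0)). The entries of `A` are an assumption made for this example: it reuses the coefficient matrix of the candy system as a stand-in. The helper names are hypothetical.

```python
def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list of numbers)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def matmul(A, B):
    """Row-times-column matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[2, 1],
     [4, 3]]   # assumed stand-in for the first matrix
B = [[1, 2],
     [1, 0]]   # g(w1) = w1 + w2, g(w2) = 2*w1


def f(v):
    return apply(A, v)


def g(v):
    return apply(B, v)


basis = [[1, 0], [0, 1]]                       # w1, w2
composed_columns = [f(g(w)) for w in basis]    # images of w1, w2 under f after g

product = matmul(A, B)
product_columns = [[row[j] for row in product] for j in range(2)]

print(composed_columns)  # [[3, 7], [4, 8]]
print(product_columns)   # [[3, 7], [4, 8]]
```

With these values the first column is the sum of the columns of `A` and the second is twice the first column of `A`, as the linearity argument predicts.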

Now that we understand how and why matrix multiplication works the way it does, how does matrix division work? You are probably familiar with functional inverses. The **inverse** of a function $f$ is a function $f^{-1}$ such that $f(f^{-1}(x)) = x = f^{-1}(f(x))$ for all $x$. Since multiplication of matrices corresponds to composition of functions, it only makes sense that the multiplicative inverse of a matrix is the compositional inverse of the corresponding function. That’s why not all matrices have multiplicative inverses. Some functions don’t have compositional inverses! For example, the linear function mapping $\mathbb{R}^2$ to $\mathbb{R}$ defined by has no inverse, since many vectors get mapped to the same value (what would be? ? ?). This corresponds to the fact that the 1×2 matrix has no multiplicative inverse. So dividing by a matrix $A$ is just multiplication by $A^{-1}$, if it exists. There are algorithms for computing inverses of matrices, but we’ll save that for another post.
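One way to see why some matrices cannot have inverses is to look for two different inputs that land on the same output. The sketch below does this by brute force over a small grid of points. The 1×2 matrix `flatten`, which sends $(x, y)$ to $x$, is a hypothetical example chosen here, and `find_collision` is an invented helper.

```python
def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (sequence of numbers)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def find_collision(matrix, candidates):
    """Return two distinct inputs with the same output, or None if none is found."""
    seen = {}
    for v in candidates:
        image = tuple(apply(matrix, v))
        if image in seen:
            return seen[image], v
        seen[image] = v
    return None


grid = [(x, y) for x in range(-1, 2) for y in range(-1, 2)]

flatten = [[1, 0]]               # hypothetical 1x2 matrix: (x, y) -> x
collision = find_collision(flatten, grid)
print(collision)                 # two different points with the same image

invertible = [[2, 1], [4, 3]]    # coefficient matrix of the candy system
print(find_collision(invertible, grid))  # None: no two grid points collide
```

A collision means no function can undo the map, so no inverse matrix exists. Finding no collision on a finite grid is only a sanity check, not a proof of invertibility.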

## 34 comments


May 5, 2012 at 8:44 am

David Miles: Wonderful post, thank you. This is almost exactly what I was looking for. Now, I have to try to translate aspects of this for high school students. I wonder if the complexity of this is part of the reason that matrices have been removed from the IB DP’s new mathematics curriculum.

November 28, 2012 at 1:51 am

gary: great post! this helps me a lot for understanding my upper division courses of linear algebra!

March 21, 2013 at 4:26 pm

Charles Peezy: You have rows and columns confused. Rows are horizontal, columns are vertical.

March 21, 2013 at 4:58 pm

Alan Guo: Yes, rows are horizontal, columns are vertical. Where in the article do I make a mistake?

May 6, 2013 at 11:59 am

Metro Man: Amazing article! Thanks!

July 5, 2013 at 11:55 am

Addae: I am a little confused, how does the f(g(w1)) = f(w1+w2)

and for the second column why do you do f(g(w2)) = f(2w1)

July 6, 2013 at 7:20 am

Alan Guo: Recall that g is defined to be the function represented by the matrix B, whose first column is (1 1) and second column is (2 0) in the basis w1 and w2. The first column tells us what g(w1) is and the second column tells us what g(w2) is. In particular, it tells us g(w1) = 1*w1 + 1*w2 and g(w2) = 2*w1 + 0*w2.

July 6, 2013 at 5:50 pm

Addae: Thanks, I sort of saw that. But my real question is where do you plug in w1 and w2? Am I missing something basic? Sorry for the inconvenience. Thanks for answering, and so quickly as well.

July 6, 2013 at 8:20 pm

Alan Guo: Hm, I’m not sure I completely understand your question. But I’ll say some stuff, and if you’re still confused, let me know.

We know, with respect to the basis w1 and w2, we have g(w1) = w1 + w2 and g(w2) = 2*w1. In other words, whenever we see g(w1), we can replace that with w1 + w2, since they’re equal, and similarly we can replace g(w2) with 2*w1. Therefore, f(g(w1)) = f(w1 + w2), since we just substitute w1 + w2 for g(w1) inside f, and similarly f(g(w2)) = f(2*w1).

July 6, 2013 at 10:46 pm

Addae: Alright I know that g(w1) = w1 + w2

So with a regular function g(x) = 5x+3, when you write g(w1) you get g(w1) = 5w1+3

Where do you actually plug in w1 into g if g(x) = w1 + w2

Lets say w1 was (2, 0) instead of (1,0) how would that change g(w1)?

Everything else makes sense just getting lost in the details is all.

July 7, 2013 at 6:12 am

Alan Guo: Ah, I think I understand your question now. w1 and w2 are not variables, they are actual specific vectors in the plane that I’ve chosen.

So when I say g(w1) = w1 + w2 and g(w2) = 2w1, what I mean is, I’ve chosen some basis w1, w2 for the domain (and range). Every x can be written as a*w1 + b*w2, and so by linearity of g, we have

g(x) = g(a*w1 + b*w2) = a*g(w1) + b*g(w2) = a*(w1+w2) + b*(2*w1) = (a+2b)*w1 + a*w2.

If we choose to represent w1 = (1,0) and w2 = (0,1), what that’s saying is

g(a,b) = (a+2b,a)

which can also be read off the matrix B.

Now, suppose we choose a different basis v1 = (2,0) = 2*w1 and v2 = (0,1) = w2. Then, with respect to this new basis v1,v2, we have

g(v1) = g(2*w1) = 2*g(w1) = 2*w1 + 2*w2 = v1 + 2*v2

g(v2) = g(w2) = 2*w1 = v1

and so the new matrix B’ for g with respect to this new basis would have first column (1 2) and second column (1 0).

July 10, 2013 at 6:47 pm

Addae: Took me a while to process…

Okay so I understand g(a,b) = (a+2b,a)

w1 is a basis vector and w2 is another basis vector

Then when you put those basis vectors into the g(a,b) equation you get your solutions and they are clearly seen from the B Matrix. So the method of g(x) and x being a variable is cleared up. By this method I get it so much more, thank you.

when you go into matrix notation i get lost

w1 = (1,0)

w2 = (0,1)

g represents

[ 1 2 ]

[ 1 0 ]

So when you say g(w1), are you calling upon the vector (1,1) or the first column that is made by the linear combination of 1w1+1w2? Is w1 similar to the notation of v1 that you used earlier?

If I understand, in algebraic terms that means that g(x) can be expressed as a linear combination of basis vectors w1, w2 where they are (1,0) and (0,1)

(where x is a vector)

g(x) = 1w1+2w2

so then what happens when you put in

w1?

g(w1) = ?

every x can be written as a*w1+b*w2

Took me a while to form my questions, this seems very abstract. Thank you for helping.

July 10, 2013 at 8:32 pm

Alan Guo: Matrix notation only has meaning when you specify a basis. For example, when I write a matrix A as

[a b]

[c d]

what that really means is I’ve fixed a basis v1,v2 for the domain V and a basis w1,w2 for the codomain W, and the matrix A represents the linear function f defined by

f(v1) = a*w1 + c*w2

f(v2) = b*w1 + d*w2

This uniquely specifies how f behaves on the entire domain V, since every vector v in V can be written uniquely as x*v1 + y*v2 for some scalars x,y. So you can think of v as a variable, which is really parametrized by the two variables x,y. Then, by linearity,

f(v) = f(x*v1 + y*v2)

= x*f(v1) + y*f(v2)

= (ax + by)*w1 + (cx + dy)*w2

which is the same as when you multiply the column vector (x, y) by the matrix A:

[a b] [x] = [ax + by]

[c d] [y] [cx + dy]

Note that the column vector (x, y) on the left hand side is written in the (v1,v2) basis, so it represents the vector x*v1 + y*v2, whereas the column vector (ax + by, cx + dy) on the right hand side is written in the (w1, w2) basis, so it represents the vector (ax + by)*w1 + (cx + dy)*w2.

In my examples, I conveniently chose the same basis w1,w2 for both the domain and the codomain.

So anyway, to answer your specific question, when I say g(w1), what I mean is, w1 is a vector which, in the basis w1,w2, is written as 1*w1 + 0*w2, denoted by the column vector (1,0), and g(w1) means applying g to the vector (1,0), so multiply the matrix B by (1,0) which will give you (1,1), so g(w1) = 1*w1 + 1*w2.

August 29, 2015 at 11:51 am

Alex: I had the same question as Addae (I think). The way I would put it: It *seems* weird that g(w1) = w1 + w2, because ‘normally’ when you define a function g(x), the RHS involves only the variable x, e.g. g(x) = 2*x. However, for something like g(x) = 2*x + 5*y, one might react as, “Wait, where does y come from? How do you get any sort of y from x?” (Is that what you mean, Addae?)

However, if I understand you correctly Alan, I think g(w1) has a bit different meaning. It’s more like, when I apply the function g to the basis vector w1, what new vector do I get from any linear combination of the basis vectors of the vector space…. NOT necessarily from just w1. Does that clear it up?

October 14, 2013 at 9:46 pm

Jeremy Hansbrough: Hi,

If you have a linear transformation that’s one to one and onto, then the basis vectors span the space and send every vector in the domain to a unique vector in the codomain. The codomain is the same as the range…

Meaning that the ker(T) = {0}, and that the Im(T) = V, where V is the domain…

So if a set of vectors doesn’t span its domain, then the kernel spans a dimension that is sent to 0 by definition. How does this relate to matrix multiplication?

So if you have a matrix where the vectors making it up are linearly dependent, such as:

[ 1 -1 -1]

[-1 2 3]

[-2 1 0]

All three vectors only span a two space, because one can be expressed in terms of the others. Is there a way to argue that a linear transformation isn’t one to one simply because of the geometry of spanning? How does this relate to matrix multiplication?

October 16, 2013 at 8:15 pm

Alan Guo: Yes, the kernel of the matrix is intimately related to the geometry of the vectors making up the matrix. In particular, the coefficients of any nontrivial linear combination of the columns of the matrix which yields zero (a.k.a. a linear dependence relation) form a member of the kernel of the matrix. For instance, in your example matrix, if a, b, c are the column vectors of your matrix, then we see that a + 2b – c = 0, so the column vector (1,2,-1) is in your kernel. In fact, multiplying the column vector (x,y,z) by the matrix exactly gives you the vector x*a + y*b + z*c, so the kernel is nontrivial if and only if the columns are linearly dependent.

October 16, 2013 at 2:30 pm

Addae: I finally get it entirely. Thanks a lot. I took a course over the summer to help me out. I started doing some work with matrices and some work with quaternions. Then I decided I would give this a go again and it is surprisingly simple now. What you were doing was expressing a column of g as a linear combination of the basis vectors. After seeing what that does to the basis vectors, you put that answer through f and see how it affects it relative to the basis vectors. Because they all share the same basis vectors this approach works. What would happen if the basis vectors were not the same for both matrices? I’m guessing you would use the basis vectors of the first matrix g, and see how f transforms them.

Just a quick question, I was wondering if you have done much in quaternion algebra and if I could message you sometime about it. If so, could you e-mail me? Don’t want to flood your comment section any more than I have. Thanks a lot for clearing things up, and spending the time to explain the concept to me back then 😛

October 16, 2013 at 8:17 pm

Alan Guo: It’s great to hear that these things are clear now! No, I haven’t done any work in quaternion algebras.

September 12, 2014 at 6:52 am

Juxhino: Really good and straightforward article.

Thank you!

August 29, 2015 at 9:49 am

Gideon: Hi,

Great post. One question: aren’t there multiple matrix representations for a given linear function? Doesn’t this mean that it’s a one to many relationship, not one to one?

Thanks again for writing this!

August 29, 2015 at 12:42 pm

menomnon: I don’t see either simultaneous equations or Gaussian elimination mentioned?

September 13, 2015 at 12:29 am

Peter Varga: Thank you for the excellent explanation. Another way to prove this point is with geometric algebra: draw a few arrows, and the apt student would see how the functions and vectors in a space correspond.

This article gave me an inspiration to solve my problem I was stuck with. Thank you again.

October 12, 2015 at 9:01 am

rohantmp: Wow, this helped so much. Why would anyone teach matrices without explaining this? Nowhere else in any “Intro to Matrices” sort of thing have I found anything nearly like this.

Thank you so much!

March 5, 2016 at 4:16 am

Arkadeep Mukhopadhyay: Very intuitive and helpful.


May 24, 2016 at 6:55 pm

hehe: thanks

August 31, 2016 at 10:55 am

fatimaperwaiz25: This was very helpful. Thanks a lot for writing such a descriptive as well as meaningful explanation of matrices.

April 16, 2017 at 4:14 am

NoOne: Reblogged this on Transcendence and commented:

An excellent explanation of the apparent mathematical magic called the matrix…
