I understand the concept of eigenvectors/eigenvalues, and I understand the concept of the covariance matrix, but I'm having a hard time understanding why the eigenvectors of the covariance matrix represent the principal components. Does anyone have an intuitive explanation they would like to share? Maybe I don't understand eigenvectors as well as I thought I did...
The eigenvectors of any diagonalizable matrix (which any covariance matrix is; see elsewhere on this page for a discussion of why) form a basis for the space in which the matrix operates. The eigenvectors corresponding to the largest eigenvalues are called the principal components because they are the directions in which the matrix principally acts.
Consider a matrix A with two eigenvectors u, v and corresponding eigenvalues a, b, where a >> b. Then A(u + v) = au + bv. Since a >> b, the operator stretches u far more than v, so we think of the operator as acting 'principally' in the direction u, and hence the eigenvectors with the largest eigenvalues are the 'principal components'.
I exaggerated the case by letting a >> b, but the same idea holds when a and b are similar in magnitude: repeated applications of the matrix give A^n(u + v) = a^n u + b^n v, so the direction with the larger eigenvalue comes to dominate anyway.
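Here is a minimal numpy sketch of that effect; the matrix, eigenvectors, and eigenvalues are made-up numbers chosen purely for illustration:

```python
import numpy as np

# A symmetric matrix with eigenvectors u = [1, 0], v = [0, 1]
# and eigenvalues a = 10, b = 1 (made-up numbers for illustration).
A = np.diag([10.0, 1.0])
u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

# One application: A(u + v) = a*u + b*v -- stretched mostly along u.
print(A @ (u + v))            # [10.  1.]

# Repeated application: A^n (u + v) = a^n u + b^n v.
# Normalizing shows the direction converging to u, the 'principal' direction.
x = u + v
for _ in range(5):
    x = A @ x
print(x / np.linalg.norm(x))  # approx [1.  0.] -- u dominates
```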
Covariance matrices are nothing special in this regard.
Here is a totally intuitive and probably wrong way to think about it, but it's what I have in my head as my own mental model. Imagine that a matrix transformation is like a wind blowing in a particular direction. If you throw a ball, it gets blown by the wind and its direction changes. If you throw it obliquely, it ends up a certain distance from you. Which direction would you throw it to make it go as far as possible? You'd line up the throw with the direction of the wind. Effectively, that's finding your largest eigenvalue / eigenvector.
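If you want to see the wind analogy play out numerically, here is a rough sketch using a made-up symmetric matrix: among many unit-length "throws", the one that travels farthest under the matrix lines up with the eigenvector of the largest eigenvalue.

```python
import numpy as np

# A symmetric "wind" matrix (values made up for illustration).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# Throw the ball in many unit directions; see which lands farthest.
angles = np.linspace(0.0, np.pi, 1000)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
distances = np.linalg.norm(directions @ A.T, axis=1)
best = directions[np.argmax(distances)]

# Compare with the eigenvector of the largest eigenvalue.
vals, vecs = np.linalg.eigh(A)   # eigh: eigendecomposition for symmetric matrices
top = vecs[:, np.argmax(vals)]
print(best, top)                 # same direction (possibly up to sign)
```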
Now, for finding the first principal component we do something special. We create a wind that blows in the direction of variation of the data. We do it by computing the covariance between all our variables and entering those covariances into each position of a matrix, so that every dimension is weighted according to the size of the variation in that direction. This makes a "wind" in the direction of the variation of the data. The first principal component is the vector that most aligns with this covariance "wind".
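Here is a small sketch of the covariance "wind" using made-up 2D data that mostly varies along the diagonal; the top eigenvector of the covariance matrix picks out exactly that direction, and projections onto it have more variance than projections onto any other axis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 2D data with a clear direction of variation:
# large spread along the 45-degree diagonal, small noise elsewhere.
n = 2000
t = rng.normal(0.0, 3.0, n)            # strong variation along [1, 1]
noise = rng.normal(0.0, 0.5, (n, 2))   # weak variation in all directions
X = np.outer(t, [1.0, 1.0]) / np.sqrt(2) + noise

# The covariance matrix is the "wind": it encodes how strongly
# the data varies along each pair of axes.
C = np.cov(X, rowvar=False)

# The first principal component is the eigenvector of C
# with the largest eigenvalue.
vals, vecs = np.linalg.eigh(C)
pc1 = vecs[:, np.argmax(vals)]
print(pc1)  # approx [0.707, 0.707], i.e. the diagonal

# Sanity check: projections onto pc1 have more variance than
# projections onto, say, the x-axis.
print(np.var(X @ pc1), np.var(X @ np.array([1.0, 0.0])))
```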
I am sure this is technically wrong in essential ways and I'd love to hear it corrected!