Car data and its correlations are a good way of showing how PCA treats each variable. For example, acceleration and weight are negatively correlated, while displacement and horsepower are highly correlated.
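A minimal sketch of what that looks like, assuming seaborn's bundled `mpg` dataset (which needs seaborn installed and network access to fetch the data):

    import seaborn as sns
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    cols = ["acceleration", "weight", "displacement", "horsepower"]
    df = sns.load_dataset("mpg").dropna(subset=cols)

    # Pairwise correlations: acceleration vs weight is negative,
    # displacement vs horsepower is strongly positive.
    print(df[cols].corr().round(2))

    # PCA on the standardized columns; the component loadings show how
    # each original variable contributes to each principal component.
    X = StandardScaler().fit_transform(df[cols])
    pca = PCA().fit(X)
    print(pca.explained_variance_ratio_.round(2))
    print(pca.components_.round(2))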
I'm sure it's still very popular in engineering. It's still very much the lingua franca for numerical methods applied to engineering problems - there are decades of material on solving engineering problems written in MATLAB.
Matlab is great. I won’t hire anyone whose only experience is in it, though, whereas I might hire someone whose only experience is with Python’s numerical analysis tools. It’s brittle and difficult to put into production in my experience.
It's used heavily in school still. I think statistics and data science in general have dropped it along with other relics like SAS (used today only by those people who haven't bothered learning anything other than SAS), and are sticking with R and python for functionality previously done in matlab.
Yes. We made a proof-of-concept thing on top of a closed-source 2.4 GHz radio wave propagation simulator built in matlab. Interfacing to compiled matlab through a hacky command line and .mat files is a pretty horrifying experience :)
Let’s say you eat a piece of cake. You say, “hmm, salty, sweet, fruity, nice texture.” Let’s call those attributes the “principal components” of the cake from your point of view. Are there others? Maybe you could have also said “moist, or fluffy,” but you didn’t because those weren’t as obvious, so not as important.
When the cake was made, according to the recipe, there were no instructions on how to add “sweet” or “fruity.” Instead, there was a list of ingredients: sugar, vanilla, lemon juice, flour, water, baking powder, etc. The mixture of these ingredients in the quantities dictated (plus the baking) resulted in the cake having the characteristics that you described. Some of the characteristics have a strong reliance on just one or two ingredients, e.g. “sweet” with “sugar,” and some characteristics are the result of subtle combinations of many ingredients, e.g. “texture.”
The list of characteristics (principal components) definitely describes the cake, but in a more convenient and relevant way. You don’t need the whole list of ingredients to describe the cake. This is what makes principal component analysis useful.
A more recent approach to visualizing high-dimensional data is the t-SNE algorithm, which I normally use together with PCA when exploring big data sets. If you're interested in the differences between both methods, here's a really good answer: https://stats.stackexchange.com/a/249520.
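For what it's worth, the pattern I mean is PCA first to knock the dimensionality down, then t-SNE on the reduced data. A rough sketch with sklearn's digits dataset:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)        # 1797 samples, 64 dimensions

    # PCA first: denoises the data and speeds up t-SNE considerably.
    X_pca = PCA(n_components=30).fit_transform(X)

    # t-SNE for the final 2-D embedding used for plotting.
    X_2d = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(X_pca)
    print(X_2d.shape)                          # (1797, 2)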
I think PCA is a good reason to learn enough linear algebra to understand PCA. It means learning about basis, rank, low-rank approximation, orthogonality, eigenvectors, spectral decomposition, etc. There's a whole iceberg of concepts that goes into actually understanding PCA, without which PCA is not really understood.
> a transformation no different than finding a camera angle
I’ve used PCA a bit in the past and it’s so abstract that one forgets how to conceptualize it shortly after finishing the task. This is an interesting and memorable way to put it, I like that.
PCA is a cool technique mathematically, but in my many years of building models, I've never seen it result in a more accurate model. I could see it potentially being useful in situations where you're forced to use a linear/logistic model since you're going to have to do a lot of feature preprocessing, but tree ensembles, NNs, etc. are all able to tease out pretty complicated relationships among features on their own. Considering that PCA also complicates things from a model interpretability point of view, it feels to me like a method whose time has largely passed.
> Considering that PCA also complicates things from a model interpretability point of view
This is a strange comment, since my primary use of PCA/SVD is as a first step in understanding the latent factors that are driving the data. Latent factors typically cover all of the important things that anyone running a business or deciding policy cares about: customer engagement, patient well-being, employee happiness, etc. all represent latent factors.
If you have ever wanted to perform data analysis and gain some exciting insight into explaining user behavior, PCA/SVD will get you there pretty quickly. It is one of the most powerful tools in my arsenal when I'm working on a project that requires interpretability.
The "loadings" in PC and the V matrix in SVD both contain information about how the original feature space correlates with the new projection. This can easily show thing things like "User's who do X,Y and NOT Z are more likely to purchase".
Likewise, running LSA (Latent Semantic Analysis/Indexing) on a term-frequency matrix gives you a first pass at semantic embeddings. You'll notice, for example, that "dog" and "cat" project onto a common component in the new space, which can be interpreted as "pets".
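A toy LSA sketch, assuming sklearn and a tiny made-up corpus (real corpora behave much better, of course):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = [
        "my dog chased the ball",
        "the cat and the dog sleep all day",
        "my cat ignores the ball",
        "interest rates moved the stock market",
        "the stock market fell on rate fears",
    ]

    vec = CountVectorizer()
    X = vec.fit_transform(docs)                  # sparse term-frequency matrix
    terms = vec.get_feature_names_out()

    svd = TruncatedSVD(n_components=2, random_state=0).fit(X)

    # Each row of components_ is a latent "topic"; dog/cat/ball should load
    # on one component, stock/market/rates on the other.
    for i, comp in enumerate(svd.components_):
        top = comp.argsort()[::-1][:4]
        print(f"component {i}:", [terms[j] for j in top])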
> I've never seen it result in a more accurate model. I could see it potentially being useful in situations where you're forced to use a linear/logistic model
PCA/SVD are a linear transformation of the data and shouldn't give you any performance increase on a linear model. However they can be very helpful in transforming extremely high dimensional, sparse vectors into lower dimensional, dense representations. This can provide a lot of storage/performance benefits.
> NNs, etc. are all able to tease out pretty complicated relationships among features on their own.
PCA is essentially a linear autoencoder minimizing MSE: with no non-linear layers, the autoencoder recovers the same subspace. It is a very good first step towards understanding what your NN will eventually do. After all, an NN is just a sequence of non-linear matrix transformations arranged so that your final vector space is ultimately linearly separable.
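A quick sanity check of that claim, assuming PyTorch is available (the linear autoencoder recovers the same subspace as PCA, not the same basis, so compare reconstruction error):

    import torch
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X = load_iris().data
    X = X - X.mean(axis=0)
    X_t = torch.tensor(X, dtype=torch.float32)

    k = 2
    enc = torch.nn.Linear(4, k, bias=False)      # no non-linearities anywhere
    dec = torch.nn.Linear(k, 4, bias=False)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
    for _ in range(3000):
        opt.zero_grad()
        loss = ((dec(enc(X_t)) - X_t) ** 2).mean()
        loss.backward()
        opt.step()

    pca = PCA(n_components=k).fit(X)
    X_rec = pca.inverse_transform(pca.transform(X))

    print("linear autoencoder MSE:", loss.item())
    print("rank-2 PCA MSE:        ", ((X_rec - X) ** 2).mean())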
Sure, everyone wants to get to the latent factors that really drive the outcome of interest, but I've never seen a situation in which principal components _really_ represent latent factors unless you squint hard at them and want to believe. As for gaining insight and explaining user behavior, I'd much rather just fit a decent model and share some SHAP plots for understanding how your features relate to the target and to each other.
If you like PCA and find it works in your particular domains, all the more power to you. I just don't find it practically useful for fitting better models and am generally suspicious of the insights drawn from that and other unsupervised techniques, especially given how much of the meaning of the results gets imparted by the observer who often has a particular story they'd like to tell.
I've used PCA with good results in the past. My problem essentially simplified down to trying to find nearest neighbours in high dimensional spaces. Distance metrics in high dimensional spaces don't behave nicely. Using PCA to reduce the number of dimensions to something more manageable made the problem much more tractable.
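The pattern is roughly this (sketch, assuming sklearn; the number of components is whatever keeps enough variance for the distances to stay meaningful):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 512))           # high-dimensional points

    # Reduce to a more manageable dimensionality first.
    X_low = PCA(n_components=20).fit_transform(X)

    nn = NearestNeighbors(n_neighbors=5).fit(X_low)
    dist, idx = nn.kneighbors(X_low[:3])         # neighbours of the first 3 points
    print(idx)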
By definition there are more accurate models; PCA is more like a general lossy compression algorithm. Any model you come up with can be superseded by a more accurate model, right up until you have a perfect description of the phenomenon. But PCA is a well understood technique, it can be computed very fast using optimized algorithms and GPUs, pretty much anyone can easily understand it and apply it to a wide variety of problems, and for a given number of retained dimensions it preserves the maximum amount of variance in the data.
We use PCA quite a lot at my quant firm to do something similar to clustering in high dimensional spaces. A simple use case would be to arrange stocks so that stocks that move similarly to one another are grouped close together.
Another use case for PCA is breaking stocks down into constituent components, for example being able to express the price of a stock as a linear combination of factors: MSFT = 5% oil + 10% interest rates + 40% tech sector + ...
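A rough sketch of the statistical-factor version of that, on synthetic returns (the factors here are unnamed principal components rather than oil/rates/tech, which you'd get by regressing on named series instead):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    n_days, n_stocks = 750, 50
    returns = rng.normal(scale=0.01, size=(n_days, n_stocks))
    returns += 0.02 * rng.normal(size=(n_days, 1))      # shared "market" factor

    pca = PCA(n_components=5).fit(returns)
    factors = pca.transform(returns)                    # daily factor returns

    # Loadings for one stock: how much of its move each factor explains,
    # i.e. stock_0 ~ mean + sum_k factors[:, k] * loadings[k].
    stock_0_loadings = pca.components_[:, 0]
    print(stock_0_loadings.round(3))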
You can also do this for things like ETFs, where in principle an ETF might be made up of 100 stocks, but in practice only 10 of those stocks really determine the price, so if you're engaged in ETF market making you can hold a neutral portfolio by carrying the ETF long and a small handful of stocks short.
By definition, it's going to result in a less accurate model, unless you keep all of the dimensions or your data is very weird, right? And NNs are going to complicate your interpretability more?
When/if used properly, no. The idea behind PCA is to find a set of features with far lower dimensionality than the original data. The hope/intent with this sort of approach is that any additional features would mostly just be fitting noise.
For people who are curious, the GP is correct when it comes to fitting the training data. Recall, with enough parameters, we can get 100% on training. The parent’s comment is about testing/validation where we want to avoid overfitting so removing the least important parameters can be helpful.
PCA is good enough for a lot of things. For example, it is used in genetics to measure relatedness between populations reasonably well. A perfect model doesn't really exist when the data you are able to realistically collect is only a subset of the population anyway, perhaps biased toward how it was collected.
if you know that your data comes from a stationary distribution, you can use it as a compression technique which reduces the computational demands on your model. sure, computing the initial svd or covariance matrix is expensive, but once you have it, the projection is just a matrix multiply and a vector subtraction. (with the reverse being the same)
if you have some high dimensional data and you just want to look at it, it's a pretty good start. not only does it give you a sense for whether higher dimensions are just noise (by looking at the eigenspectrums) it also makes low dimensional plots possible.
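both points in a sketch, assuming sklearn: once fitted, the projection really is just a subtraction and a matrix multiply, and the eigenspectrum tells you how many dimensions carry signal.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 100)) @ rng.normal(size=(100, 100))

    pca = PCA(n_components=10).fit(X)          # the expensive part, done once

    # The projection itself is cheap: subtract the mean, multiply by the components.
    x_new = rng.normal(size=(1, 100))
    z = (x_new - pca.mean_) @ pca.components_.T
    x_back = z @ pca.components_ + pca.mean_   # the reverse is the same in reverse

    assert np.allclose(z, pca.transform(x_new))

    # Eigenspectrum: how much variance each retained component explains.
    print(pca.explained_variance_ratio_.round(3))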
pca, cca and ica have been around for a very long time. i doubt "their time has passed."
It is still a nice tool for projecting things (at least to visualize) where you expect the data to be on a lower dimensional hyperplane. I do agree in most cases t-SNE or UMAP are better (esp if you don’t care about distances).
I put the four dots on the corners of a square and the fifth in the center. This results in the same square in the PCA pane but rotated about 45 degrees. Then, if you take one of the dots on the square corner and move it ever so slightly in and out, you see the PCA square wildly rotating. Pretty cool to demonstrate sensitivity to small changes in the inputs.
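You can reproduce that without the interactive page; a numpy sketch of the same square-plus-center setup:

    import numpy as np
    from sklearn.decomposition import PCA

    pts = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1], [0, 0]], dtype=float)

    for eps in (0.05, 0.0, -0.05):
        p = pts.copy()
        p[3] += eps                  # nudge one corner slightly in/out along its diagonal
        pc1 = PCA(n_components=2).fit(p).components_[0]
        angle = np.degrees(np.arctan2(pc1[1], pc1[0]))
        print(f"eps={eps:+.2f}  first PC at {angle:6.1f} degrees")

    # With eps = 0 the two eigenvalues are equal and the direction is arbitrary;
    # an arbitrarily small nudge flips the first PC between the two diagonals.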
I was thinking the other night: PCA can be used on images for compression, so what would it look like if you took two images, paired up their principal components, and then lerped between them as a transition effect?
Not really, I was thinking that by gradually shifting between the principal components of the two images you could subtly morph from one to the other, but it might just look like visual garbage instead :) maybe start with the lowest-variance components and then gradually move to the strongest ones.
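If anyone wants to try it, a rough sketch of one way to do it (assuming two same-sized grayscale images as 2-D numpy arrays, and lerping scores in a shared PCA basis rather than pairing components directly):

    import numpy as np
    from sklearn.decomposition import PCA

    def pca_lerp(img_a, img_b, t, k=32):
        """Blend two images by interpolating their rows in a shared PCA basis."""
        # k must be at most min(2 * image height, image width).
        pca = PCA(n_components=k).fit(np.vstack([img_a, img_b]))
        za, zb = pca.transform(img_a), pca.transform(img_b)
        z = (1 - t) * za + t * zb               # lerp in component space
        return pca.inverse_transform(z)

    # frames = [pca_lerp(img_a, img_b, t) for t in np.linspace(0, 1, 30)]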
if I recall correctly, yeah, there probably will be: linear regression minimises the vertical distance of a point to the regression line, whereas PCA minimises the orthogonal distance of the point to the line.
Linear regression uses a measure of "error" for every data point. Visually, the error is the vertical distance between a data point and the regression line/plane. In contrast, PCA measures the distance from the data point to the PCA axis along the direction perpendicular to that axis; the point where that perpendicular meets the axis is the "projection" of the data point.
There is something known as orthogonal regression (total least squares) which uses the same measure as PCA. Unfortunately it doesn't work well when the variables are in incompatible units, since orthogonal distances then mix different scales.
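A sketch of the difference on synthetic data (OLS via polyfit, TLS via the SVD of the centered data, which is the same direction as the first principal component):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 2 * x + rng.normal(scale=0.5, size=200)

    # Ordinary least squares: minimises vertical residuals.
    slope_ols = np.polyfit(x, y, 1)[0]

    # Total least squares: minimises orthogonal residuals, i.e. the first PC direction.
    X = np.column_stack([x, y])
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    slope_tls = Vt[0, 1] / Vt[0, 0]

    print(slope_ols, slope_tls)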
If you know a bit of linear algebra the transformation is surprisingly intuitive.
Your goal is to create a set of orthogonal vectors, each capturing as much of the variance in the original data as possible (the assumption being that variance is where most of the information is).
This is achieved by performing an eigendecomposition of the covariance matrix of the original data. Essentially you are finding the eigenvectors of the covariance matrix, ordered by decreasing eigenvalue.
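A minimal numpy version of exactly that:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))   # correlated toy data
    Xc = X - X.mean(axis=0)

    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]               # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = Xc @ eigvecs                           # project onto the principal axes
    print(eigvals)                                  # variance captured by each PC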