By "total least squares" you mean to pick regression coefficients that minimize the squared errors between the observed values of the dependent variable and the predicted value from the regression? The predicted values are from a perpendicular projection, and we do get the Pythagorean theorem with
total sum of squares = regression sum of squares + error sum of squares
So, we minimize the error sum of squares. Does that really control for measurement error? Not in any simple way I can see.
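For what it's worth, here is a minimal numerical sketch of that Pythagorean decomposition (my own, with made-up data, not anything from the thread): with an intercept in the model the fitted values are an orthogonal projection of the observations, so the total, regression, and error sums of squares satisfy the identity up to rounding.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=(n, 2))                   # made-up independent variables
    y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(size=n)

    X = np.column_stack([np.ones(n), x])          # design matrix with an intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares fit
    yhat = X @ beta

    tss = np.sum((y - y.mean()) ** 2)             # total sum of squares
    rss = np.sum((yhat - y.mean()) ** 2)          # regression sum of squares
    ess = np.sum((y - yhat) ** 2)                 # error (residual) sum of squares
    print(tss, rss + ess)                         # equal up to floating-point error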
Or, if we have errors in the measurements of the independent variables, then we are facing one of the facts of life, we don't have the error-free, true values. Not good but usually not as bad as having your marriage fail or losing your pet dog or cat! Like the video clip of a Heifetz master class where a student tried to play the D flat minor scale and at the end Heifetz assured the student that they were still alive!
Maybe what you are saying is that with enough mathematical assumptions, e.g., the famous homoscedasticity with independent, identically distributed mean zero Gaussian errors, as the number of observations goes to infinity the errors wash out much as in the weak/strong laws of large numbers and we get to f'get about the errors -- maybe there is such a theorem, I should get out my copy of the old
C. Radhakrishna Rao, Linear Statistical Inference and Its Applications: Second Edition, ISBN 0-471-70823-2, John Wiley and Sons, New York,
and look or do some such derivations for myself.
But, to what end if we don't believe the mathematical assumptions behind the mathematical theorems, e.g., homoscedasticity with independent, identically distributed mean zero Gaussian errors?
Sorry, from 50,000 feet up, it seems to me that having control variables in regression is shaky stuff. And without some careful derivations, we should not be surprised at the effects of various errors.
Also, the usual derivations of the math are in the context of just some one regression model where we make all those assumptions. Instead, given one dependent variable, 10 independent variables, five more variables we believe are causes, and 10 more we want to use as controls -- the dependent variable plus 25 more variables in all -- last time I checked we were short on guidance for how to pick among the 2^25 possible sets of independent variables and how to make sense of the different, maybe wildly different, coefficients we get.
Here's a simple view: If we have 5 independent variables and they are all orthogonal, then we can get the regression coefficients one at a time just from 5 projections, covariances, inner products (all essentially the same thing except for how we scale things) and have those coefficients be just the same for any of the 2^5 regression analyses. That is, if we have orthogonal independent variables U, V, W, X, and Y and dependent variable Z, then we can get the coefficients one at a time and be done -- have all the regression coefficients for all 2^5 regressions. Otherwise, without orthogonality, we face some possibly tricky math derivations -- maybe they are in Rao's book, it's thick enough -- and are asking a bit too much from regression analysis.
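A small sketch of that orthogonal-variables point (my own construction, made-up data): with orthonormal columns each coefficient is just the inner product with the dependent variable and stays the same no matter which subset of the other variables is in the model; make the columns correlated and the same coefficient wanders from subset to subset.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500

    # Orthonormal columns via QR, then a dependent variable built from them plus noise.
    Q, _ = np.linalg.qr(rng.normal(size=(n, 5)))
    z = Q @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=n)

    def first_coef(M, cols):
        """OLS coefficient of the first listed column when regressing z on M[:, cols]."""
        b, *_ = np.linalg.lstsq(M[:, cols], z, rcond=None)
        return b[0]

    # Orthogonal case: the coefficient of column 0 is the same in every subset,
    # and it equals the one-at-a-time projection <q0, z> / <q0, q0>.
    print(first_coef(Q, [0]), first_coef(Q, [0, 1, 2]), first_coef(Q, [0, 1, 2, 3, 4]))
    print(Q[:, 0] @ z / (Q[:, 0] @ Q[:, 0]))

    # Correlated case: mix the columns and the "same" coefficient shifts by subset.
    C = Q @ np.array([[1.0, 0.8, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.8, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.8, 0.0],
                      [0.0, 0.0, 0.0, 1.0, 0.8],
                      [0.0, 0.0, 0.0, 0.0, 1.0]])
    print(first_coef(C, [0]), first_coef(C, [0, 1, 2]), first_coef(C, [0, 1, 2, 3, 4]))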
Others have seen this swamp, and a current idea from the machine learning community, going back at least to L. Breiman, is that we are not really looking for coefficients, t-tests on the coefficients, F-ratio tests on the regressions, confidence intervals on the predictions (for those we might try some resampling ideas), importance of coefficients, causes, control variables, etc., but are just looking for a fit that can predict: To this end we put the data into at least two buckets, fit to one bucket, and test on another one. Our main criterion is just that the model predicts well for the data we have. That is, all the data in all the buckets satisfies the same statistical assumptions, whatever the heck they are, and we are just fitting and then testing (confirming) on simple random samples of data all from some one big bucket. Yes, we still run into the issue of overfitting: fit well in the first bucket but flop terribly when testing on the second bucket. Okay, a bit crude, uncouth, vulgar, primitive, ..., etc., but maybe useful in some cases -- apparently Breiman made it useful in some cases of medical data.
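A toy sketch of that two-bucket idea (my own, made-up data): fit on a random half, then judge only by the prediction error on the held-out half; a large gap between the two errors is the overfitting warning sign.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000
    X = rng.normal(size=(n, 5))
    y = X @ np.array([1.0, 0.5, 0.0, -1.0, 2.0]) + rng.normal(size=n)

    idx = rng.permutation(n)                      # shuffle, then split into two buckets
    train, test = idx[: n // 2], idx[n // 2:]

    Xtr = np.column_stack([np.ones(len(train)), X[train]])
    Xte = np.column_stack([np.ones(len(test)), X[test]])
    beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)

    mse_train = np.mean((y[train] - Xtr @ beta) ** 2)
    mse_test = np.mean((y[test] - Xte @ beta) ** 2)
    print(mse_train, mse_test)                    # close here; a big gap would signal overfitting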
I inspired a great soliloquy! Thanks for the thoughts.
My point: total least squares includes X in the error minimization, not just Y and a linear combination of X. There is a good introductory discussion on wiki -- essentially, in standard regression we typically assume no measurement error in the independent variables.[0]
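To make that concrete, here is a toy simulation (my own numbers, one predictor, equal error scales in x and y): ordinary least squares on the error-laden x attenuates the slope toward zero, while total least squares -- computed here as orthogonal regression from the SVD of the centered data -- also charges an error to x and comes back close to the true slope.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 2000
    true_slope = 2.0
    x_true = rng.normal(size=n)
    x_obs = x_true + 0.5 * rng.normal(size=n)     # measurement error in the independent variable
    y = true_slope * x_true + 0.5 * rng.normal(size=n)

    # OLS slope of y on the observed x: biased toward zero (attenuation).
    xc, yc = x_obs - x_obs.mean(), y - y.mean()
    ols_slope = (xc @ yc) / (xc @ xc)

    # TLS slope: the fitted line's normal is the right singular vector with the
    # smallest singular value of the centered data matrix.
    A = np.column_stack([xc, yc])
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    v = Vt[-1]
    tls_slope = -v[0] / v[1]

    print(ols_slope, tls_slope, true_slope)       # roughly 1.6, 2.0, 2.0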
As much value as machine learning brings, there is a need for explaining as much as there is for predicting![1]
Your point on whether there is "true control" seems to agree with Pearl's main point of contention -- does the causal graph (which is testable) make sense in a theoretical, experiential, or systemic sense?
Okay, "total least squares" as in your [0]!!!! WOW!!! Back when I knew nothing of regression or curve fitting and was first considering the issue, the question I asked was, if we are trying to fit to the given data, why not have the line as close as possible to each of the points on the scatter diagram, just as in the quite good picture at [0]!! Gee, value of ignorance!!
Again I believe we are trying to make too much out of regression.
Or, maybe, if somehow we DO have causes, we really know they are the real causes, and we have some data, good data, and the data likely satisfies the usual assumptions as in the reference I gave to Rao, THEN, maybe, on a good day, with luck, we do the regression calculations, the t-tests, the F-ratio, get the confidence intervals on the coefficients and the predicted values, etc., and if all that looks solid, then take it seriously.
Here, however, we would KNOW the independent variables, all of them, KNOW that they are the causes, have no need for controls, not be fishing for the variables, and not be trying to have statistics tell us about causes -- then maybe okay.
But, sure, if there really are causes and if we really do have variables that measure those causes well, then maybe in the regression the variables that are candidates as causes will become fairly obvious.
Regression is useful because it allows us to interpolate within observed populations using relatively light assumptions. Extrapolation requires higher order theories and structure. Agreed that it can be a logical mess when one uses it bluntly, but like all tools it has its uses and misuses.
Total least squares is pretty bog-standard in statistics and has a lot of literature, including monographs and textbooks.
You are correct about the Pythagorean theorem, and by virtue of that TLS has close connections with PCA; in fact, once you have the PCA model you can derive the TLS coefficients from the PCA parameters.
The tricky bit is that in TLS the number of nuisance parameters grows with the data, so it isn't immediately obvious that the estimates would converge. It turns out that they do.
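A short sketch of that PCA connection (my own toy data, two variables): the TLS line is the first principal axis of the centered point cloud, so the slope read off the leading PCA direction matches the slope from the SVD-based TLS fit.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 1000
    t = rng.normal(size=n)
    x = t + 0.3 * rng.normal(size=n)              # noise in x
    y = 1.5 * t + 0.3 * rng.normal(size=n)        # noise in y

    A = np.column_stack([x - x.mean(), y - y.mean()])

    # PCA route: leading eigenvector of the covariance matrix is the principal axis.
    cov = A.T @ A / (len(A) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    w = eigvecs[:, -1]                            # direction of largest variance
    slope_from_pca = w[1] / w[0]

    # TLS route: the line's normal is the singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    v = Vt[-1]
    slope_from_tls = -v[0] / v[1]

    print(slope_from_pca, slope_from_tls)         # agree up to rounding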