There is no such thing as meaningless data. Data are just observations. Meaning is constructed from the interpretation of data.
The primary questions of good science are what, where, when, and who. These are the questions you answer when you collect data. Once you've answered them you can address secondary questions of why and how. Asking why and how without giving priority to what, where, when, and who is putting the cart before the horse.
When you are unable to answer why and how for a given set of data, it is not meaningless. Rather, the lack of correlation or explanations just says that perhaps we need to look into this more deeply. "I've looked at the data and I don't know" is a profound statement, and it can be inspiring.
Science also has to be falsifiable, and effectively that's what these graphs provide: a test of whether the pattern from the NY study extrapolates to the Bay Area.
I agree it would have been more helpful if the author had presented conclusions about what the data mean or don't mean, but they aren't a priori meaningless simply because there isn't a visible correlation. No correlation, which is the rather obvious conclusion, is just as meaningful. I hope this is more clear.
Interesting. The authors made the statement "The results show the Bay Area's economic inequality and its relationship with transit and urban form."
You are saying that a fair conclusion is "there is no clear relationship between transit and income in the Bay Area", which could be the conclusion the authors were drawing (I think it's not clear exactly what conclusion, if any, they drew).
In retrospect, I was not thinking about it correctly. I saw the NYC graph and thought "Manhattan = rich, everywhere else = poor". I saw these graphs and thought "I don't see anything". I assumed the conclusion from the NYC graphs was superior, because there was a clear positive relationship there. You are saying that the "don't see anything" conclusion is just as valid, even though it is not a positive relationship.
Yeah! Well, another point comes up here too. Do we really need to look at subway lines to find out if rich people live in Manhattan? It seems like you only need income tax returns to answer that one. It's more like, are the stops on subway lines segregated into rich and poor clusters the same way that physical neighborhoods of rich and poor people are segregated on a map? And then speculation as to why or why not is interesting. For SF it's hard to answer, but for New York, well, to build a train track that leaves Manhattan is expensive unless you're going to the Bronx, so you're going to put all the Manhattan stops together.
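To make that clustering question concrete, here's a rough sketch of how you might check it. To be clear, the stops.csv file and its column names are hypothetical, not the article's data:

    # Rough sketch, not the article's method: do stops on the same line have
    # more similar median incomes than the system as a whole?
    # "stops.csv" and its column names are hypothetical.
    import pandas as pd

    stops = pd.read_csv("stops.csv")  # columns: stop_id, line, median_income

    overall_spread = stops["median_income"].std()
    within_line_spread = stops.groupby("line")["median_income"].std().mean()

    print(f"spread across all stops:      {overall_spread:,.0f}")
    print(f"average spread within a line: {within_line_spread:,.0f}")
    # If the within-line spread is much smaller than the overall spread,
    # the stops cluster by line, i.e. all the Manhattan stops end up together.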
Data that answers a question nobody wants answered is effectively meaningless. If you are showing a lack of correlation in a situation where a correlation might be expected, then good job. If you are showing a lack of correlation between giraffe migration and cactus branch count, then you're wasting everyone's time by bringing it up.
What do the transit vs. income graphs for SF look like?
How do the SF graphs compare to the NY ones?
Is there a clustering of rich and poor stops in SF like there is in NY?
And then finally, what are the possible explanations?
Sure, they didn't answer the last question, and you have to inspect the data to answer the second and third ones, but it's okay to provide data for other people to look at.
Surely if the first question was worth answering for NY, it's worth answering for SF. You don't answer questions simply because you expect to find something; you answer them because you're curious.
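For that first question, here's a rough sketch of the kind of graph I mean. The file, its columns, and the line name are made up for illustration, not taken from the article:

    # Illustrative only: plot median income stop-by-stop along one line.
    # "bart_stops.csv", its columns, and the line name are hypothetical.
    import pandas as pd
    import matplotlib.pyplot as plt

    stops = pd.read_csv("bart_stops.csv")  # columns: line, stop_order, stop_name, median_income
    line = stops[stops["line"] == "Richmond-Fremont"].sort_values("stop_order")

    plt.plot(line["stop_order"], line["median_income"], marker="o")
    plt.xticks(line["stop_order"], line["stop_name"], rotation=90)
    plt.ylabel("Median household income near stop")
    plt.title("Income along one line (illustrative)")
    plt.tight_layout()
    plt.show()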
Do you know the story about Richard Feynman and the wobbling plate in the cafeteria? It's another question that "nobody wanted to answer".
The wobbling plate is an explanation, not a yes/no answer about correlation. If you can explain something, then by definition there is causality in there somewhere.
Look, I'm phrasing this badly today, let me try again. The question of 'is there a correlation' is worth answering, but ONLY because a correlation is plausible. If you are graphing data to test a PLAUSIBLE hypothesis, then your work is reasonable. But if you are instead graphing random junk without any reason, you are wasting everyone's time. Data needs to cause some kind of mental connection in the viewer. That is a VERY low bar to meet. This article meets it. But not all theoretical articles do.
"No conclusion" is not necessarily a conclusion. Data doesn't have meaning on its own, but showing people data should have meaning.
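To be concrete about what 'is there a correlation' means as a question: the bare test is trivial to run, which is exactly why the plausibility of the hypothesis is what matters. A rough sketch on hypothetical per-stop data (the file and column names are made up, nothing here is from the article):

    # Sketch of the bare "is there a correlation" test on hypothetical
    # per-stop data; the file and column names are invented.
    import pandas as pd
    from scipy.stats import pearsonr

    stops = pd.read_csv("stops.csv")  # columns: km_from_center, median_income

    r, p = pearsonr(stops["km_from_center"], stops["median_income"])
    print(f"r = {r:.2f}, p = {p:.3f}")
    # A small r with a large p is the "don't see anything" result, which is
    # still an answer, provided the hypothesis was plausible to begin with.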
OK, OK: the thing about the wobbling plate is that Feynman says the freedom to investigate a question that was interesting only to him was what led to his Nobel Prize. Here's an excerpt from his book that's faster to process than the video I linked to:
But otherwise, I think we basically agree. Mostly I thought you were talking about the SF vs. NY thing and this article, not about inane investigations into arbitrary correlations (e.g., is there a relationship between the number of steps someone takes per day and the number of spoons in their apartment? Well, actually, there probably is, especially if you start taking away knives and forks too). The most important thing, I guess, is to have a question that the researcher is interested in answering.