]]>Understanding that correlation and causation aren't the same isn't sufficient. That kind of superficial "understanding" may itself be misleading. The problem is much deeper in a way not addressable by calls for statistical literacy. I agree that more literacy in statistics is a good thing.
But it doesn't guarantee real understanding. Even from people who are highly literate in the language.
Since causality is tied to the real world, I think that this is a question about the philosophy of (statistical) physics, and not mathematics or statistics.
But statistics is not only a branch of mathematics. It is also concerned with applying that branch of mathematics to the real world.
@fgdorais: While the OP used correlation in the sense of linear correlation, in the wider context of this discussion I think nonlinear regressions should be allowed as well. Anscombe's quartet is very interesting, but still mainly a statistical technicality. The deeper problem is this: if careful statistical analysis reveals a pattern involving several variables that appears unlikely to have arisen by accident, what (if anything) can we conclude about causal relationships between the variables?
]]>Can you clarify what you mean by "this"? There are certainly some very important related scientific/ethical questions regarding the use of correlation, but if "this" means the original question as posed then no.
Another important question with more mathematical content is how to measure causation. As we saw in the answers, this is an important research area in statistics.
]]>Correlation can be used as evidence of a causal relation, but it is not proof of such a relation nor is it a consequence of such a relation. In fact, the relation between the two is very limited as far as I know.
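To make the "nor is it a consequence" point concrete, here is a minimal Python sketch under made-up assumptions. The relation f(x) = x^2, the noise level, and the sample grid are hypothetical choices for illustration only, and `pearson` is a hand-written helper rather than a library call. Y depends entirely on X (plus noise), yet the linear correlation between X and Y comes out close to zero, while correlating Y against f(X) for the f that is assumed known gives a value close to one.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def f(x):
    # Hypothetical relation, assumed to be known in advance.
    return x * x


rng = random.Random(0)  # fixed seed so the run is reproducible

# X on a grid symmetric about zero; Y is f(X) plus Gaussian noise.
xs = [-3 + 6 * i / 200 for i in range(201)]
ys = [f(x) + rng.gauss(0, 0.5) for x in xs]

r_raw = pearson(xs, ys)                 # linear correlation of X with Y
r_f = pearson([f(x) for x in xs], ys)   # correlation of f(X) with Y

print(f"corr(X, Y)    = {r_raw:+.3f}")
print(f"corr(f(X), Y) = {r_f:+.3f}")
```

The symmetric grid is what drives the first number toward zero; with a one-sided range of X the same quadratic relation would show a substantial linear correlation, so the sketch illustrates a possibility rather than a general rule.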
"A causal relation often leads to statistical correlation." Only when the causal relation leads to a linear dependency and then correlation only attempts to measure how linear this dependency is. Correlation can be used to check whether X, Y follow a given relation Y = f(X) by correlating Y and f(X) but you need to know f in advance. I have used correlation to support a relation that I derived by other means, but I never used correlation to derive a relationship. I think this is one of the principal ways that correlation is used in experimental sciences.
"[...] faced with significant statistical correlation it is often important to understand the causal structure leading to it." Maybe, but you could be in for a surprise! It is prudent to analyze other aspects of the data before embarking on such a quest. See Anscombe's Quartet to see how it is important to look at the data closely before relying on correlation and other basic statistics. It's also sometimes misleading to think that way. For example, road traffic density is highly correlated with certain times of day but neither is a direct cause of the other. In fact, the "causal structure" between the two is such a complex web that I pity the (presumably alien) scientist who unknowingly embarks on a quest to untangle this.
]]>However, this is unrelated to my original feelings about the question. Correlation and causality are not related at all, and it's easy to find plenty of information about that. I don't think asking about relating the two is a good question, but that doesn't warrant closure per se. The frustrating bit was that the actual question is very hard to find; the "any comments?" at the end is not a real question. When unraveled, the question ends up being: "Do these three measures (which are only loosely related to correlation) imply causality?" This is when I originally rolled my eyes and posted my comment.
That said, I didn't vote to close even if there was evidence for the question being "off-topic," "subjective and argumentative," and "not a real question." I didn't find the evidence to be strong enough and I thought the OP could probably fix the question. I still think the question could be rescued, but I don't know that I could do it. In the end, I found the answers to the question very interesting to read, but Andrew Stacey's Standard Objection™ that good answers do not make good questions definitely applies to this case.
]]>My only objection is to the form of the question, not the content. I'm afraid the question cannot have lasting value in its current state.
In that case, I think the question should be edited. Could you either edit the question or indicate what about the form of the question needs to change? As I understand it, the objection is to the focus on the misguided question "when does correlation imply causation?" Instead, it should be pushing for something like "what, in addition to correlation, is needed to determine causation?" What exactly is the better question to ask?
]]>In other words, MO needs an attached bar.
Spoken like a true viking!
]]>I think I will abstain from voting to close any question on statistics from now on, regardless.
I think this is a bad decision.
Maybe, but it's not cast in stone. I may change my mind about it eventually, but for the time being, it stands.
]]>It was the presence of those answers that made me uncomfortable enough with my decision to vote for closure to express those doubts in a comment. I still feel that it was the right decision, but in deference to Tom LaGatta's comment above, I think I will abstain from voting to close any question on statistics from now on, regardless.
Harald, whether I agree with your reason to close or not, I want to thank you for expressing your opinion in a comment. As Anton points out again and again, it's important for us to communicate the nuances of a particular question to each other, and not just resort to a blunt up-or-down vote.
]]>But in all seriousness, is it kosher to say that MO is not really a place for statisticians? I mean, this would be a lot easier if we could just close all statistics questions as off-topic and lead them to greener pastures. I think it's reasonable, and I think that's something that the administration should seriously consider.
That seems a bit random. I don't think statistics questions are being enough of a nuisance that this makes sense. We'd just end up with some good, interesting probability questions getting closed (and causing a big stink) because a handful of 3k+ rep users heard that "it's MO policy to close statistics questions". There are a whole bunch of questions tagged [st.statistics]. Most of them look fine to me (based on a poor sample I just made), and lots of them have accepted answers. I don't think these questions are bothering anybody.
]]>I think I will abstain from voting to close any question on statistics from now on, regardless.
I think this is a bad decision.
Oh yeah, well I think it's an even worse decision!
But in all seriousness, is it kosher to say that MO is not really a place for statisticians? I mean, this would be a lot easier if we could just close all statistics questions as off-topic and lead them to greener pastures. I think it's reasonable, and I think that's something that the administration should seriously consider.
]]>I find it hard to understand why people regard the question as borderline or even want to close it. But I would be happy to understand this line of thought.
And I find it difficult to understand that this is so hard to understand. At least I, from my side of the discussion, can see why others disagree. If I can't even explain myself well enough that you can understand my point of view, then either my powers of explanation are feeble indeed, or else our respective points of view are separated by a chasm too deep and wide to be easily bridged. Maybe some of the comments added after you wrote the above will help explain it; if not, then I think the only way forward would be to spend an evening over a beer or three hashing it out. In other words, MO needs an attached bar. Until then, I'll stick with my resolution not to vote on statistics questions.
]]>I think I will abstain from voting to close any question on statistics from now on, regardless.
I think this is a bad decision.
We can hope there are enough 3000+ rep statisticians here so they can deal with the really bad questions themselves.
That will never happen.
]]>If the question had not already gotten some (good) answers by the time I saw it, I would just have voted to close and moved on. It was the presence of those answers that made me uncomfortable enough with my decision to vote for closure to express those doubts in a comment. I still feel that it was the right decision, but in deference to Tom LaGatta's comment above, I think I will abstain from voting to close any question on statistics from now on, regardless. We can hope there are enough 3000+ rep statisticians here so they can deal with the really bad questions themselves.
]]>There is nothing like the greatest communication setup in the history of humankind, based on several of the greatest theoretical and technological achievements of the human mind, to allow for such silly mistakes!
]]>I've also voted to close. My reason is my standard one: great answers do not make great questions.
]]>If someone is an expert in an underrepresented area on MO and doesn't have enough points to vote to close, all that person should need to do is leave a comment explaining to us that he or she is an expert in the subject and why he or she thinks that the question should be closed (or maybe post it on meta like Tom Smith just did).
A lot of us can't tell the difference between good and bad questions in fields in which we have no experience, so we rely on these sorts of comments to avoid closing questions that might be good. (Obviously this isn't necessary for something like algebraic geometry, algebraic topology, commutative algebra, etc., for obvious reasons.)
What do you guys think?
]]>A related phenomenon which I have noticed is that there seem to be relatively few active statisticians on MO, so that some simple stats questions which should be quickly closed slip through the net.
This is because there have been incidents concerning stats questions that have been voted down, and the people who voted them down were accused of being biased against applied math.
]]>A different issue is that I think questions should be closed relatively quickly or not at all. Maybe votes to close should go stale after a while, to diminish the ratcheting effect a little. In particular, this question has been around far too long for it to be closed now.
Disclaimer: The above is written by a sleep-addled brain with a caffeine deficiency. Handle with caution. Void where prohibited.
]]>If you really think the question has no merit, please give the statisticians and probabilists in the community a little more time to contribute before you make the move to close it. It's better to err by leaving a bad question open than to close a potentially good one.
I don't have a good argument for closing the question, but I will argue against this reason for not closing the question. Firstly, it is possible to recognize a bad question outside your area of expertise. Secondly, just because it's possible to give a good answer to a bad question does not mean that the question should be left open. If a question is vague enough that somebody could post a brilliant answer which might not answer the question the asker actually had, then it's probably vague enough that it should be closed until it is clarified. (See also this post by Andrew Stacey) Closing a question is not permanent: questions can be reopened. In fact, if somebody with 3000+ reputation (or the question owner) puts in the effort to really fix up a closed question, she should probably flag it for moderator attention so we can reopen it without delay.
I'm not sure if this question is so vague that it should be closed. If you want to close the question because it's vague, it's a good exercise to come up with two pretty different interpretations for what the asker is really getting at. If you can't, then the question is probably fine, but is just sloppy and needs some editing (you should go ahead and edit it!). If you can, then you should leave a comment saying that you're voting to close because the question is ambiguous and indicating what the ambiguity is. If the ambiguity is resolved, vote to reopen.
]]>"I am voting to close, even though I am not a statistician and the two answers so far seem to indicate that it is possible to say something of a mathematical nature on the subject."
Harald, you said it yourself that you're not a statistician. If you really think the question has no merit, please give the statisticians and probabilists in the community a little more time to contribute before you make the move to close it. It's better to err by leaving a bad question open than to close a potentially good one.
]]>I think this is a good example of a slightly misdirected question. I think everyone would be happy if the title was "How can you measure causality?" and the question started with "We all know correlation does not imply causation, but..."
See 22635 for an example of a well directed question on a similarly often misunderstood fact of life.
]]>The comment thread currently reads
]]>I am voting to close, even though I am not a statistician and the two answers so far seem to indicate that it is possible to say something of a mathematical nature on the subject. However, in the end, an answer to the question as stated cannot be mathematical in nature. As always, if you disagree, explain why here, so closure can perhaps be staved off (and if the discussion turns long, move it to meta). – Harald Hanche-Olsen
I vote against closing, even though the question could have been phrased somewhat more precisely. The importance of our understanding how to perform causal inference cannot be overstated. Any improvement in our current understanding of this matter could be used to combat many kinds of ills in the world, such as diseases and crime. A very important general question is how to begin with a large amount of observational data about, say, how disease X is caused -- say 10,000 people's health status and auxiliary variables -- and extract the most likely guesses as to the etiology of X. – Daniel Asimov
As my comment below suggests, I too "vote against closing", as it were -- Shiva Kaul's answer suggests that something useful can be said (beyond the standard Stats 99 responses, which is about all I'd be able to contribute myself). – Yemon Choi