  1.  

    http://mathoverflow.net/questions/22490/correlation-and-causation-when-can-we-believe-correlation-reasonably-at-least

    The comment thread currently reads

    I am voting to close, even though I am not a statistician and the two answers so far seem to indicate that it is possible to say something of a mathematical nature on the subject. However, in the end an answer to the question as stated cannot be mathematical in nature. As always, if you disagree, explain why here, so closure can perhaps be staved off (and if the discussion turns long, move it to meta). – Harald Hanche-Olsen

    I vote against closing, even though the question could have been phrased somewhat more precisely. The importance of our understanding how to perform causal inference cannot be overestimated. Any improvement in our current understanding of this matter could be used to combat many kinds of ills in the world, such as diseases and crime. A very important general question is how to begin with a large amount of observational data about, say, how disease X is caused -- say 10,000 people's health status and auxiliary variables -- and extract the most likely guesses as to the etiology of X. – Daniel Asimov

    As my comment below suggests, I too "vote against closing", as it were -- Shiva Kaul's answer suggests that something useful can be said (beyond the standard Stats 99 responses, which is about all I'd be able to contribute myself). – Yemon Choi

    • CommentAuthorYemon Choi
    • CommentTimeApr 26th 2010
     

    I agree it might be borderline, but it doesn't offend me in the way some others (where I've not voted to close!) annoy me. The answer referring to Pearl's work suggests that although the original question might have been too naive or not thought through, a useful MO "thread" has resulted. It's not clear to me why this should be closed ahead of others.

  2.  

    I'm abstaining from voting on this question because of the poster's reaction to my comment, but I will share my opinion here.

    I think this is a good example of a slightly misdirected question. I think everyone would be happy if the title was "How can you measure causality?" and the question started with "We all know correlation does not imply causation, but..."

    See 22635 for an example of a well directed question on a similarly often misunderstood fact of life.

  3.  
    I agree with HHO's take.
  4.  

    Harald wrote:

    "I am voting to close, even though I am not a statistician and the two answers so far seem to indicate that it is possible to say something of a mathematical nature on the subject."

    Harald, you said it yourself that you're not a statistician. If you really think the question has no merit, please give the statisticians and probabilists in the community a little more time to contribute before you make the move to close it. It's better to err by leaving a bad question open than to close a potentially good one.

  5.  

    If you really think the question has no merit, please give the statisticians and probabilists in the community a little more time to contribute before you make the move to close it. It's better to err by leaving a bad question open than to close a potentially good one.

    I don't have a good argument for closing the question, but I will argue against this reason for not closing the question. Firstly, it is possible to recognize a bad question outside your area of expertise. Secondly, just because it's possible to give a good answer to a bad question does not mean that the question should be left open. If a question is vague enough that somebody could post a brilliant answer which might not answer the question the asker actually had, then it's probably vague enough that it should be closed until it is clarified. (See also this post by Andrew Stacey.) Closing a question is not permanent: questions can be reopened. In fact, if somebody with 3000+ reputation (or the question owner) puts in the effort to really fix up a closed question, she should probably flag it for moderator attention so we can reopen it without delay.

    I'm not sure if this question is so vague that it should be closed. If you want to close the question because it's vague, it's a good exercise to come up with two pretty different interpretations for what the asker is really getting at. If you can't, then the question is probably fine, but is just sloppy and needs some editing (you should go ahead and edit it!). If you can, then you should leave a comment saying that you're voting to close because the question is ambiguous and indicating what the ambiguity is. If the ambiguity is resolved, vote to reopen.

  6.  

    Sorry, I am traveling and don't have time to participate in this discussion right now. But to comment quickly: I may not be a statistician, but I have been hanging out in the skeptic movement long enough to know very well that no amount of correlation proves causation in the real world. That some of the rather interesting answers show that correlation does indeed imply causation under narrowly defined circumstances does not change this fact. So in a sense, I don't think those interesting answers did answer the question as it was asked. It was the presence of those answers that made me doubt my decision to vote to close sufficiently to write the comment I did. But as asked, the question seemed to call for endless arguments about what causation is and how we can tell it's there.

    A different issue is that I think questions should be closed relatively quickly or not at all. Maybe votes to close should go stale after a while, to diminish the ratcheting effect a little. In particular, this question has been around far too long to be closed now.

    Disclaimer: The above is written by a sleep-addled brain with a caffeine deficiency. Handle with caution. Void where prohibited.

  7.  

    Votes to close vanish after a week.

    • CommentAuthorTom Smith
    • CommentTimeApr 27th 2010
     
    I agree with those who think this question has just enough content to survive.

    A related phenomenon which I have noticed is that there seem to be relatively few active statisticians on MO, so that some simple stats questions which should be quickly closed slip through the net:

    http://mathoverflow.net/questions/21813/which-statistical-test-is-significant-here
    http://mathoverflow.net/questions/21821/how-to-test-whether-a-data-is-or-not-t4-distribution
  8.  

    I think statisticians are unlikely to hang out on MathOverflow, for the same reason that I wouldn't go to an SE site called "StDev" or "MarginalError," but I would definitely like to refer these questions to such a site. If anyone here knows web-savvy statisticians, they should encourage them to set up such a site.

  9.  
    Tom--I've just voted to close both of these.
  10.  

    A related phenomenon which I have noticed is that there seem to be relatively few active statisticians on MO, so that some simple stats questions which should be quickly closed slip through the net

    This is because there have been incidents concerning stats questions that have been voted down, and the people who voted them down were accused of being biased against applied math.

  11.  
    @Harry: Although that may be a contributing factor, I think fgdorais is closer to the mark. Statistics is a distinct field and culture from mathematics, and I doubt many statisticians (as opposed to students taking statistics classes, or people from other fields who aren't aware of the difference) have ever come here in the first place.
    • CommentAuthorHarry Gindi
    • CommentTimeApr 27th 2010 edited
     

    No, I mean that people here are reluctant to close/vote down statistics questions because of this incident (which I think was one of the first incidents where somebody got angry over people closing a post).

    • CommentAuthorYemon Choi
    • CommentTimeApr 27th 2010
     

    Have just cast closing votes on both questions mentioned by Tom. I remember noticing both at the time, but deciding to wait in case someone could turn up in comments with specific suggestions of where else to look. (I didn't want to leave a comment which gave an impression of "we look down on statistics" as opposed to "most of us here don't really do statistics at any practising level".) Thankfully, both of Tom's comments seemed to me to avert that particular misunderstanding.

  12.  
    Harry: sorry I misunderstood. In that case, yes, I'm sure you're right.
  13.  

    I have a suggestion:

    If someone is an expert in an underrepresented area on MO and doesn't have enough points to vote to close, all that person should need to do is leave a comment explaining to us that he or she is an expert in the subject and why he or she thinks that the question should be closed (or maybe post it on meta like Tom Smith just did).

    A lot of us can't tell the difference between good and bad questions in fields in which we have no experience, so we rely on these sorts of comments to avoid closing questions that might be good. (Obviously this isn't necessary for something like algebraic geometry, algebraic topology, commutative algebra, etc., for obvious reasons).

    What do you guys think?

    • CommentAuthorgilkalai
    • CommentTimeApr 29th 2010
     
    Let me expand a comment from the comment thread. The question regarding correlations and causality is a very important question in applied mathematics/statistics which is important also all over science. People indeed tried to suggest various mathematical criteria for causation. Like many other cases where mathematics is applied there is also controversy about various suggested criteria. There are many papers about it. I will certainly be very happy to learn more about the question from experts.

    I find it hard to understand why people regard the question as borderline or even want to close it. But I would be happy to understand this line of thought.
  14.  

    @fgdorais: I think that the poster's reaction to your comment stems from a misreading of the fourth word in your comment. The font makes it hard to see exactly what that word is so it looks a bit like "LITTLE" instead of the "TITLE" that it actually is (I've left a comment to this effect).

    I've also voted to close. My reason is my standard one: great answers do not make great questions.

    • CommentAuthorMariano
    • CommentTimeApr 29th 2010 edited
     

    Heh.

    There is nothing like the greatest communication setup in the history of humankind, based on several of the greatest theoretical and technological achievements of the human mind to allow for such silly mistakes!

  15.  

    @Andrew: Thanks! I was wondering where that "little question" was coming from.

  16.  

    @Gil: It's not necessarily the content of the question which is at fault, but rather the form of the question. See my earlier comment to that effect.

  17.  

    I think both fgdorais and Andrew nailed it in their comments. If anything, the (excellent) answers to the question only confirm my gut feeling that the question is not a good one: If I read them correctly, they all agree that correlation on its own never implies causation. To show causation, you need not only statistics but knowledge (expertise, even) within the relevant domain, so that you can, for instance, with a reasonable degree of confidence verify the conditions of Shiva Kaul's answer. To top it off, the “necessary conditions” of the question seem wrong-headed to me. As an example, we now know that smoking causes cancer. This conclusion is of course based on a lot of data which has been subjected to a lot of statistical analysis. But it is also based on detailed knowledge of the underlying mechanism, plus of course looking for and accounting for the effect of every conceivable confounding variable. (Besides, the cancer comes a long time after the smoking, which effectively demolishes the third “necessary condition” of the question.)

    If the question had not already gotten some (good) answers by the time I saw it, I would just have voted to close and moved on. It was the presence of those answers that made me uncomfortable enough with my decision to vote for closure to express those doubts in a comment. I still feel that it was the right decision, but in deference to Tom LaGatta's comment above, I think I will abstain from voting to close any question on statistics from now on, regardless. We can hope there are enough 3000+ rep statisticians here so they can deal with the really bad questions themselves.

  18.  

    I think I will abstain from voting to close any question on statistics from now on, regardless.

    I think this is a bad decision.

    We can hope there are enough 3000+ rep statisticians here so they can deal with the really bad questions themselves.

    That will never happen.

  19.  

    @gilkalai:

    I find it hard to understand why people regard the question as borderline or even want to close it. But I would be happy to understand this line of thought.

    And I find it difficult to understand that this is so hard to understand. At least I, from my side of the discussion, can see why others disagree. If I can't even explain myself well enough that you can understand my point of view, then either my powers of explanation are feeble indeed, or else our respective points of view are separated by a chasm too deep and wide to be easily bridged. Maybe some of the comments added after you wrote the above will help explain it; if not, then I think the only way forward would be to spend an evening over a beer or three hashing it out. In other words, MO needs an attached bar. Until then, I'll stick with my resolution not to vote on statistics questions.

    • CommentAuthorHarry Gindi
    • CommentTimeApr 29th 2010 edited
     

    fgdorais said:

    I think I will abstain from voting to close any question on statistics from now on, regardless.

    I think this is a bad decision.

    Oh yeah, well I think it's an even worse decision!

    But in all seriousness, is it kosher to say that MO is not really a place for statisticians? I mean, this would be a lot easier if we could just close all statistics questions as off-topic and lead them to greener pastures. I think it's reasonable, and I think that's something that the administration should seriously consider.

  20.  

    But in all seriousness, is it kosher to say that MO is not really a place for statisticians? I mean, this would be a lot easier if we could just close all statistics questions as off-topic and lead them to greener pastures. I think it's reasonable, and I think that's something that the administration should seriously consider.

    That seems a bit random. I don't think statistics questions are being enough of a nuisance that this makes sense. We'd just end up with some good, interesting probability questions getting closed (and causing a big stink) because a handful of 3k+ rep users heard that "it's MO policy to close statistics questions". There are a whole bunch of questions tagged [st.statistics]. Most of them look fine to me (based on a poor sample I just made), and lots of them have accepted answers. I don't think these questions are bothering anybody.

  21.  

    Harald said:

    It was the presence of those answers that made me uncomfortable enough with my decision to vote for closure to express those doubts in a comment. I still feel that it was the right decision, but in deference to Tom LaGatta's comment above, I think I will abstain from voting to close any question on statistics from now on, regardless.

    Harald, whether I agree with your reason to close or not, I want to thank you for expressing your opinion in a comment. As Anton points out again and again, it's important for us to communicate the nuances of a particular question to each other, and not just resort to a blunt up-or-down vote.

  22.  

    @fgdorais:

    I think I will abstain from voting to close any question on statistics from now on, regardless.

    I think this is a bad decision.

    Maybe, but it's not cast in stone. I may change my mind about it eventually, but for the time being, it stands.

  23.  

    In other words, MO needs an attached bar.

    Spoken like a true Viking!

    • CommentAuthorgilkalai
    • CommentTimeApr 30th 2010 edited
     
    I agree with Tom that the remarks by Harald, François, Harry and others are very useful and very much welcome.

    In this case, I found the title and the formulation of the problem
    "We always hear, when reading on correlation, that 'Correlation does not imply causation.'
    Still, I have never seen any source that tries to answer the question of when can we reasonably conclude a causal relation between variables X, Y, from a correlation.",
    very reasonable and also the three preliminary thoughts added to the question rather nice.

    The problem is of importance in various areas of applied mathematics and theoretical computer science, and also in probability theory and mathematical physics in addition to its importance in statistics. Also, many papers have been written about it, so answers pointing to various good sources will be quite useful.

    I can see that "overly important" questions with some philosophical aspects are, perhaps, not the most useful MO questions, but still there are many of those.

    Part of François's and Harald's objection is that they simply suggest the answer "no". François wrote "The answer to your title question is never" and many people felt that this is an excellent response. Harald expressed a similar view and I can certainly identify with Harald's points regarding the importance of understanding the specific mechanism in every particular case, which is usually not a mathematical task. (And the reference to the skeptic movement in this context is interesting.) Let me mention that there are people that will not only agree that correlation does not imply causality but will question if "causality" is a meaningful notion at all.

    But I do not agree that the only contribution mathematics can make to the question of if and when we can reasonably conclude a causal relation from correlation consists of the answers "no" and "never". (Certainly there were attempts to say more which are worth looking at.)
  24.  

    Gil, my comment was much more nuanced than just "never". It's a bit terse, but it was meant to suggest that the title was misleading (in fact, plain wrong) and that there is a better question to ask. The better question is the one you're interested in. My only objection is to the form of the question, not the content. I'm afraid the question cannot have lasting value in its current state.

  25.  
    @gilkalai: Count me among the causality-questioners :)

    (I think "causality" is of a similar status to "rationality" - or, perhaps even more controversially, "free will" - but I digress).
  26.  

    I see the votes to close have started timing out. Yesterday the count stood at 4, today it's 2.

  27.  

    My only objection is to the form of the question, not the content. I'm afraid the question cannot have lasting value in its current state.

    In that case, I think the question should be edited. Could you either edit the question or indicate what about the form of the question needs to change? As I understand it, the objection is to the focus on the misguided question "when does correlation imply causation?" Instead, it should be pushing for something like "what, in addition to correlation, is needed to determine causation?" What exactly is the better question to ask?

  28.  

    Anton, I don't know how to ask the better question. I suggested "How can you measure causality?" way back; your suggestion is fine too. Maybe Gil has a good idea? If the question stays open I think someone should at least change the title to something relevant to the actual question. (And fix the typos.)

    • CommentAuthorgilkalai
    • CommentTimeApr 30th 2010 edited
     
    Guys, as I said, in my opinion the title of the question "Correlation and causation: When can we believe correlation (reasonably, at least) imply causation"
    is very nice (apart, perhaps, from the English). I do not think the question "when does correlation imply causation?" is misguided. It is a great question. When you say "when" you usually mean in mathematics "under which additional conditions" and this is precisely what the question is about.
  29.  

    We're reading the current title completely differently. I might have read it your way if "when" was replaced by "under which additional conditions," but I'm not sure. Harald's objection is that causality is not a property of random variables, so there are no such additional conditions when the implied subject is "random variables" instead of whatever else it should be.

    However, this is unrelated to my original feelings about the question. Correlation and causality are not related at all, and it's easy to find plenty of information about that. I don't think asking about relating the two is a good question, but that doesn't warrant closure per se. The frustrating bit was the fact that the actual question is very hard to find; the "any comments?" at the end is not a real question. When unraveled, the question ends up being: "Do these three measures (which are only loosely related to correlation) imply causality?" This is when I originally rolled my eyes and posted my comment.

    That said, I didn't vote to close even if there was evidence for the question being "off-topic," "subjective and argumentative," and "not a real question." I didn't find the evidence to be strong enough and I thought the OP could probably fix the question. I still think the question could be rescued, but I don't know that I could do it. In the end, I found the answers to the question very interesting to read, but Andrew Stacey's Standard Objection™ that good answers do not make good questions definitely applies to this case.

  30.  

    (I think that my citation index for that particular comment is higher than for any of my papers! Given that the probability of having a theorem or "mathematical thingy" named after me is fairly small, I'm quite chuffed at having a "Standard Objection" in the meantime.)

    • CommentAuthorgilkalai
    • CommentTimeMay 1st 2010
     
    Dear François,
    you wrote: "Correlation and causality are not related at all, and it's easy to find plenty of information about that."
    I beg to disagree. Correlation and causality are closely related issues. A causal relation often leads to statistical correlation, and faced with significant statistical correlation it is often important to understand the causal structure leading to it.
  31.  

    Dear Gil,

    Correlation can be used as evidence of a causal relation, but it is not proof of such a relation nor is it a consequence of such a relation. In fact, the relation between the two is very limited as far as I know.

    "A causal relation often leads to statistical correlation." Only when the causal relation leads to a linear dependency and then correlation only attempts to measure how linear this dependency is. Correlation can be used to check whether X, Y follow a given relation Y = f(X) by correlating Y and f(X) but you need to know f in advance. I have used correlation to support a relation that I derived by other means, but I never used correlation to derive a relationship. I think this is one of the principal ways that correlation is used in experimental sciences.

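    To make the point about linear dependency concrete, here is a minimal sketch (not part of the original comment). The sample values and the helper `pearson` are hypothetical, chosen only for illustration: Y is completely determined by X through Y = X^2, yet the Pearson correlation of X and Y is zero, while correlating Y with f(X) = X^2 recovers a perfect correlation, which only works because f is known in advance.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample: X symmetric around 0, Y an exact function of X.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# Correlating X with Y directly misses the (nonlinear) dependence.
r_raw = pearson(xs, ys)

# Correlating f(X) with Y, for the known f(x) = x^2, detects it.
r_transformed = pearson([x ** 2 for x in xs], ys)

print(f"corr(X, Y)    = {r_raw:.3f}")
print(f"corr(f(X), Y) = {r_transformed:.3f}")
```

    The helper is written out by hand only to keep the sketch free of dependencies; any standard statistics routine computing Pearson's r would do the same job.
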
    "[...] faced with significant statistical correlation it is often important to understand the causal structure leading to it." Maybe, but you could be in for a surprise! It is prudent to analyze other aspects of the data before embarking on such a quest. See Anscombe's Quartet to see how it is important to look at the data closely before relying on correlation and other basic statistics. It's also sometimes misleading to think that way. For example, road traffic density is highly correlated with certain times of day but neither is a direct cause of the other. In fact, the "causal structure" between the two is such a complex web that I pity the (presumably alien) scientist who unknowingly embarks on a quest to untangle this.

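    As an illustration of the Anscombe's Quartet remark above (a sketch added for readers, not something posted in the thread), the following computes the Pearson correlation of each of the four datasets. The numbers are the values as commonly tabulated for Anscombe's 1973 example and should be checked against a trusted source before being reused; `pearson` is a hypothetical helper written for this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Datasets I-III share the same x values; dataset IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

anscombe = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# One correlation coefficient per dataset.
correlations = {name: pearson(xs, ys) for name, (xs, ys) in anscombe.items()}

for name, r in correlations.items():
    print(f"dataset {name}: r = {r:.3f}")
```

    The four coefficients come out nearly identical even though a scatter plot of each dataset looks completely different (roughly linear, curved, linear with one outlier, and a vertical cluster with one influential point), which is why it pays to look at the data before relying on the correlation alone.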
  32.  

    Am I the only one who agrees with Gil that this is an extremely important question, but thinks that it is a scientific, not a mathematical one? I don't particularly care about closing the question, but as a philosophical matter, I think this question illustrates perfectly the difference between math and science (which, of course, some will dispute exists).

  33.  

    @Ben:

    Can you clarify what you mean by "this"? There are certainly some very important related scientific/ethical questions regarding the use of correlation, but if "this" means the original question as posed then no.

    Another important question with more mathematical content is how to measure causation. As we saw in the answers, this is an important research area in statistics.

  34.  
    Since causality is tied to the real world, I think that this is a question about the philosophy of (statistical) physics, and not mathematics or statistics.

    Preparing two macroscopically identical experiments is not the same as preparing two microscopically identical initial states. Since physics is the basic theory of natural phenomena, the sheer fact that we can repeat experiments in science at all is an endorsement of the validity of statistical physics, and in particular the ensemble approach. As Jaynes put it: "The thing which has to be explained is, not that ensemble averages are equal to time averages; but the much stronger statement that ensemble averages are equal to experimental values." One might even go so far as to argue that experimental physics and statistical physics are one and the same, and that theoretical statistical physics is really a “theory of experiments”.

    But I would not say that any of this is mathematics or statistics.
  35.  
    And BTW, if we're talking about causality in (say) economics etc. then I'd just argue that economics consists of effective(?!) theories that are presumed to have some basis in progressively more exact sciences, and ultimately in physics.
  36.  

    @Steve:

    Since causality is tied to the real world, I think that this is a question about the philosophy of (statistical) physics, and not mathematics or statistics.

    But statistics is not only a branch of mathematics. It is also concerned with applying that branch of mathematics to the real world.

    @fgdorais: While the OP used correlation in the sense of linear correlation, in the wider context of this discussion I think nonlinear regressions should be allowed as well. Anscombe's quartet is very interesting, but still mainly a statistical technicality. The deeper problem is, if careful statistical analysis reveals a pattern involving several variables that appears unlikely to have arisen by accident, what (if anything) can we conclude about causal relationship between the variables?

  37.  

    Ah! I missed the part where we stopped talking about the actual question at hand. I think we all agree that there is a better question to ask. Judging from the answers that this question got, I'm sure that the better question will get some excellent answers. I think that it would be more productive for the interested parties to go and ask the better question on MO, so we can refocus this meta thread on whether or not question 22490 is appropriate.

  38.  

    We aren't the only ones discussing this question. Here is Paul Revere:

    Understanding that correlation and causation aren't the same isn't sufficient. That kind of superficial "understanding" may itself be misleading. The problem is much deeper in a way not addressable by calls for statistical literacy. I agree that more literacy in statistics is a good thing.

    But it doesn't guarantee real understanding. Even from people who are highly literate in the language.

  39.  

    That's a great find, Harald. Thanks!

    • CommentAuthorgilkalai
    • CommentTimeMay 2nd 2010
     
    Dear Ben,

    "I think this question illustrates perfectly the difference between math and science (which, of course, some will dispute exists)."

    I think the question belongs to applied mathematics so it illustrates an important relation between mathematics and real world applications. Maybe it also demonstrates the difficulty that trained "pure" mathematicians have in dealing with applications.

    "Am I the only one who agrees with Gil that this is an extremely important question, but thinks that it is a scientific, not a mathematical one?"

    I would consider such a sharp divide between "scientific" and "mathematical" as quite extreme, even if one does not regard mathematics as science.
    Whether mathematics is a science is an interesting question for discussion (probably not for MO), and I personally tend to think that it is. There was a nice related discussion at the secret blogging seminar http://sbseminar.wordpress.com/2008/06/14/what-is-purity/ and an attempted discussion on my blog http://gilkalai.wordpress.com/2008/05/25/is-mathematics-a-science/ .