I've been enjoying looking at the MO usage statistics posted on meta. They give loads of interesting info but, like all such statistics, they raise more questions than they answer. The scientific response is to design experiments to test hypothetical answers to these questions. For example, one such experiment was proposed (half seriously) by David Speyer in the competitiveness thread. This makes me wonder... Is it too early to experiment with MO? What would be reasonable experiments for MO?
I think more and more experiments should be conducted.
"More and more experiments"? I honestly have no idea about what you have in mind!
Oh of course there are lots. I refrained from giving examples because I didn't want to appear cheeky.
Now since you want one, here is an example. I still believe reputation hunting is the cause of a lot of evils. Reputation could be made less prominent: for a couple of weeks, one could hide it everywhere except the user page and see what the public reaction is.
Realistically, I do not expect this to be implemented, given how fond people are of reputation, as shown in the thread the OP mentioned.
Let's clarify the meaning of the word "experiment". The only indication fgdorais gave of what an experiment might be is the one David Speyer suggested. The one suggested by Regenbogen is one that I would have to do. Either one of these should be worked out more carefully before somebody actually tries them. Remember that the purpose of an experiment is to find something out. Putting a frog in the microwave is not an experiment; it's just animal cruelty.
If we're going to think about running any sort of experiment, we have to be able to answer the following questions: (1) what exactly do we want the experiment to find out, and (2) how will the experimental procedure determine the answer?
The experiment suggested by David Speyer was aimed at determining how much the poster's reputation influences how people vote on the post. I see two problems with the experimental procedure. First, I think the name (i.e. real life reputation) makes more of a difference than points, and the experiment doesn't separate these factors. Second, you'd need a pretty dedicated experimenter because there's a lot of variation in how much any given person's posts get voted on; it will take a lot of data to say something statistically significant.
As for the experiment Regenbogen suggested, I have no idea how to answer question (2), and I'm not even sure I can answer question (1).
I should also point out that the best way to get answers to most questions is to simply look at the large amount of data we've already collected. We should only undertake the task of performing an experiment if there's something we really want to know the answer to that we don't have any other way of getting data about.
Ah! Perhaps there is something in the database dump; one could perhaps determine right away whether and how voting patterns correspond to reputation. I am, however, incapable of operating databases, so such an attempt would be beyond me.
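For anyone who is more comfortable with scripting, a first pass might look something like the minimal sketch below (Python, standard library only). The file and attribute names (users.xml with a Reputation attribute, posts.xml with OwnerUserId and Score) are guesses at the dump layout, so adjust to whatever the public dump actually contains:

    # Sketch: does a poster's reputation correlate with the votes their posts get?
    # Assumes a Stack Exchange-style dump: users.xml rows carry Id and Reputation,
    # posts.xml rows carry OwnerUserId and Score. Adjust names to the real dump.
    import xml.etree.ElementTree as ET
    from collections import defaultdict

    reputation = {
        row.get("Id"): int(row.get("Reputation", "0"))
        for row in ET.parse("users.xml").getroot()
    }

    scores = defaultdict(list)
    for post in ET.parse("posts.xml").getroot():
        owner = post.get("OwnerUserId")
        if owner in reputation:
            scores[owner].append(int(post.get("Score", "0")))

    # Pair each user's reputation with their mean post score.
    pairs = [(reputation[u], sum(s) / len(s)) for u, s in scores.items()]

    # Pearson correlation, computed by hand to avoid extra dependencies.
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x, _ in pairs) ** 0.5
    sy = sum((y - my) ** 2 for _, y in pairs) ** 0.5
    print("correlation(reputation, mean post score) =", cov / (sx * sy))

Of course, a raw correlation like this confounds current reputation with past voting, so it would only be a starting point.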
Thanks for clarifying, Anton; you are right on the money about the second part of my question. I think giving "David's experiment" as an example steered my question in the wrong direction, so I'll rephrase...
Is it too early to ask questions about MO usage? MO is still evolving, but at a much slower pace than a few months ago (some graphs here). It seems like a good idea to try to understand the raw data better, but it might still be too early to ask sensible questions.
Some of these questions might not be testable with reasonable experiments, but some might be. What would be interesting experiments to carry out? I do mean sensible and well-designed experiments. (We are mathematicians; this is well within our collective skill set...) There are plenty of simple experiments that could be done with existing data, or by collecting new data through polls and surveys, for example.
Here are some examples of things I find of interest...
Response time. How are the time to first answer, time to first twice upvoted answer, time to accepted answer distributed? (Accepted answer can be done with existing data; I don't know how I would do the other two.)
User retention. How many users visit MO for a second time? How many stay active for a week, a month? (For the second, last_access_date - creation_date should be accurate enough; I'm not sure how to tackle the first.)
How is the question/answer ratio distributed among users? (Doesn't seem to be accessible through the dump data, but the relevant population size is small.)
User perception survey. (That would help answer some open questions from the competition thread. Would take some time and expertise to design.)
Harvesting collected data and doing data mining is OK as long as it is done in accordance with whatever privacy policy MO has, but I feel uncomfortable letting just anyone do the data mining.
(As an aside, this is one loss of privacy that I perceive as being a registered user. Someone will record my activity and use the data for purposes of which I do not approve. This is part of an answer to a question someone asked some time ago, about why I chose not to register. Never mind that there may be a similar loss with unregistered users as well.)
The public dump only contains information you would in principle be able to gather by browsing the site (almost: it would be pretty hard to extract the date of each vote by browsing the site, and those dates, but not exact times, are in the public dump). I have no intention of showing the full database dump to anybody unless there is a very good reason for doing so. Not even MO moderators have access to the full database dump. If somebody can convince me to do so (on a case-by-case basis), I'm willing to extract and make public some aggregate statistics from the full database dump which are impossible to extract from the public dump.
In any case, there isn't really any difference in privacy between registered and unregistered users. The activities of unregistered users are also tracked and included in the database dump (including the public dump). Information about how users browse the site (as opposed to how they vote or post) is not included, even in the full dump. The only way I can get at those data is with Google Analytics. Again, this doesn't distinguish between registered and unregistered users.
It just occurred to me that there is another type of experiment: changing the web site to improve community usage. As long as that is the intent of the experiment (and anything messed up can be undone with no harm), I encourage gradual experimentation by the site administrators.
Yes, that's pretty much the only sort of actual experiment I would consider performing. Almost anything else I want to know, I can get from the data dump (i.e. I can get by observing).
I think the votes-on-questions graph and the votes-on-answers graph would be very different. In particular, the sinks in the latter graph would be very different from the sinks in the former graph.
Response time. How are the time to first answer, time to first twice upvoted answer, time to accepted answer distributed? (Accepted answer can be done with existing data; I don't know how I would do the other two.)
The public dump contains exact times at which answers were posted, but only dates of votes (to prevent some kind of vote time correlation approach to guessing who cast what votes). So time to first answer is easy to extract, but time to first twice upvoted answer would have to be done by me.
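Since the dump has exact creation times for both questions and answers, time to first answer really is a few lines of scripting. A minimal sketch in Python, again assuming the Stack Exchange-style attribute names (PostTypeId, ParentId, CreationDate) and an ISO-ish timestamp format, so adjust to the real dump:

    # Sketch: time from question creation to first answer, from the public dump.
    # Assumes posts.xml rows carry PostTypeId ("1" = question, "2" = answer),
    # ParentId on answers, and a CreationDate like "2010-01-01T12:34:56.789".
    import xml.etree.ElementTree as ET
    from datetime import datetime

    def when(row):
        return datetime.strptime(row.get("CreationDate"), "%Y-%m-%dT%H:%M:%S.%f")

    questions, first_answer = {}, {}
    for row in ET.parse("posts.xml").getroot():
        if row.get("PostTypeId") == "1":
            questions[row.get("Id")] = when(row)
        elif row.get("PostTypeId") == "2":
            q, t = row.get("ParentId"), when(row)
            if q not in first_answer or t < first_answer[q]:
                first_answer[q] = t

    delays = sorted(
        (first_answer[q] - questions[q]).total_seconds() / 3600.0  # hours
        for q in questions if q in first_answer
    )
    print("answered questions:", len(delays), "of", len(questions))
    print("median time to first answer (hours):", delays[len(delays) // 2])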
User retention. How many users visit MO for a second time? How many stay active for a week, a month? (For the second, last_access_date - creation_date should be accurate enough; I'm not sure how to tackle the first.)
According to Google analytics, 28% of visits in the last month were the visitor's first visit. I'm not sure exactly how to interpret that. Users who visit the site multiple times contribute more visits, so the percentage of visitors who only visited the site once in the last month must be much higher than 28%.
How is the question/answer ratio distributed among users? (Doesn't seem to be accessible through the dump data, but the relevant population size is small.)
This information is absolutely in the public data dump. Questions are posts with PostTypeId="1" and answers are posts with PostTypeId="2". You (François) are user 2000, so I can extract the number of questions and answers you've posted with the commands
grep 'OwnerUserId="2000"' posts.xml | grep -c 'PostTypeId="1"'
grep 'OwnerUserId="2000"' posts.xml | grep -c 'PostTypeId="2"'
User perception survey. (That would help answer some open questions from the competition thread. Would take some time and expertise to design.)
Yes, I think this would take a lot of expertise to design and conduct, and I don't think it's worth it. I'm happy to go with the (possibly extremely skewed) feeling I can gather from that thread and other discussions here on meta.
Thanks for the tips, Anton!
By the way, there is a meta.stackexchange request for better analytic tools (for moderators).
(«Putting a frog in the microwave is not an experiment, it's just animal cruelty.» that's going into my collection of quotes...)
First, I think the name (i.e. real life reputation) makes more of a difference than points, and the experiment doesn't separate these factors
Indeed. I think no one looks at Terry Tao's reputation, and if Richard Stanley's reputation were set to 1 each time he came to the site, I'd read his posts with the same attention...
Maybe we should change the font on the user names? :P
After Anton's tips, I've been looking at the user retention data. I've plotted the data in multiple ways looking for a pattern to test; the most revealing has been graphing the use ratio (last_use_date - creation_date)/(collection_date - creation_date) against the age (collection_date - creation_date) of each user. This graph shows dominant clustering around ratios 0 and 1, with scattered points in between (i.e. most users either quit immediately or use the site regularly). The idea is to test the data against some model behavior, but nothing of the sort comes to mind. (This is not at all my area, so my knowledge base is limited.) Any thoughts on a good model for this?
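For reference, here is roughly the kind of script that produces the plot (a Python sketch; the attribute names CreationDate and LastAccessDate in users.xml, and taking the newest LastAccessDate as the collection date, are assumptions about the dump):

    # Sketch: use ratio vs. account age, as described above.
    # Assumes users.xml rows carry CreationDate and LastAccessDate; the dump's
    # collection date is approximated by the latest LastAccessDate seen.
    import xml.etree.ElementTree as ET
    from datetime import datetime
    import matplotlib.pyplot as plt

    def ts(row, attr):
        return datetime.strptime(row.get(attr), "%Y-%m-%dT%H:%M:%S.%f")

    rows = list(ET.parse("users.xml").getroot())
    collected = max(ts(r, "LastAccessDate") for r in rows)

    ages, ratios = [], []
    for r in rows:
        created = ts(r, "CreationDate")
        age = (collected - created).total_seconds()
        if age > 0:
            ages.append(age / 86400.0)  # account age in days
            ratios.append((ts(r, "LastAccessDate") - created).total_seconds() / age)

    plt.scatter(ages, ratios, s=2)
    plt.xlabel("account age (days)")
    plt.ylabel("use ratio")
    plt.show()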
PS: How do you post images on meta?
Use Markdown formatting and write something like
![description of the image](http://example.com/image.png)
See the syntax documentation for more incantations.