  1.  

    Here's the first public dump of the database. It's from earlier today.

    http://dumps.mathoverflow.net/, or
    http://ifile.it/soyqa09/MOdump20100303.zip

  2.  

    =(. The database dump is 9.6 MB. Why don't you rapidshare/megaupload/etc it?

  3.  

    You're right. I'll just do that. Edit: done.

  4.  

    It's now available at http://dumps.tqft.net/.

    Soon after Anton reads this, you can access dumps at http://dumps.mathoverflow.net/. (Anton, do the usual! I'll give you shell access as well.)

  5.  

    Also, someone has already put up a torrent, for example at http://thepiratebay.org/torrent/5408662/.

  6.  

    That was Anton, before I suggested putting it on a free hosting site.

  7.  

    You can now access the dump (or any part of it) at dumps.mathoverflow.net.

  8.  

    The April database dump is now available, fresh off the server. I've corrected/changed a couple of things in the script I use to produce the public dump. Specifically, vote tallies are now included in the users.xml file (in the last dump, I claimed in the readme that they were included, but they actually weren't). Also, the votes.xml file now includes the UserId when somebody votes to close or reopen a question.

  9.  

    Just a sec: doesn't this make "votes to close" publicly identifiable before the question actually gets closed?

  10.  

    Yes, it does mean that there is some information in the dump which is not available through the website: votes to close have user ids even if they are not "effective". Non-effective votes to close occur in two ways: either not enough votes to close have accumulated on a question (e.g. there are four such votes right now), or the vote to close has expired (votes to close only have a lifespan of four days).

    When you vote to close (or reopen), you're volunteering to associate your name to that vote (after all, your name is displayed once the question is actually closed/reopened), so I don't feel like anybody can reasonably object that they meant to keep their identity private when voting to close. The main reason I think it's good to not display who has voted to close a question which is not yet closed is that people vote differently when they know who is "on their side". Not displaying who has voted to close is a way of getting people to take personal responsibility for their vote to close. The dump contains so little information that undermines this purpose that I don't think it's a problem. I'm happy to change it back if somebody has a good reason to do so.

  11.  

    Okay, I agree that the benefits outweigh the slight, arguable, violation of the "only public data" rule.

  12.  

    I just posted a fresh database dump.

  13.  

    Why does the zip file have the same filename as the previous dump?

  14.  

    Fixed. Copy and paste error, I presume.

  15.  

    Happy June. I just posted a fresh database dump.

  16.  

    Oh dear, is it June already? :-)

  17.  

    Early 4th of July present: a fresh database dump.

  18.  
    Thank you. This is very useful.
  19.  
    Quick and hopefully easy question. I thought that a recent question on MO was a duplicate, but I couldn't find it using MO search, so I turned to the dumps and found what I was looking for using grep. I now have a comment from comments.xml (i.e. a line in that file) and I want to locate it on the MO site. The comments.xml line starts with 'row Id="6664" PostId="5703"'. Is this the information I need to locate the comment on the site? I also know who posted the comment and when they posted it, but this doesn't seem to be much help in actually locating the comment on the site.

    EDIT: I have solved this myself. The trick I found is to go to http://mathoverflow.net/questions/5703 and you get redirected to the right place.
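
For anyone repeating this lookup, here is a minimal Python sketch of the trick above. It assumes only what the quoted line shows, namely that a comments.xml row carries Id and PostId attributes written as name="value" pairs. The function name comment_post_url is made up for illustration.

```python
import re

# Matches name="value" attribute pairs in a single row line.
# Assumption: attributes in comments.xml are written this way,
# as in the line quoted above.
ROW_ATTR = re.compile(r'(\w+)="([^"]*)"')


def comment_post_url(row_line: str) -> str:
    """Build the URL of the post that a comments.xml row belongs to."""
    attrs = dict(ROW_ATTR.findall(row_line))
    # The site redirects /questions/<PostId> to the right place,
    # even when the post is an answer rather than a question.
    return f"http://mathoverflow.net/questions/{attrs['PostId']}"


print(comment_post_url('row Id="6664" PostId="5703"'))
```

The comment's own Id is not used in the URL. Once the page loads, you still have to find the comment under the post by author and date.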
  20.  

    I've just posted a fresh database dump. By popular demand (somebody asked for it), I've included the hour each vote was cast. Previously, only the day was included. This should be enough to do some rough analysis of voting habits. If you want aggregate statistics finer than that, you'll have to ask me to look at the full dump.
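
As a rough illustration of the kind of analysis the hourly timestamps allow, here is a minimal Python sketch that counts votes per hour of day. It rests on assumptions not confirmed in this thread: that votes.xml consists of row elements with a CreationDate attribute in ISO format, as in the SO public dumps. Check the readme for the real field names. The sample rows are invented.

```python
from collections import Counter
from datetime import datetime
import xml.etree.ElementTree as ET


def votes_per_hour(xml_text: str) -> Counter:
    """Count votes by hour of day (0-23)."""
    root = ET.fromstring(xml_text)
    hours = Counter()
    for row in root.iter("row"):
        # Assumed attribute name; rows without it are skipped.
        stamp = row.get("CreationDate")
        if stamp is None:
            continue
        hours[datetime.fromisoformat(stamp).hour] += 1
    return hours


# Invented sample data, only to show the shape of the input.
sample = """<votes>
  <row Id="1" PostId="10" CreationDate="2010-07-01T14:00:00" />
  <row Id="2" PostId="11" CreationDate="2010-07-02T14:00:00" />
  <row Id="3" PostId="12" CreationDate="2010-07-02T09:00:00" />
</votes>"""

for hour, count in sorted(votes_per_hour(sample).items()):
    print(f"{hour:02d}:00  {count}")
```

For a full dump, ET.iterparse would avoid loading the whole file into memory at once.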

  21.  

    Thanks much Anton.

  22.  

    Anton didn't announce it, but he put up a dump for October, too.

    A question about the database dumps -- they don't include the edit history of posts. I assume, @Anton, that you have these in the full dump? Was there some reason to censor these?

  23.  

    No reason to censor; it just takes a bit of work to include. I've been meaning to do it for a while, but I haven't gotten around to it yet. When I originally wrote the code to generate public dumps, I modeled it after the SO public dumps, which don't include post histories for some reason.

    There are two tables in the database. One contains the posts in their current state in html (for fast serving). The other contains an entry for every edit, retag, or other action that can be taken on a post; notably, it contains the markdown source. It shouldn't be too bad to include another file, posthistories.xml, in the public dump. It would make the dump perhaps 50% larger.

  24.  

    I've just posted the November dump.

  25.  

    I've just posted the December dump.

  26.  

    The README.txt for the dumps says that posts.xml should contain a field "LastEditorDisplayName". It doesn't seem to.

  27.  

    Two requests for the public dumps:

    1. Include the GUID for each revision, in posthistory.xml. This is public information, contained in the URL for "view source" on the list of revisions.
    2. Include the Gravatar hash: take the user's email address or last login IP address, as a string, and compute the MD5 hash.
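
The second request could be sketched as follows. This is only a sketch: the function name gravatar_hash is made up, and the trimming and lowercasing step follows Gravatar's usual convention for email addresses rather than anything stated in this thread.

```python
import hashlib


def gravatar_hash(identifier: str) -> str:
    """MD5 hex digest of an email address (or IP address) given as a string."""
    # Gravatar's convention is to trim whitespace and lowercase the
    # email before hashing; for an IP address string this is a no-op.
    normalized = identifier.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Placeholder address, not a real user.
print(gravatar_hash("  Someone@Example.com "))
```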
  28.  

    Re: my post above on Dec 4th.

    I see you've removed the reference to LastEditorDisplayName in the README.txt. Can't we instead include all of LastActivityDate, LastActivityUserId and LastActivityDisplayName, in the public dumps? Surely this is public data too!

  29.  

    Okay, I've updated the public dump script to include the RevisionGUID for each revision (that's right, the dumps will contain revision histories starting in 2011) and the GravatarHash for each user.

    LastEditorDisplayName isn't in the non-public table unless the user has been deleted (and so LastEditorUserId doesn't exist). Looking at the SO public dumps (at least the Stack Apps dump, since that's the smallest one to deal with), it looks like LastEditorDisplayName is always given, but is always empty!

  30.  

    There's a new dump for the new year.

    There are a couple of changes (mostly discussed in the few posts above this one). The biggest one is that post histories are now included! See the readme for details about what all the fields mean.

  31.  

    There's a new dump posted.

  32.  

    There's a new dump posted.

  33.  

    Sorry for not posting a dump last month, and for being late this month. My excuse is that I've been writing my dissertation.

    There's a new dump posted.

  34.  

    There's a new dump posted.

  35.  

    There's a new dump posted.

  36.  

    There's a new dump posted.

  37.  

    @Anton: Too busy to make notes of new dumps here? These alerts are much appreciated, if only for avoiding this thread getting buried too deep. It's the place to go when looking for fresh dumps, after all. Oh, and the latest dump is dated 2012-01-03. Is the next one overdue?

  38.  

    So what happened to the dumping? I thought there was some script doing that automatically, or does Anton have to tweak things every time?

  39.  

    Sorry, my fault. Unfortunately, I have to do some things manually each time. For a while, it became overwhelming because the database export feature broke (it's working again now that MO is on a new server), so I had to contact SE by email to get the dumps. I still got the snapshots, but didn't clean them up for public consumption. I'll get the last few dumps up within the hour. [ok, maybe a bit more than an hour; my up-bandwidth isn't so great and the dumps are >100MB each]

    Edit: hmmm ... I'm running into some permissions issues. I'm checking with Scott to see if he changed something.

  40.  

    Permissions issue resolved. The most recent dumps are now available.

  41.  

    Thanks for making it work again!

  42.  

    How about a dump before the 2.0 update?

  43.  

    @darijgrinberg: I will definitely generate a new public dump before the 2.0 update, and will maintain redundant copies of the full dump.

  44.  

    About half a year has passed since the last one...