tea.mathoverflow.net - Discussion Feed (MO public database dumps) Sun, 04 Nov 2018 12:58:47 -0800 http://mathoverflow.tqft.net/ Lussumo Vanilla 1.1.9 & Feed Publisher
darijgrinberg comments on "MO public database dumps" (22074) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=22074#Comment_22074 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=22074#Comment_22074 Thu, 09 May 2013 18:17:41 -0700 darijgrinberg About half a year has passed since the last one...

]]>
Anton Geraschenko comments on "MO public database dumps" (19672) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=19672#Comment_19672 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=19672#Comment_19672 Mon, 13 Aug 2012 16:36:29 -0700 Anton Geraschenko @darijgrinberg: I will definitely generate a new public dump before the 2.0 update, and will maintain redundant copies of the full dump.

]]>
darijgrinberg comments on "MO public database dumps" (19650) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=19650#Comment_19650 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=19650#Comment_19650 Mon, 13 Aug 2012 01:30:56 -0700 darijgrinberg How about a dump before the 2.0 update?

]]>
darijgrinberg comments on "MO public database dumps" (18889) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=18889#Comment_18889 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=18889#Comment_18889 Thu, 12 Apr 2012 14:40:13 -0700 darijgrinberg Thanks for making it work again!

]]>
Anton Geraschenko comments on "MO public database dumps" (18885) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=18885#Comment_18885 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=18885#Comment_18885 Thu, 12 Apr 2012 08:31:28 -0700 Anton Geraschenko Permissions issue resolved. The most recent dumps are now available.

]]>
Anton Geraschenko comments on "MO public database dumps" (18879) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=18879#Comment_18879 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=18879#Comment_18879 Mon, 09 Apr 2012 11:43:39 -0700 Anton Geraschenko Sorry, my fault. Unfortunately, I have to do some things manually each time. For a while, it became overwhelming because the database export feature broke (it's working again now that MO is on a new server), so I had to contact SE by email to get the dumps. I still got the snapshots, but didn't clean them up for public consumption. I'll get the last few dumps up within the hour. [ok, maybe a bit more than an hour; my up-bandwidth isn't so great and the dumps are >100MB each]

Edit: hmmm ... I'm running into some permissions issues. I'm checking with Scott to see if he changed something.

]]>
darijgrinberg comments on "MO public database dumps" (18876) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=18876#Comment_18876 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=18876#Comment_18876 Sun, 08 Apr 2012 15:59:40 -0700 darijgrinberg So what happened to the dumping? I thought there was some script doing that automatically, or does Anton have to tweak things every time?

]]>
Harald Hanche-Olsen comments on "MO public database dumps" (18545) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=18545#Comment_18545 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=18545#Comment_18545 Tue, 21 Feb 2012 07:42:29 -0800 Harald Hanche-Olsen @Anton: Too busy to make notes of new dumps here? These alerts are much appreciated, if only for avoiding this thread getting buried too deep. It's the place to go when looking for fresh dumps, after all. Oh, and the latest dump is dated 2012-01-03. Is the next one overdue?

]]>
Anton Geraschenko comments on "MO public database dumps" (15349) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=15349#Comment_15349 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=15349#Comment_15349 Mon, 01 Aug 2011 23:06:05 -0700 Anton Geraschenko There's a new dump posted.

]]>
Anton Geraschenko comments on "MO public database dumps" (14889) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=14889#Comment_14889 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=14889#Comment_14889 Thu, 07 Jul 2011 11:34:33 -0700 Anton Geraschenko There's a new dump posted.

]]>
Anton Geraschenko comments on "MO public database dumps" (14675) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=14675#Comment_14675 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=14675#Comment_14675 Sat, 04 Jun 2011 11:31:31 -0700 Anton Geraschenko There's a new dump posted.

]]>
Anton Geraschenko comments on "MO public database dumps" (14437) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=14437#Comment_14437 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=14437#Comment_14437 Tue, 10 May 2011 01:57:44 -0700 Anton Geraschenko Sorry for not posting a dump last month, and for being late this month. My excuse is that I've been writing my dissertation.

There's a new dump posted.

]]>
Anton Geraschenko comments on "MO public database dumps" (13631) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=13631#Comment_13631 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=13631#Comment_13631 Sun, 13 Mar 2011 12:58:27 -0700 Anton Geraschenko There's a new dump posted.

]]>
Anton Geraschenko comments on "MO public database dumps" (13079) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=13079#Comment_13079 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=13079#Comment_13079 Fri, 04 Feb 2011 08:42:45 -0800 Anton Geraschenko There's a new dump posted.

]]>
Anton Geraschenko comments on "MO public database dumps" (12308) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=12308#Comment_12308 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=12308#Comment_12308 Sun, 02 Jan 2011 00:07:03 -0800 Anton Geraschenko There's a new dump for the new year.

There are a couple of changes (mostly discussed in the few posts above this one). The biggest one is that post histories are now included! See the readme for details about what all the fields mean.

]]>
Anton Geraschenko comments on "MO public database dumps" (11806) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=11806#Comment_11806 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=11806#Comment_11806 Mon, 13 Dec 2010 01:04:59 -0800 Anton Geraschenko Okay, I've updated the public dump script to include the RevisionGUID for each revision (that's right, the dumps will contain revision histories starting 2011) and GravatarHash for each user.

LastEditorDisplayName isn't in the non-public table unless the user has been deleted (and so LastEditorUserId doesn't exist). Looking at the SO public dumps (at least the Stack Apps dump, since that's the smallest one to deal with), it looks like LastEditorDisplayName is always given, but is always empty!

]]>
Scott Morrison comments on "MO public database dumps" (11799) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=11799#Comment_11799 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=11799#Comment_11799 Sun, 12 Dec 2010 17:56:39 -0800 Scott Morrison Re: my post above on Dec 4th.

I see you've removed the reference to LastEditorDisplayName in the README.txt. Can't we instead include all of LastActivityDate, LastActivityUserId and LastActivityDisplayName, in the public dumps? Surely this is public data too!

]]>
Scott Morrison comments on "MO public database dumps" (11798) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=11798#Comment_11798 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=11798#Comment_11798 Sun, 12 Dec 2010 17:53:50 -0800 Scott Morrison Two requests for the public dumps:

  1. Include the GUID for each revision, in posthistory.xml. This is public information, contained in the URL for "view source" on the list of revisions.
  2. Include the Gravatar hash: take the user's email address or last login IP address, as a string, and compute the MD5 hash.
]]>
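For anyone wanting to reproduce the hash described in request 2 above, here is a minimal sketch in Python. It only illustrates "take a string and compute the MD5 hash"; the function name `gravatar_style_hash` is hypothetical, and trimming and lowercasing the input first follows Gravatar's usual convention for email addresses, which is an assumption not stated in this thread. It is not necessarily what the dump script does.

```python
import hashlib


def gravatar_style_hash(identifier: str) -> str:
    """Return the hex MD5 digest of an email address or IP address string.

    Assumption: the input is trimmed and lowercased before hashing,
    as Gravatar conventionally does for email addresses.
    """
    normalized = identifier.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder address for illustration only.
    print(gravatar_style_hash("  Someone@Example.com "))
```

Because of the normalization step, differently capitalized or padded copies of the same address give the same 32-character hex string.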
Scott Morrison comments on "MO public database dumps" (11257) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=11257#Comment_11257 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=11257#Comment_11257 Sat, 04 Dec 2010 12:24:58 -0800 Scott Morrison The README.txt for the dumps says that posts.xml should contain a field "LastEditorDisplayName". It doesn't seem to.

]]>
Anton Geraschenko comments on "MO public database dumps" (11178) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=11178#Comment_11178 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=11178#Comment_11178 Wed, 01 Dec 2010 15:03:38 -0800 Anton Geraschenko I've just posted the December dump.

]]>
Anton Geraschenko comments on "MO public database dumps" (10153) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=10153#Comment_10153 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=10153#Comment_10153 Wed, 03 Nov 2010 16:45:07 -0700 Anton Geraschenko I've just posted the November dump.

]]>
Anton Geraschenko comments on "MO public database dumps" (9593) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=9593#Comment_9593 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=9593#Comment_9593 Thu, 14 Oct 2010 22:29:55 -0700 Anton Geraschenko No reason to censor; it just takes a bit of work to include. I've been meaning to do it for a while, but I haven't gotten around to it yet. When I originally wrote the code to generate public dumps, I modeled it after the SO public dumps, which don't include post histories for some reason.

There are two tables in the database. One contains the posts in their current state in html (for fast serving). The other contains an entry for every edit, retag, or other action that can be taken on a post; notably, it contains the markdown source. It shouldn't be too bad to include another file, posthistories.xml, in the public dump. It would make the dump perhaps 50% larger.

]]>
Scott Morrison comments on "MO public database dumps" (9592) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=9592#Comment_9592 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=9592#Comment_9592 Thu, 14 Oct 2010 20:40:52 -0700 Scott Morrison Anton didn't announce it, but he put up a dump for October, too.

A question about the database dumps -- they don't include the edit history of posts. I assume, @Anton, that you have these in the full dump? Was there some reason to censor these?

]]>
Bill Dubuque comments on "MO public database dumps" (7985) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=7985#Comment_7985 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=7985#Comment_7985 Tue, 03 Aug 2010 19:10:45 -0700 Bill Dubuque Thanks much Anton.

]]>
Anton Geraschenko comments on "MO public database dumps" (7984) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=7984#Comment_7984 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=7984#Comment_7984 Tue, 03 Aug 2010 19:04:57 -0700 Anton Geraschenko I've just posted a fresh database dump. By popular demand (somebody asked for it), I've included the hour each vote was cast. Before it only included the day. This should be enough to do some rough analysis of voting habits. If you want aggregate statistics finer than that, you'll have to ask me to look at the full dump.

]]>
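As a rough illustration of what hour-level vote times allow, here is a small Python sketch that rounds timestamps down to the hour and tallies votes by hour of day. It assumes the vote times have already been read into `datetime` objects; the dump's actual field names and date format are not shown in this thread, and the helper names `truncate_to_hour` and `votes_per_hour_of_day` are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def truncate_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds and microseconds, keeping only date and hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def votes_per_hour_of_day(timestamps: Iterable[datetime]) -> Counter:
    """Count votes by hour of day (0-23), ignoring the date."""
    return Counter(truncate_to_hour(ts).hour for ts in timestamps)


if __name__ == "__main__":
    # Made-up sample times, for illustration only.
    sample = [
        datetime(2010, 8, 3, 19, 4, 57),
        datetime(2010, 8, 3, 19, 10, 45),
        datetime(2010, 8, 4, 7, 0, 0),
    ]
    print(votes_per_hour_of_day(sample))
```

With timestamps already truncated to the hour, as in the public dump, `truncate_to_hour` leaves them unchanged, so the same tally works on either form.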
Kevin Buzzard comments on "MO public database dumps" (6732) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=6732#Comment_6732 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=6732#Comment_6732 Tue, 06 Jul 2010 23:44:08 -0700 Kevin Buzzard
EDIT: I have solved this myself. The trick I found is to go to http://mathoverflow.net/questions/5703 and you get redirected to the right place.

]]>
Andrey Rekalo comments on "MO public database dumps" (6681) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=6681#Comment_6681 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=6681#Comment_6681 Sat, 03 Jul 2010 13:52:03 -0700 Andrey Rekalo
Anton Geraschenko comments on "MO public database dumps" (6632) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=6632#Comment_6632 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=6632#Comment_6632 Thu, 01 Jul 2010 16:54:59 -0700 Anton Geraschenko Early 4th of July present: a fresh database dump.

]]>
Scott Morrison comments on "MO public database dumps" (5697) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=5697#Comment_5697 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=5697#Comment_5697 Tue, 01 Jun 2010 08:55:39 -0700 Scott Morrison Oh dear, is it June already? :-)

]]>
Anton Geraschenko comments on "MO public database dumps" (5695) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=5695#Comment_5695 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=5695#Comment_5695 Tue, 01 Jun 2010 08:32:32 -0700 Anton Geraschenko Happy June. I just posted a fresh database dump.

]]>
Scott Morrison comments on "MO public database dumps" (5231) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=5231#Comment_5231 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=5231#Comment_5231 Thu, 29 Apr 2010 15:37:02 -0700 Scott Morrison Fixed. Copy and paste error, I presume.

]]>
Harald Hanche-Olsen comments on "MO public database dumps" (5228) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=5228#Comment_5228 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=5228#Comment_5228 Thu, 29 Apr 2010 15:26:58 -0700 Harald Hanche-Olsen Why does the zip file have the same filename as the previous dump?

]]>
Anton Geraschenko comments on "MO public database dumps" (5216) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=5216#Comment_5216 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=5216#Comment_5216 Thu, 29 Apr 2010 12:18:24 -0700 Anton Geraschenko I just posted a fresh database dump.

]]>
Scott Morrison comments on "MO public database dumps" (4441) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=4441#Comment_4441 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=4441#Comment_4441 Mon, 05 Apr 2010 10:32:29 -0700 Scott Morrison Okay, I agree that the benefits outweigh the slight, arguable, violation of the "only public data" rule.

]]>
Anton Geraschenko comments on "MO public database dumps" (4326) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=4326#Comment_4326 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=4326#Comment_4326 Fri, 02 Apr 2010 14:11:08 -0700 Anton Geraschenko Yes, it does mean that there is some information in the dump which is not available through the website: votes to close have user ids even if they are not "effective". Non-effective votes to close occur in two ways: either not enough votes to close have accumulated on a question (e.g. there are four such votes right now), or the vote to close has expired (votes to close only have a lifespan of four days).

When you vote to close (or reopen), you're volunteering to associate your name to that vote (after all, your name is displayed once the question is actually closed/reopened), so I don't feel like anybody can reasonably object that they meant to keep their identity private when voting to close. The main reason I think it's good to not display who has voted to close a question which is not yet closed is that people vote differently when they know who is "on their side". Not displaying who has voted to close is a way of getting people to take personal responsibility for their vote to close. The dump contains so little information that undermines this purpose that I don't think it's a problem. I'm happy to change it back if somebody has a good reason to do so.

]]>
Scott Morrison comments on "MO public database dumps" (4325) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=4325#Comment_4325 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=4325#Comment_4325 Fri, 02 Apr 2010 11:13:24 -0700 Scott Morrison Just a sec: doesn't this make "votes to close" publicly identifiable before the question actually gets closed?

]]>
Anton Geraschenko comments on "MO public database dumps" (4321) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=4321#Comment_4321 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=4321#Comment_4321 Thu, 01 Apr 2010 22:09:54 -0700 Anton Geraschenko The April database dump is now available, fresh off the server. I've corrected/changed a couple of things in the script I use to produce the public dump. Specifically, vote tallies are now included in the users.xml file (in the last dump, I claimed in the readme that they were included, but they weren't actually). Also the votes.xml file now includes the UserId when somebody votes to close or reopen a question.

]]>
Anton Geraschenko comments on "MO public database dumps" (3654) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3654#Comment_3654 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3654#Comment_3654 Thu, 04 Mar 2010 16:32:01 -0800 Anton Geraschenko You can now access the dump (or any part of it) at dumps.mathoverflow.net.

]]>
Harry Gindi comments on "MO public database dumps" (3643) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3643#Comment_3643 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3643#Comment_3643 Thu, 04 Mar 2010 00:11:21 -0800 Harry Gindi That was Anton, before I suggested putting it on a free hosting site.

]]>
Scott Morrison comments on "MO public database dumps" (3642) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3642#Comment_3642 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3642#Comment_3642 Thu, 04 Mar 2010 00:05:52 -0800 Scott Morrison Also, someone has already put up a torrent, for example at http://thepiratebay.org/torrent/5408662/.

]]>
Scott Morrison comments on "MO public database dumps" (3641) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3641#Comment_3641 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3641#Comment_3641 Wed, 03 Mar 2010 23:51:35 -0800 Scott Morrison It's now available at http://dumps.tqft.net/.

Soon after Anton reads this, you can access dumps at http://dumps.mathoverflow.net/. (Anton, do the usual! I'll give you shell access as well.)

]]>
Anton Geraschenko comments on "MO public database dumps" (3638) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3638#Comment_3638 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3638#Comment_3638 Wed, 03 Mar 2010 23:20:04 -0800 Anton Geraschenko You're right. I'll just do that. Edit: done.

]]>
Harry Gindi comments on "MO public database dumps" (3636) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3636#Comment_3636 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3636#Comment_3636 Wed, 03 Mar 2010 23:10:44 -0800 Harry Gindi =(. The database dump is 9.6 MB. Why don't you rapidshare/megaupload/etc it?

]]>
Anton Geraschenko comments on "MO public database dumps" (3634) http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3634#Comment_3634 http://mathoverflow.tqft.net/discussion/266/mo-public-database-dumps/?Focus=3634#Comment_3634 Wed, 03 Mar 2010 23:02:55 -0800 Anton Geraschenko Here's the first public dump of the database. It's from earlier today.

http://dumps.mathoverflow.net/, or
http://ifile.it/soyqa09/MOdump20100303.zip

]]>