Migration: Implement the StackExchange API?


- CommentAuthorScott Morrison
- CommentTimeJul 22nd 2011 edited
Whether we migrate to SE 2.0 or not, I think we should make an effort to write some software, on a much more modest scale than alpha.mathoverflow.net.

In particular, we should implement the StackExchange API, running off a database dump.

What does this give us?
- An easy way to quickly run a read-only version of MO, if something goes badly wrong in our relationship with SE (e.g. someone evil buys them; I'm not really worried about this being necessary with Joel Spolsky at the helm).
- We can leverage the existing community developed software, collected at Stack Apps. In particular, this includes quite nice read-only frontends, that go entirely through the API, and are independent of any backend design choices about how the data is stored.
- If we ever want to write our own independent interface, it gives us a natural way to partition the task into smaller projects, which can be independently developed, by different participants, in different languages, etc.
- It gives us a great helping hand overcoming the handicap that mathematicians are bad software developers; the API has quite well defined behaviour, and I think on Stack Apps one can find existing test suites that check the SE implementation of their own API. We would thus be writing to a spec, with an existing body of tests.
In fact, writing this I realize what we really should be doing! Hire a real programmer to write something that:
- can import a SE 1.0 dump (full or public, though a public dump brings some of the limitations noted below)
- implements the current API (perhaps with some exceptions; there's some stuff that doesn't make sense for a 1.0 dump)
- satisfies some basic performance requirements (my experience writing alpha.mathoverflow.net means I know some of the worst bottlenecks, and I think I could quickly write down what's necessary to avoid them)
- passes all the tests in a conformance suite (which, as above, I hope already exists!) with exceptions when parts of the API don't make sense for our 1.0 database
Discuss, work out how much this should cost, find appropriate grant money (I'm pretty certain I can arrange this), and hire someone.

I'll be offline next week, but if someone wants to look at the API and Stack Apps and see if what I've said above is sensible, that would be great! (I'm also time constrained and on a phone: if someone wants to provide URLs for these things, please do.)
- CommentAuthorgrp
- CommentTimeJul 22nd 2011
I'll take a crack at it. Just so we are on the same page, send me at least one link so that we agree on what version of stuff to consider, or allow me to bug the moderator email address so I can send you the links of what I've found, and you can tell me if I am on the right track.

Also, I still think the idea of a new category on meta under which to file these discussions is a good one. If you have a better alternative, let me know.

Gerhard "Ask Me About System Design" Paseman, 2011.07.22
- CommentAuthorScott Morrison
- CommentTimeJul 22nd 2011
I made a new category, "Migration", and retagged some old discussions. @grp: Feel free to email the moderator address, or me directly.
- CommentAuthorScott Morrison
- CommentTimeJul 22nd 2011
- Announcement about API: http://blog.stackoverflow.com/2010/05/stack-exchange-api-public-beta-starts/
- Index to API documentation: http://stackapps.com/questions/1/api-documentation-and-help
- Existing software using the API in some way: http://stackapps.com/
- A test suite of the API: http://code.google.com/p/theworldsworststackoverflowclone/
- CommentAuthorScott Morrison
- CommentTimeJul 22nd 2011
Further, regarding testing the API, lots of people have written wrappers for the API in their favourite language: java, python, and many others. Many of these in turn have unit tests (I'm not sure which ones). These were intended to test the wrappers, but now that we know the wrappers work, they could perhaps easily serve as tests that an implementation of the API works.
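To make the idea of reusing tests concrete, here is a minimal sketch of a conformance check that could be run against any implementation's JSON output. The field names (`total`, `page`, `pagesize`, `questions`, `question_id`, `title`, `score`) are assumptions chosen for illustration and would need to be checked against the actual API documentation; the function `conformance_problems` is hypothetical and does not come from any existing wrapper.

```python
# Assumed response shape for a "/questions"-style call; verify against the real API docs.
REQUIRED_TOP = {"total": int, "page": int, "pagesize": int, "questions": list}
REQUIRED_QUESTION = {"question_id": int, "title": str, "score": int}


def conformance_problems(payload):
    """Return a list of human-readable problems found in a decoded JSON response."""
    problems = []
    for key, expected in REQUIRED_TOP.items():
        if not isinstance(payload.get(key), expected):
            problems.append(f"top-level '{key}' missing or not {expected.__name__}")
    questions = payload.get("questions")
    if isinstance(questions, list):
        for index, question in enumerate(questions):
            if not isinstance(question, dict):
                problems.append(f"questions[{index}] is not an object")
                continue
            for key, expected in REQUIRED_QUESTION.items():
                if not isinstance(question.get(key), expected):
                    problems.append(
                        f"questions[{index}].{key} missing or not {expected.__name__}"
                    )
    return problems
```

An empty list means the response matched the assumed shape. The same function can be pointed at the official service and at a home-grown implementation, so that differences show up as a list of messages rather than as a crash.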
- CommentAuthorAlexander Woo
- CommentTimeJul 22nd 2011
1) How stable is this API? If it is not completely stable, what do we do when it changes?

2) If we need to not implement parts of the API, how will we make the necessary decisions about what not to implement?

3) Quick guess as to cost on a 10 minute glance at the API - in the neighborhood of $200K, for 3 person-months of design (mostly database design), 4 person-months of implementation, and 2 person-months of QA. This already includes the work of a project manager - and yes we do need professional project management. Some could be shaved off that if we are willing to actually do the testing ourselves. I could be off, but it's not going to be less than $100K unless we're hiring amateurs.
- CommentAuthorScott Morrison
- CommentTimeJul 23rd 2011 edited
Hi Alex,

I dimly remember you having some experience with commercial software development, so I appreciate you taking a look at this and suggesting some estimates. Nevertheless, I think they are off.

For starters, there's no database design work to do at all, let alone 3 months' worth. The database schema is directly available, e.g. as described over at the Stack Exchange Data Explorer. (In fact, you can actually run arbitrary SQL queries against Stack Exchange site database dumps there.) Moreover, I've already written something that loads XML database dumps into SQL (while working on alpha), although I don't use exactly the same column names in places. I've also already got some of the SQL queries that are required for the API, including the horrific ones like "give me all comments, answers, comments on answers, along with user information for all of these, for this post". Obviously these queries still need lots of testing, as well as code that serializes the results to JSON, so there is indeed lots more work. All that I'm saying is that the database design part is all old hat, and we have good evidence that the schema actually supports the most difficult queries we need to run.
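As a rough illustration of the pipeline described above (XML dump into SQL, a "whole thread" query, then JSON), here is a minimal sketch. It is not the alpha.mathoverflow.net code. The `<row .../>` layout and the column names are assumptions modelled on the public Stack Exchange dumps, the sample data is invented, SQLite stands in for whatever SQL server is actually used, and the helpers `load`, `build_db` and `thread` are hypothetical.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Invented miniature dump; a real dump has many more attributes per row.
DUMP = {
    "posts": """<posts>
  <row Id="1" PostTypeId="1" OwnerUserId="10" Body="What is a scheme?" />
  <row Id="2" PostTypeId="2" ParentId="1" OwnerUserId="11" Body="A locally ringed space..." />
</posts>""",
    "comments": """<comments>
  <row Id="100" PostId="1" UserId="11" Text="Which definition?" />
  <row Id="101" PostId="2" UserId="10" Text="Thanks." />
</comments>""",
    "users": """<users>
  <row Id="10" DisplayName="alice" />
  <row Id="11" DisplayName="bob" />
</users>""",
}

# Assumed column declarations; INTEGER affinity converts the XML strings to numbers.
SCHEMA = {
    "posts": ["Id INTEGER", "PostTypeId INTEGER", "ParentId INTEGER",
              "OwnerUserId INTEGER", "Body TEXT"],
    "comments": ["Id INTEGER", "PostId INTEGER", "UserId INTEGER", "Text TEXT"],
    "users": ["Id INTEGER", "DisplayName TEXT"],
}


def load(conn, table, xml_text):
    """Create `table` and insert one record per <row> element of the dump."""
    decls = SCHEMA[table]
    names = [d.split()[0] for d in decls]
    conn.execute(f"CREATE TABLE {table} ({', '.join(decls)})")
    rows = [tuple(row.get(n) for n in names)
            for row in ET.fromstring(xml_text).iter("row")]
    conn.executemany(
        f"INSERT INTO {table} VALUES ({', '.join('?' * len(names))})", rows)


def build_db():
    """Return an in-memory database populated from the sample dump."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    for table, xml_text in DUMP.items():
        load(conn, table, xml_text)
    return conn


def thread(conn, question_id):
    """Question plus answers, each with its comments and the users' display names."""
    posts = conn.execute(
        "SELECT p.Id, p.PostTypeId, p.Body, u.DisplayName AS owner "
        "FROM posts p LEFT JOIN users u ON u.Id = p.OwnerUserId "
        "WHERE p.Id = ? OR p.ParentId = ? ORDER BY p.PostTypeId, p.Id",
        (question_id, question_id)).fetchall()
    result = []
    for post in posts:
        comments = conn.execute(
            "SELECT c.Id, c.Text, u.DisplayName AS author "
            "FROM comments c LEFT JOIN users u ON u.Id = c.UserId "
            "WHERE c.PostId = ? ORDER BY c.Id", (post["Id"],)).fetchall()
        result.append({**dict(post), "comments": [dict(c) for c in comments]})
    return result
```

Calling `json.dumps(thread(build_db(), 1), indent=2)` gives a JSON document for the sample question. This version issues one comment query per post, which keeps the sketch short; a production version would more likely fetch all comments for the thread in a single query.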
- CommentAuthorgrp
- CommentTimeJul 23rd 2011
Normally, I would concur with Scott, and mention my own estimates (which would assume less time than Alex has). Instead, I am asking a few people I trust to provide rough estimates themselves; after that I will report back.

From my brief overview, I think we can build a partial prototype in a couple of days to see if we are on the right track, and then spend a week on fleshing out the prototype to see how much progress can be made building and testing. After that a realistic estimate for the whole project should be within reach.

Gerhard "Ask Me About System Design" Paseman, 2011.07.23
- CommentAuthorAlexander Woo
- CommentTimeJul 24th 2011
1) I agree that having a database schema already available cuts out most of the design work and some of the implementation work.

2) I'm not sure what we want here. Do we want something that is robust, adequately documented, and maintainable? Or something that will mostly work for a few months without crashing the web servers too often? The former (which is what I was assuming) is usually at least 3 and occasionally 10 times as expensive as the latter. I mean this as a live question. If we want throwaway code which will keep a read-only site afloat for a few months to a year in the unlikely event we need to roll our own, then there is no need to spend the money. If on the other hand we expect to be rolling our own sooner or later and see this as a down payment, then the work needs to be done with an eye to fitting into a larger project - and in particular needs to be modifiable in the event of an API change.
- CommentAuthorgrp
- CommentTimeJul 26th 2011
I imagine a User Interface which has a Classic mode and a New-and-Shiny mode. I also imagine add-ons such as chat, online blackboards, and other fripperies. I also imagine basic features such as comprehensive indexing and notification of new/edited comments (as opposed to notification of comments addressed to one individual or the other). Without a database to run on and an API to dish it out, all the above are meaningless.

While I would like to see more of Scott's intentions (a.k.a. what do we make after an API?), I agree that having our own API to read and write a MathOverflow database would be handy. Not only could special applications be developed to appeal to small or large groups of users, but it would be insurance against a catastrophe involving StackExchange (if developed soon enough).

Of course something robust and adequately tested is preferred. However, even something that will mostly work until a better solution is found may not only be better than nothing, it may be better than trying to revert from 2.0 back to 1.0 if the need should arise.

Gerhard "Ask Me About System Design" Paseman, 2011.07.26
- CommentAuthorgrp
- CommentTimeJul 27th 2011
I am still assessing the time to build the API. Although I am willing to hash out the details here, it might make sense to take the technical discussion to a wiki or to whatever forum is set up for discussing alpha.

I see implementing the API in python with a MySQL server running on the same box. Importing some data and building the API seem quite doable to me; with a little assistance I can probably produce something testable within a couple weeks. Hashing out the design and making it robust under heavy loads will call for some more expertise, but I have some people to consult on that. I will do some more research and come up with a proposed schedule in a few days.
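For a sense of scale, here is a minimal sketch of one read-only endpoint written as a plain WSGI application in Python. It is a sketch under stated assumptions, not a design: SQLite stands in for the MySQL server mentioned above, the `/questions` route, the `page` and `pagesize` parameters, the table layout and the response fields are all illustrative, and `make_app` and `demo_connection` are hypothetical names.

```python
import json
import sqlite3
from urllib.parse import parse_qs


def _json(start_response, status, payload):
    """Send `payload` as a JSON response with the given HTTP status line."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def make_app(conn):
    """Build a WSGI app that serves a paged, read-only question list from `conn`."""
    def app(environ, start_response):
        if environ.get("REQUEST_METHOD", "GET") != "GET":
            return _json(start_response, "405 Method Not Allowed",
                         {"error": "this service is read-only"})
        if environ.get("PATH_INFO", "") != "/questions":
            return _json(start_response, "404 Not Found", {"error": "unknown route"})
        query = parse_qs(environ.get("QUERY_STRING", ""))
        try:
            page = max(1, int(query.get("page", ["1"])[0]))
            pagesize = min(100, max(1, int(query.get("pagesize", ["30"])[0])))
        except ValueError:
            return _json(start_response, "400 Bad Request",
                         {"error": "page and pagesize must be integers"})
        total = conn.execute("SELECT COUNT(*) FROM questions").fetchone()[0]
        rows = conn.execute(
            "SELECT id, title, score FROM questions ORDER BY id LIMIT ? OFFSET ?",
            (pagesize, (page - 1) * pagesize)).fetchall()
        return _json(start_response, "200 OK", {
            "total": total, "page": page, "pagesize": pagesize,
            "questions": [{"question_id": r[0], "title": r[1], "score": r[2]}
                          for r in rows],
        })
    return app


def demo_connection():
    """In-memory stand-in for a database loaded from a dump (invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT, score INTEGER)")
    conn.executemany("INSERT INTO questions VALUES (?, ?, ?)", [
        (1, "What is a scheme?", 12),
        (2, "Why are groups interesting?", 7),
        (3, "Examples of common false beliefs", 30),
    ])
    return conn
```

To try it locally, the standard library is enough: `wsgiref.simple_server.make_server("", 8000, make_app(demo_connection())).serve_forever()` serves the endpoint on port 8000. Because the app is an ordinary callable, it can also be exercised directly in unit tests by passing a small `environ` dictionary, with no network involved.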

Focussing on the API is important, but it would help me if you (Scott) could give me your opinion as to what the next step would be after that. It might also be good to call out to more of the community to get a few more tech-savvy opinions on making the API and the next step, whatever that may be.

Gerhard Paseman, 2011.07.27
- CommentAuthorAlexander Woo
- CommentTimeJul 29th 2011
My general objection to mathematicians writing software is not that it turns out inefficient or prone to failure. If anything, mathematician-written software tends to work more correctly and possibly more efficiently than is usual for everyone else. What mathematicians don't do well is make it easy or even possible for other people to read, understand, and modify the code they have written.

In this case, there aren't any serious design issues - given the API, we can't really end up implementing the necessary computations in some way that makes the code impossible to maintain. However, unless throwaway code is the intent, I would like to make sure enough resources are budgeted to *document* what is done.

(Gerhard - I'm sure you know that writing proper documentation takes almost as much time as writing the code itself, but I'm not sure everyone else does.)
- CommentAuthorgrp
- CommentTimeJul 29th 2011
Indeed Alex, I would think that documenting the behaviour before writing the code to produce the behaviour is essential here. A reason not to do docs before code would be if the code has a limited and short lifetime.

I am incrementally building an informal spec sheet which talks about how the API is to be used as well as how it is to be implemented. I hope that more voices chime in on this thread; that would mean there is some more interest (and perhaps expertise) in turning the spec sheet and discussion into something useful for the community.

Gerhard "Ask Me About System Design" Paseman, 2011.07.29
- CommentAuthorgrp
- CommentTimeAug 2nd 2011
I initially came up with an estimate of about 40 hours of coding and unit testing, with as many hours of alpha testing as could be gleaned from the community, to build the API and a primitive, minimally functional front end to be used for testing. I got another estimate that suggested using certain tools and a frontend development platform which would allow a design-with-test methodology and would clone the API, provide a fully functional front end for users, provide a functional administrative interface, use existing designs with options for scalability, and need about 60 hours of coding and testing, with no estimate on how much alpha testing would be needed. I think we are at the stage where not only can we build the API quickly, but we should also be sure we know what we want to do with it.

I am willing to donate at least 10 hours of coding and some more hours of design in order to get this project up and running. If there is interest in the community to assist with this, in testing or design or even participating in the overall discussion, I suggest speaking up now, if only to indicate such interest. MathOverflow the forum will have to migrate eventually to survive, and the community (or some subset) will have to decide how dependent the forum will be on resources and services provided by entities with other interests; Scott Morrison is suggesting (and I agree) that the cost of independence in time and money is not only within reach, but will offer rewards such as the ability to make software and interface changes that are oft mentioned on meta threads.

Scott, Anton, and the other moderators: is it appropriate to put up (for a few days) a banner on MathOverflow directing users to this thread and to the migrate to SE 2.0 thread?

Gerhard Paseman, 2011.08.02
- CommentAuthorAlexander Woo
- CommentTimeAug 2nd 2011
In mathematician lingo, I am willing to "referee" design documents (up to a limit of roughly 20-30 pages) and, if desired, review bits of code.


tea.mathoverflow.net
