Feature requests: Trackbacks to arXiv?

1 to 25 of 25

- CommentAuthorJosé Figueroa
- CommentTimeMar 10th 2010
I was wondering whether it would be possible to implement some sort of trackback to arXiv every time that an eprint gets mentioned (by which I mean linked) in MO. MO discussions are generally quite useful and I think that it would add valuable information to an eprint if one could see whether it has been discussed somewhere in MO.
- CommentAuthorScott Morrison
- CommentTimeMar 10th 2010
This is an excellent idea, but it would require cooperation from multiple people. First, we'd need StackExchange to use the trackback protocol, and this would require a feature request at meta.stackexchange.com (please link here if you make it!). Second, the trackbacks on the arxiv end are run on a "whitelist", not a "blacklist", basis, and we would have to talk to the appropriate people there. I think I know who they are, and I think it is likely they would be agreeable.
- CommentAuthorTom Leinster
- CommentTimeMar 11th 2010
On Scott's second point, it might be worth mentioning it to Greg Kuperberg - he'd surely know who to talk to at the arXiv.
- CommentAuthorJosé Figueroa
- CommentTimeMar 11th 2010
I have now asked this at meta.stackexchange.com as Scott suggested: http://meta.stackexchange.com/questions/4896/trackbacks-to-arxiv-in-mathoverflow .
- CommentAuthorScott Morrison
- CommentTimeMar 11th 2010
Thanks figueroa! I hope we get somewhere on this. At some point I understood the trackback system, but not anymore. I think we should seriously think about the possibility of implementing this ourselves (even if it's a script run externally that provides the glue), as this could be a really great feature. I think having trackbacks would make questions about points raised in specific papers even more useful.
- CommentAuthorScott Morrison
- CommentTimeMar 11th 2010
This looks like it might be really easy. Sending a trackback (i.e. notifying the arxiv of a link) is just a matter of making an HTTP POST request to a certain URL, filling out certain fields. See the spec. The arxiv has a semi-automated trackback spam filter in place, but I'll go now and ask what we'd need to do to make sure we can get through this.
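A minimal sketch of such a ping in Python, assuming the form fields of the TrackBack specification (`url` required; `title`, `excerpt` and `blog_name` optional). The endpoint and question URL below are placeholders, not the arxiv's actual trackback address, and the request is only built, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_ping(trackback_url, source_url, title="", excerpt="", blog_name=""):
    """Build (but do not send) a TrackBack ping as an HTTP POST request."""
    fields = {"url": source_url}  # the only field the spec requires
    optional = {"title": title, "excerpt": excerpt, "blog_name": blog_name}
    # Leave out optional fields that were not supplied.
    fields.update({key: value for key, value in optional.items() if value})
    body = urlencode(fields).encode("utf-8")
    return Request(
        trackback_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


ping = build_trackback_ping(
    "http://example.org/trackback/1234",        # placeholder endpoint (assumption)
    "http://mathoverflow.net/questions/12345",  # hypothetical question URL
    title="A question about a paper",
    blog_name="MathOverflow",
)
print(ping.get_method(), ping.full_url)
print(ping.data.decode("utf-8"))
```

Passing `ping` to `urllib.request.urlopen` would send it. Under the spec, the receiving server reports success or failure in a small XML response body (an `error` element of 0 means success), so it is worth reading the body as well as the HTTP status.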

I think the technically easiest solution is to run a cron job that watches for links to the arxiv from mathoverflow, then constructs the appropriate request to the arxiv. If someone wants to get thinking about a little command-line program that finds recent links from mathoverflow to the arxiv, please go ahead and report back here. For now, just something that outputs pairs (mathoverflow URL, arxiv URL) would be great.
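As a rough illustration of the "pairs" program described here, the sketch below scans the HTML of one page for explicit arxiv.org links and returns (mathoverflow URL, arxiv URL) pairs. The regular expression, the function name `link_pairs` and the sample page are assumptions made for this example, not the program that was eventually written:

```python
import re

# Explicit links to arxiv abstract or PDF pages, old-style (math/0307263)
# or new-style (1001.1234), with an optional version suffix such as v3.
ARXIV_LINK = re.compile(
    r"https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/"
    r"((?:[a-z\-]+/\d{7}|\d{4}\.\d{4,5})(?:v\d+)?)"
)


def link_pairs(page_url, html):
    """Return (page_url, arxiv_url) pairs, without duplicates, in order of appearance."""
    pairs = []
    seen = set()
    for match in ARXIV_LINK.finditer(html):
        arxiv_url = "http://arxiv.org/abs/" + match.group(1)
        if arxiv_url not in seen:
            seen.add(arxiv_url)
            pairs.append((page_url, arxiv_url))
    return pairs


page = "http://mathoverflow.net/questions/12345"  # hypothetical question URL
html = (
    '<p>See <a href="http://arxiv.org/abs/math/0307263">this paper</a> '
    "and also http://arxiv.org/abs/1001.1234v3.</p>"
)
for pair in link_pairs(page, html):
    print(pair)
```

Feeding each entry of an RSS feed or database dump through a function like this would give the list of pairs to turn into trackback requests.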
- CommentAuthorScott Morrison
- CommentTimeMar 11th 2010
I have a little program (written in mathematica, because my perl/bash/python-fu is weak) that can either take an RSS feed from MathOverflow, or the database dump, and generate trackbacks. I only ran the final step of pinging the arxiv on one page, and got back a status 200 OK, but didn't see a link appear. Let's wait until we see if the arxiv is agreeable, and then someone can write a proper script.

(Interestingly, in the database dump I already find ~650 links to the arxiv.)
- CommentAuthorRyan Budney
- CommentTimeMar 12th 2010
This sounds like an excellent idea. I've always hoped there would be a convenient way to discuss the finer points of papers on-line with people who are also keen to understand the same papers. This might be an excellent mechanism to promote that.
- CommentAuthorScott Morrison
- CommentTimeMar 12th 2010 edited
Anton and I made a lot of behind the scenes progress today. We've talked to the arxiv administrators, and you can already see a sample trackback at http://arxiv.org/tb/math/0307263. Quite likely, we'll submit all the historical trackbacks for existing questions soon, and then process new trackbacks as the links are made.
- CommentAuthorJosé Figueroa
- CommentTimeMar 14th 2010
Excellent! And impressively fast too! Thanks to Scott and Anton for their efforts!!!
- CommentAuthorScott Morrison
- CommentTimeMar 20th 2010
After a few delays, and some back and forth with the arxiv folks to resolve some issues, we now have trackbacks working!

We've submitted all the "historical" trackbacks from earlier posts, and these have been incorporated into the appropriate pages. There were ~1000 of these. We have a cron job running on the same server that hosts meta, which periodically looks for new arxiv links in recently edited questions, and submits those. (That is, we're doing this entirely without StackExchange support.)

We locate references to arxiv papers very aggressively -- it's not even necessary to actually have a link. We've implemented a bunch of tricks to locate arxiv identifiers, and checked the output carefully for the historical links, but won't be reviewing things routinely in future. If you happen to see an incorrect trackback, please let us know, as we'd like to play nice with the arxiv folks (who've been extremely helpful and accommodating!). If you suspect that something should have generated a trackback, make sure it wasn't the result of a recent edit (there's at most a 24 hour delay before you'll see things on the arxiv, it seems) and then let us know here or elsewhere.

Even though we try to find other arxiv references, I'd strongly encourage everyone to use proper links to the arxiv, with [link text](link url) style links. This makes it easier for readers than just bare arxiv identifiers, and means that the links will be detected by the less error-prone parts of our program!

Go out there and discuss some arxiv papers! You can also use http://arxiv.org/tb/recent to see recent trackbacks on the arxiv -- I expect that mathoverflow will dominate this listing for now.
- CommentAuthorHarry Gindi
- CommentTimeMar 20th 2010
This is a great reason to make markdown-style links work in comments...
- CommentAuthorAnton Geraschenko
- CommentTimeMar 20th 2010
arXiv links in comments are picked up just fine. Even if you refer to an article but don't link to the arXiv, the reference is picked up. You can write stuff like "arxiv:math/0504123" or "arxiv: 1001.1234v3". The main advantage of including the actual URL in a comment (or a link in a post) is that it's easier for people to get to the article ... they don't have to type "arxiv.org/abs/XYZW".
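For illustration, a pattern along the following lines would recognise the two prefixed forms quoted above. It is a sketch written for this example (the names `PREFIXED` and `prefixed_ids` are made up), not the expression the scraper actually uses:

```python
import re

# "arxiv:" followed by optional whitespace, then either an old-style
# identifier (archive/7 digits, optionally with a subject class such as
# math.AG) or a new-style one (4 digits, dot, 4 digits), with an optional
# version suffix.
PREFIXED = re.compile(
    r"arxiv:\s*((?:[a-z\-]+(?:\.[A-Z]{2})?/\d{7}|\d{4}\.\d{4})(?:v\d+)?)",
    re.IGNORECASE,
)


def prefixed_ids(text):
    """Return the identifiers mentioned with an arxiv: prefix, version suffix removed."""
    return [re.sub(r"v\d+$", "", ident) for ident in PREFIXED.findall(text)]


comment = "Compare arxiv:math/0504123 with arxiv: 1001.1234v3 and arXiv:math.AG/0504123."
print(prefixed_ids(comment))
```

Stripping the version suffix is a design choice of this sketch: it makes every mention of a paper map to the same abstract page regardless of which version was cited.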
- CommentAuthorHarry Gindi
- CommentTimeMar 20th 2010
Sure, I was responding to Scott's call for more links of the form [link text](link)
- CommentAuthorJosé Figueroa
- CommentTimeMar 20th 2010
Scott> I expect that mathoverflow will dominate this listing for now.

I agree. 28 of the 40 most recent trackbacks point to MO. This ought to increase MO's exposure and potentially increase participation. Is there a way to measure this? i.e., a way to see whether someone reached MO via a link from the arXiv?

I have one question: will something like '1001.1234' be picked up without the 'arXiv:' prefix?
- CommentAuthorScott Morrison
- CommentTimeMar 20th 2010
@figueroa, yes, 1001.1234 will be picked up without the arXiv: prefix, but subject to some other complicated constraints on the adjacent characters (e.g., the character immediately afterwards has to be either v or something non-alphanumeric...).

And yes, at least for users with javascript enabled (almost everyone), we'll see in Anton's google analytics statistics how many people arrive via arxiv trackbacks. Remind us again in a month if we haven't reported on this --- I'm really interested to find out how effective this will be.
- CommentAuthorAnton Geraschenko
- CommentTimeMar 20th 2010
figueroa> I have one question: will something like '1001.1234' be picked up without the 'arXiv:' prefix?

Yes. Anything matching the following regex will be picked up: [^\w.]\d\d[01]\d.\d{4}(?:[v\W]|$) . Based on five months of data, this doesn't seem to have picked up any false positives. If we do get false positives, we'll refine our search. Feel free to use this thread to imagine situations where somebody might reasonably match this regex without meaning to refer to an arXiv article.
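To experiment with this, here is a small sketch based on the pattern quoted above with two changes that are assumptions of the example rather than part of the original: the dot between the two number groups is escaped (as quoted, an unescaped dot matches any character), and a capturing group is added so that only the identifier is returned.

```python
import re

# Quoted pattern plus a capturing group and an escaped dot (both assumptions).
#   [^\w.]        a preceding character that is neither alphanumeric nor a dot
#   \d\d[01]\d    two-digit year, then a month whose first digit is 0 or 1
#   \.\d{4}       a dot and a four-digit sequence number
#   (?:[v\W]|$)   followed by "v", a non-word character, or the end of the text
NEW_STYLE = re.compile(r"[^\w.](\d\d[01]\d\.\d{4})(?:[v\W]|$)")


def find_new_style(text):
    """Return the bare new-style identifiers found in text."""
    return NEW_STYLE.findall(text)


print(find_new_style("as shown in 1001.1234, the bound is sharp"))
print(find_new_style("see (1001.1234v3) for details"))
print(find_new_style("the constant is 31001.12345 in these units"))
```

The first two calls find `1001.1234`. The third finds nothing, because the digits are embedded in a longer number and so fail the checks on the adjacent characters.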
- CommentAuthorAndrew Stacey
- CommentTimeMar 20th 2010
You miss out on old-style identifiers, and you miss out on references that start a line.
- CommentAuthorScott Morrison
- CommentTimeMar 20th 2010
@Andrew -- we also catch old-style identifiers via different regexes. Anton was just showing how we catch new ones. You're right about starting lines, and our testing wouldn't have picked up those. Note that we handle "fake new-style identifiers", introduced on the Front, by querying the Front for the correct old-style identifier.
- CommentAuthorBen Webster
- CommentTimeMar 20th 2010 edited
@figueroa- 28 out of 40 actually understates their impact. I count 2 mathematics trackbacks out of the last 100 that aren't from MO (one from Terry Tao, one from NCC); all the other ones are for physics. I think we're going to dominate math trackbacks for a very long time.
- CommentAuthorAnton Geraschenko
- CommentTimeMar 21st 2010 edited
@Andrew: Actually, nothing in a post ever starts a new line or ends a line since posted content is always within <p> tags. The reason for including the end-of-line character at the end is really just for debugging purposes.
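A quick way to see both points, using a version of the quoted pattern with the dot escaped and a capturing group added (both assumptions of this sketch). The `ANCHORED` variant is a hypothetical alternative that would also accept an identifier at the very start of the text:

```python
import re

# As quoted in the thread, apart from the escaped dot and the capturing group.
QUOTED = re.compile(r"[^\w.](\d\d[01]\d\.\d{4})(?:[v\W]|$)")
# Hypothetical variant: also allow the identifier at the start of the text.
ANCHORED = re.compile(r"(?:^|[^\w.])(\d\d[01]\d\.\d{4})(?:[v\W]|$)")

bare = "1001.1234 proves this"
wrapped = "<p>1001.1234 proves this</p>"

print(QUOTED.findall(bare))      # nothing precedes the identifier, so no match
print(QUOTED.findall(wrapped))   # the ">" of the <p> tag satisfies [^\w.]
print(ANCHORED.findall(bare))    # the ^ alternative handles the bare text
```

So the missing start-of-line case matters only if the pattern is ever run on text that is not wrapped in tags.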
- CommentAuthorAnton Geraschenko
- CommentTimeMar 21st 2010
"I count 2 mathematics trackbacks out of the last 100 that aren't from MO." That's a little unfair since we submitted over 1000 trackbacks all at once from scraping the database. Assuming people keep posting arXiv links at about the same rate they've been doing, we should see between 5 and 10 new trackbacks per day.
- CommentAuthorjc
- CommentTimeApr 28th 2010
I posted arxiv links in comments a few times and they don't seem to have been picked up by the scraper so far:

http://mathoverflow.net/questions/21014/isotropic-deformation-retract-of-weinstein-manifolds
http://mathoverflow.net/questions/21236/weight-filtration-for-smooth-analytic-manifolds
- CommentAuthorAnton Geraschenko
- CommentTimeApr 28th 2010 edited
@jc: the scraper can sometimes miss comments because they don't bump the question to the top of the home page and because you sometimes have to fetch additional comments (which the scraper isn't smart enough to do). We also have a database scraper which catches everything. I'll run that and send a list of missed trackbacks to the arXiv admins. The idea was that we would do one of these database scrapes once per month to catch anything the scraper misses, but perhaps we should aim to do it a bit more frequently.

Edit: I've emailed the new trackbacks to the arXiv admin. Hopefully they'll be up soon.
More Edit: They're up. That was quick.
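The catch-up step amounts to a set difference between what the full database scrape finds and what has already been submitted. A minimal sketch, assuming both are available as lists of (mathoverflow URL, arxiv URL) pairs; the function name and the sample pairs are made up for illustration:

```python
def missed_trackbacks(database_pairs, submitted_pairs):
    """Return the pairs found by the database scrape that were never submitted."""
    submitted = set(submitted_pairs)
    return sorted(set(database_pairs) - submitted)


# Hypothetical sample data.
from_database = [
    ("http://mathoverflow.net/questions/111", "http://arxiv.org/abs/1001.1234"),
    ("http://mathoverflow.net/questions/222", "http://arxiv.org/abs/math/0504123"),
    ("http://mathoverflow.net/questions/333", "http://arxiv.org/abs/0912.4321"),
]
already_submitted = [
    ("http://mathoverflow.net/questions/111", "http://arxiv.org/abs/1001.1234"),
]

for pair in missed_trackbacks(from_database, already_submitted):
    print(pair)
```

Keeping a record of submitted pairs also makes it safe to run the database scrape more often, since nothing is sent twice.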
- CommentAuthorScott Morrison
- CommentTimeApr 28th 2010
+1 all round.

tea.mathoverflow.net
