As you can see from the last column, indexing isn't costing anything!†
!!!
]]>
But that only gives information about who visits the question after it's been asked. The real question is how the asker got to the site, and it seems like that would be non-trivial to extract even with the access logs. I guess we could find the first occurrence of the asker's IP.
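For what it's worth, here is a rough sketch of the "first occurrence of the asker's IP" idea in Python. It assumes the logs are in Apache's combined format; the function name `first_visit` is made up for illustration. Entries are compared by timestamp rather than by position in the file, since log lines are not always written in strict time order.

```python
import re
from datetime import datetime

# Apache "combined" format (assumed): host ident user [time] "request" status size "referrer" "agent"
LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} \S+ "([^"]*)"')

def first_visit(log_lines, ip):
    """Return (timestamp, referrer) of the earliest request from `ip`, or None."""
    hits = []
    for line in log_lines:
        m = LINE.match(line)
        if m and m.group(1) == ip:
            when = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
            hits.append((when, m.group(3)))
    # Tuples sort by timestamp first, so min() picks the earliest hit.
    return min(hits, default=None)
```

The referrer of that earliest request is the best available hint as to how the asker arrived, bearing in mind that one IP address can cover several people, and one person can turn up under several addresses.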
†You should probably ignore that. In the mornings, I'm the only one who thinks I'm funny.
]]>::1 - - [21/Mar/2010:15:34:28 +0100] "POST /~astacey/wordpress/wp-admin/admin-ajax.php HTTP/1.1" 200 237 "http://localhost/~astacey/wordpress/wp-admin/post-new.php" "Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.9.1.8) Gecko/20100216 Fedora/3.5.8-1.fc11 Firefox/3.5.8"
::1 - - [21/Mar/2010:15:34:52 +0100] "GET /~astacey/wordpress/wp-admin/images/button-grad-active.png HTTP/1.1" 200 284 "http://localhost/~astacey/wordpress/wp-admin/css/colors-fresh.css?ver=20091217" "Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.9.1.8) Gecko/20100216 Fedora/3.5.8-1.fc11 Firefox/3.5.8"
::1 - - [21/Mar/2010:15:34:52 +0100] "POST /~astacey/wordpress/wp-admin/post.php HTTP/1.1" 302 - "http://localhost/~astacey/wordpress/wp-admin/post-new.php" "Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.9.1.8) Gecko/20100216 Fedora/3.5.8-1.fc11 Firefox/3.5.8"
::1 - - [21/Mar/2010:15:34:58 +0100] "POST /~astacey/wordpress/wp-cron.php?doing_wp_cron HTTP/1.0" 200 - "-" "WordPress/2.9.2; http://localhost/~astacey/wordpress"
::1 - - [21/Mar/2010:15:34:54 +0100] "GET /~astacey/wordpress/wp-admin/post.php?action=edit&post=7&message=6 HTTP/1.1" 200 41745 "http://localhost/~astacey/wordpress/wp-admin/post-new.php" "Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.9.1.8) Gecko/20100216 Fedora/3.5.8-1.fc11 Firefox/3.5.8"
Normally you get the visitor's IP address at the start, but since I was on the machine running the server, all it shows is the loopback address (::1). Then you get the timestamp, the request, the HTTP status code (200 good, 40X bad), and the size of the response in bytes ("-" when no body was sent). Then the referring page (if any). Finally, the user-agent string (so you can see that I claimed to be using Firefox on an old Mac from the US running Linux. In fact, I appear to be running Fedora 11 and Firefox 3.5.8. Actually, you haven't a clue what I'm really using, as user-agents are customisable by the browser.).
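To make the fields above concrete, here is a minimal Python sketch that splits one such line into named parts. It assumes Apache's combined log format, in which the number after the status code is the response size in bytes. The names `COMBINED` and `parse_line` are just for illustration, and the pattern does not handle escaped quotes inside the request or user-agent.

```python
import re

# One named group per field of the (assumed) Apache combined log format.
COMBINED = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = COMBINED.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    # A size of "-" means no response body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields
```

Running `parse_line` on the wp-cron line above gives a host of `::1`, a status of 200, a size of 0 and a referrer of `-`.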
If you don't have access to the server logs, you may still be able to pick this up using a bit of nifty JavaScript. The server passes all of this information to any program it runs, and some of it may be accessible in JavaScript too. I'm not a JS expert, so I don't know whether JS has access to it and, if it does, whether it can do anything with it. Certainly a server-side script could do it. What you would want is for the page to make an Ajax-style call to a program on the server, passing it all the environment variables. That program can then log it wherever you want.
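As a sketch of the server-side half, here is a tiny logging endpoint using only Python's standard library. Everything specific in it is an assumption made for illustration: the `visits.log` file, the `ref` query parameter, and the names `format_entry`, `LogHandler` and `serve`. On the browser side, the page's script would read `document.referrer` and pass it along as `ref`, because the Referer header of the Ajax call itself would just be the page making the call.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

LOG_PATH = "visits.log"  # hypothetical log file

def format_entry(client_ip, came_from, agent, stamp):
    """Build one log line: IP, timestamp, where the visitor came from, user-agent."""
    return f'{client_ip} [{stamp}] "{came_from}" "{agent}"'

class LogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The page's script is assumed to call e.g. /log?ref=<document.referrer>.
        query = parse_qs(urlparse(self.path).query)
        came_from = query.get("ref", ["-"])[0]
        entry = format_entry(
            self.client_address[0],
            came_from,
            self.headers.get("User-Agent", "-"),
            time.strftime("%d/%b/%Y:%H:%M:%S %z"),
        )
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(entry + "\n")
        self.send_response(204)  # nothing to send back
        self.end_headers()

def serve(port=8000):
    """Run the logger until interrupted."""
    HTTPServer(("localhost", port), LogHandler).serve_forever()
```

Calling `serve()` starts the logger on port 8000. A real deployment would also want to sanitise the logged values, since both the referrer and the user-agent are supplied by the visitor.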
Seems a bit of an effort, though. Far simpler just to request the access logs.
]]>Lots of questions have been closed "because there is no point in allowing more answers" (not that I think that is a sensible reason!) and I don't think you want to disallow robots seeing those and people finding them.
]]>The obvious keywords I can think of for math homework help don't bring up MO, so I don't think Google's to blame.
]]>By the way, this prompts a feature request. Could closed questions be equipped with <meta name='robots' content='none'> in the head? Then at least typical homework questions won't by themselves lead googling students to MO.
]]>How are these people finding Math Overflow? Do we know? I understand what motivates them to ask their questions once they find the site -- there's a chance they may get an answer, and the worst that can happen is that some people on the Internet that they've never met tell them to go away -- but how are they getting here in the first place?
]]>