tea.mathoverflow.net - Discussion Feed (The impermance of images on MO) 2018-11-04T13:50:43-08:00 http://mathoverflow.tqft.net/ Lussumo Vanilla & Feed Publisher

WillieWong comments on "The impermance of images on MO" (15035) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=15035#Comment_15035 2011-07-18T04:14:30-07:00 2018-11-04T13:50:43-08:00 WillieWong http://mathoverflow.tqft.net/account/288/ @Joseph: I'm sure this is one of the things that if we ask for it, they can do it. Provided the link in question is not already dead...

Joseph O'Rourke comments on "The impermance of images on MO" (15026) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=15026#Comment_15026 2011-07-17T13:03:47-07:00 2018-11-04T13:50:43-08:00 Joseph O'Rourke http://mathoverflow.tqft.net/account/240/ I hesitate to put this in the "Migrate to SE2.0" thread, because it is a very minor issue. But, if we do migrate, would all the images in all the past MO posts be copied and stored ...

Scott Morrison comments on "The impermance of images on MO" (14401) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14401#Comment_14401 2011-05-04T01:05:25-07:00 2018-11-04T13:50:43-08:00 Scott Morrison http://mathoverflow.tqft.net/account/3/ This is all an artefact of how the various database tables are actually used in the underlying software (well, based on our limited understanding of that software!). The database table that posts.xml comes from contains all the information required to render the pages corresponding to questions, and in particular only needs the htmlified content, while the actual source, which is much more rarely needed, is stored in a separate database table, from which posthistory.xml is generated.

Anton Geraschenko comments on "The impermance of images on MO" (14398) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14398#Comment_14398 2011-05-03T23:04:30-07:00 2018-11-04T13:50:43-08:00 Anton Geraschenko http://mathoverflow.tqft.net/account/2/ @Mariano: posts.xml contains the htmlified versions of the posts, but posthistory.xml contains the markdown source.

Mariano comments on "The impermance of images on MO" (14396) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14396#Comment_14396 2011-05-03T18:33:28-07:00 2018-11-04T13:50:43-08:00 Mariano http://mathoverflow.tqft.net/account/61/ Scott, so the dump is not really a dump but the result of htmlifying the markdown source?

Joseph O'Rourke comments on "The impermance of images on MO" (14393) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14393#Comment_14393 2011-05-03T14:17:40-07:00 2018-11-04T13:50:43-08:00 Joseph O'Rourke http://mathoverflow.tqft.net/account/240/ Thanks, Scott, I have never looked at posts.xml, and I shouldn't have made any remark from ignorance. Mariano, my apologies for the wild goose chase!

Scott Morrison comments on "The impermance of images on MO" (14389) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14389#Comment_14389 2011-05-03T13:25:33-07:00 2018-11-04T13:50:43-08:00 Scott Morrison http://mathoverflow.tqft.net/account/3/ @Joseph,

by looking at posts.xml we're actually looking at the final rendered HTML for posts. That is, the markdown syntax for including images that you mention has already been converted to standard HTML <img/> tags.

Mariano comments on "The impermance of images on MO" (14388) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14388#Comment_14388 2011-05-03T12:01:39-07:00 2018-11-04T13:50:43-08:00 Mariano http://mathoverflow.tqft.net/account/61/ Grepping the file for the string ![ gives me 9 occurrences, which are not links. Either I am not escaping the pattern correctly (I never remember what to escape when using what tool :/ ) or the links are stored in one format and presented to the user (when editing, say) in another.

Joseph O'Rourke comments on "The impermance of images on MO" (14385) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14385#Comment_14385 2011-05-03T03:32:14-07:00 2018-11-04T13:50:43-08:00 Joseph O'Rourke http://mathoverflow.tqft.net/account/240/ @Mariano: Note that many, if not most, images are included via this syntax: ![alt text][1] .... [1]: URL. (I tried to embed a real example here but then it displayed the image!)

Mariano comments on "The impermance of images on MO" (14384) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14384#Comment_14384 2011-05-02T23:12:56-07:00 2018-11-04T13:50:43-08:00 Mariano http://mathoverflow.tqft.net/account/61/ A low tech solution is:

grep -o "&lt;img src=&quot;[^&]*&quot" posts.xml | sed -e "s/.*&quot;\(.*\.\(png\|gif\|jpg\)\)&quot/\1/" | xargs -n 1 wget

LATER: In fact, quite a few of the img tags point to latex.mathoverflow.net, which one does not want, so

grep -o "&lt;img src=&quot;[^&]*&quot" posts.xml |
    sed -e '/latex.mathoverflow.net/d' -e 's/&lt;img src=*&quot;\(.*\)&quot/\1/' |
    xargs -n 1 wget

is a better alternative.
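
For reuse, the pipeline above could be wrapped in a small shell function. This is only a sketch: the name `extract_img_urls` is made up here, and it assumes, as in the commands above, that posts.xml stores the rendered HTML in XML-escaped form, so that `<img src="` appears literally as `&lt;img src=&quot;`.

```shell
# Hypothetical helper (the name is an assumption, not from the thread).
# Reads XML-escaped HTML on stdin and prints one image URL per line,
# dropping the rendered-LaTeX images served by latex.mathoverflow.net.
extract_img_urls() {
    grep -o '&lt;img src=&quot;[^&]*&quot' |
        sed -e '/latex\.mathoverflow\.net/d' \
            -e 's/^&lt;img src=&quot;\(.*\)&quot$/\1/'
}
```

It would be used as `extract_img_urls < posts.xml | xargs -n 1 wget`. Like the original, it skips any URL that itself contains an `&`, because the pattern `[^&]*` stops at the first ampersand.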

By the way, with the last dump

grep -o "&lt;img src=&quot;[^&]*&quot" posts.xml |
    sed -e '/latex.mathoverflow.net/d' -e 's/&lt;img src=*&quot;\(.*\)&quot/\1/' |
    xargs -n 1 HEAD -d -t 3 |
    sort |
    uniq -c

(which uses a short timeout) returns

    505 200 OK
     40 204 No Content
     55 403 Forbidden
     19 404 Not Found
      1 404 NOT FOUND
      3 405 Method Not Allowed
      4 500 Can't connect to cs.smith.edu:80 (connect: timeout)
      1 500 Can't connect to img843.imageshack.us:80 (connect: timeout)
      1 500 Can't connect to math.huji.ac.il:80 (connect: timeout)
      3 500 Can't connect to maven.smith.edu:80 (connect: timeout)
      1 500 Can't connect to upload.wikimedia.org:80 (connect: timeout)
      5 500 Can't connect to www.freeimagehosting.net:80 (connect: timeout)
      2 500 Can't connect to www.math.hawaii.edu:80 (connect: timeout)
      1 500 Can't connect to www.maths.ed.ac.uk:80 (connect: Connection refused)
     22 500 read timeout
      5 501 Protocol scheme 'https' is not supported (Crypt::SSLeay or IO::Socket::SSL not installed)
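
To boil such a tally down to a single figure, one could sum the counts with awk. The sketch below assumes the `uniq -c` output (a count followed by the status line) arrives on stdin; the function name `ok_share` is hypothetical.

```shell
# Hypothetical helper: reads `uniq -c` output on stdin, where field 1 is the
# count and field 2 the HTTP status code, and reports how many of the
# checked links answered with status 200.
ok_share() {
    awk '{ total += $1 }
         $2 == "200" { ok += $1 }
         END { printf "%d of %d links returned 200 OK\n", ok, total }'
}
```

For the tally above, the counts add up to 668 checked links, of which 505 returned 200 OK.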

Scott Morrison comments on "The impermance of images on MO" (14383) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14383#Comment_14383 2011-05-02T22:47:50-07:00 2018-11-04T13:50:43-08:00 Scott Morrison http://mathoverflow.tqft.net/account/3/ If you have posts.xml from the database dump, the following command

grep -o "&lt;a href=&quot;[^&]*&quot" < posts.xml | sed -e "s/&lt;a href=&quot;\(.*\)&quot/\1/"

will give you a list of all links. (Sorry, my bash scripting doesn't extend to awk, or whatever one is really meant to use here.) After that you'd want to choose things likely to be images, and download them. The miracle still comes later.
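
One rough way to "choose things likely to be images" is to filter the link list by file extension. The sketch below is an assumption rather than anything proposed in the thread: the function name `likely_images` and the list of extensions are made up, and image links without such an extension will be missed.

```shell
# Hypothetical filter: keeps only URLs whose path ends in a common image
# extension (case-insensitive), optionally followed by a query or fragment.
likely_images() {
    grep -iE '\.(png|gif|jpe?g)([?#].*)?$'
}
```

Appended to the command above as `... | likely_images`, it leaves a list that can be handed to `xargs -n 1 wget`.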

Mariano comments on "The impermance of images on MO" (14382) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14382#Comment_14382 2011-05-02T22:01:02-07:00 2018-11-04T13:50:43-08:00 Mariano http://mathoverflow.tqft.net/account/61/ Maybe some enterprising soul could periodically scan the modump, search for links to images, download them somewhere stable and (here the unlikely magic occurs...) edit the database with links to the new copy.

Scott Morrison comments on "The impermance of images on MO" (14380) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14380#Comment_14380 2011-05-02T21:34:41-07:00 2018-11-04T13:50:43-08:00 Scott Morrison http://mathoverflow.tqft.net/account/3/ Duly noted, hehe.

Of course the usual applies --- we have no control over the software we run, and, as discussed on another thread here, migrating to SE 2.0 looks unlikely for now.

Mariano comments on "The impermance of images on MO" (14378) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14378#Comment_14378 2011-05-02T19:36:40-07:00 2018-11-04T13:50:43-08:00 Mariano http://mathoverflow.tqft.net/account/61/ +1

Joseph O'Rourke comments on "The impermance of images on MO" (14377) http://mathoverflow.tqft.net/discussion/1035/the-impermance-of-images-on-mo/?Focus=14377#Comment_14377 2011-05-02T18:05:28-07:00 2018-11-04T13:50:43-08:00 Joseph O'Rourke http://mathoverflow.tqft.net/account/240/ Many questions and answers contain images. Unlike in StackExchange 2.0, the images in MO are links via URLs to various web servers scattered over the globe. (In SE2.0, the images are uploaded to ...
So I am wondering if at some point MO should capture and store the images users post? Otherwise MO does not retain a complete record of what has been posted.