Add your suggestions for sites that would be fun to scrape to this list:
Places
People
Movies
Books
- WorldCat.org
- Amazon.com (books)
Music
- allofmp3
- Amazon.com (CDs) - done by DAN THE MAN
- MusicBrainz: low-hanging fruit, since it has XML links (under details) and an XML web service. It used to have an RDF dump but abandoned it for XML :-(
- Last.FM
Note: iTunes cannot be scraped because its content is encrypted and only the iTunes application can access it.
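Since MusicBrainz already serves XML, scraping it mostly means parsing that XML. Here is a minimal sketch that assumes the current MusicBrainz web service (version 2), whose responses use the namespace `http://musicbrainz.org/ns/mmd-2.0#`. The service may differ from the one described above. The sample response and the `parse_artists` helper are hypothetical, and nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET

# Namespace used by MusicBrainz web service v2 XML (assumption: the v2 service).
MB_NS = {"mb": "http://musicbrainz.org/ns/mmd-2.0#"}

# Hypothetical sample shaped like a /ws/2/artist lookup response.
SAMPLE = """<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <artist id="00000000-0000-0000-0000-000000000001" type="Group">
    <name>Example Band</name>
    <sort-name>Band, Example</sort-name>
  </artist>
</metadata>"""


def parse_artists(xml_text):
    """Return (id, name, sort_name) tuples for every <artist> in the response."""
    root = ET.fromstring(xml_text)
    artists = []
    # .//mb:artist matches artist elements at any depth below <metadata>.
    for artist in root.iterfind(".//mb:artist", MB_NS):
        artists.append((
            artist.get("id"),
            artist.findtext("mb:name", namespaces=MB_NS),
            artist.findtext("mb:sort-name", namespaces=MB_NS),
        ))
    return artists


if __name__ == "__main__":
    for row in parse_artists(SAMPLE):
        print(row)
```

Passing the namespace map to `iterfind` and `findtext` avoids spelling out the full `{http://...}` tag names on every lookup.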
Food
- http://www.woochi.com/ (wine)
- http://www.allrecipes.com/
- http://www.ibiblio.org/oscookbook/index.htm - the Open Source Cookbook, available as a PDF and as a text file.
- recipes??
Libraries
- http://citeseer.ist.psu.edu/
- http://catalog.loc.gov/ - MARC21 records are available as XML. See this article for Piggy Bank references.
- http://www.wikipedia.org
- http://www.gmail.com
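The Library of Congress catalog's MARC21 XML (MARCXML, namespace `http://www.loc.gov/MARC21/slim`) stores each field as a `datafield` with a numeric `tag` and lettered `subfield` codes. Here is a minimal sketch that pulls the main author (100$a) and title (245$a) from a single `<record>`. The sample record and the helper names are hypothetical. A real response may wrap several records in a `<collection>`, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical single MARCXML record, for illustration only.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">12345</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
</record>"""


def subfield_values(record, tag, code):
    """Return the text of every subfield `code` inside datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in record.findall(path, MARC_NS)]


def summarize(xml_text):
    """Extract author (100$a) and title (245$a) from one MARCXML <record>."""
    record = ET.fromstring(xml_text)

    def first(tag, code):
        values = subfield_values(record, tag, code)
        # Strip trailing cataloguing punctuation such as " /" or ".".
        return values[0].rstrip(" /:;,.") if values and values[0] else None

    return {"author": first("100", "a"), "title": first("245", "a")}


if __name__ == "__main__":
    print(summarize(SAMPLE_RECORD))
```

Querying by tag and subfield code means the same two helpers can extract other fields, such as subjects (650$a), without new parsing logic.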
Desktop
- Firefox file browsing (e.g., by taking the selection source of file:/// directory pages to get each file's name, size, date, time, etc.)

