Add your suggestions for sites that would be fun to scrape to this list:

Places

People

Movies

Books

Music

  • allofmp3
  • Amazon.com (CDs) - done by DAN THE MAN
  • MusicBrainz: low-hanging fruit, as they have XML links (under details) and an XML web service. They used to have an RDF dump but abandoned it for XML :-(
  • Last.FM
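
A site that exposes an XML web service, as noted for MusicBrainz above, can often be scraped with nothing more than an XML parser. The following is a minimal sketch, not the actual MusicBrainz format: the sample document, its namespace URI, its element names, and the parse_artists helper are all made up for illustration, and a real response would need to be checked against the service's own documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample response. The namespace and element names are
# assumptions for illustration only, not a real service's schema.
SAMPLE_XML = """
<metadata xmlns="http://example.org/ns/sample#">
  <artist-list>
    <artist id="a1">
      <name>Example Artist</name>
      <country>GB</country>
    </artist>
    <artist id="a2">
      <name>Another Artist</name>
    </artist>
  </artist-list>
</metadata>
"""

# Prefix-to-URI map so the XPath-style queries can match namespaced tags.
NS = {"m": "http://example.org/ns/sample#"}


def parse_artists(xml_text):
    """Return one dict per <artist> element found in the XML text."""
    root = ET.fromstring(xml_text.strip())
    artists = []
    for node in root.findall(".//m:artist", NS):
        artists.append({
            "id": node.get("id"),
            "name": node.findtext("m:name", default="", namespaces=NS),
            # Optional element: stays None when the artist has no <country>.
            "country": node.findtext("m:country", default=None, namespaces=NS),
        })
    return artists


if __name__ == "__main__":
    for artist in parse_artists(SAMPLE_XML):
        print(artist)
```

In practice the XML text would come from an HTTP request to the service rather than from a string constant. Keeping the parsing in its own function makes it easy to test without touching the network.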

Note: iTunes cannot be scraped because its data is encrypted and only the iTunes client can get to it.

Food

Libraries

Desktop

  • Firefox file browsing (e.g., by taking the selection source of file:/// pages and extracting each entry's name, file size, date, time, etc.)
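
The same fields that a file:/// listing shows (name, size, date, time) can also be read straight from the filesystem. The sketch below uses that swapped-in approach instead of parsing Firefox's page source, since the markup of those pages is not described here. The list_directory helper and its field names are hypothetical.

```python
import os
from datetime import datetime


def list_directory(path):
    """Return name, type, size, and modification time for each entry in path."""
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            # follow_symlinks=False avoids errors on dangling symlinks.
            info = entry.stat(follow_symlinks=False)
            entries.append({
                "name": entry.name,
                "is_dir": entry.is_dir(follow_symlinks=False),
                "size": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime),
            })
    # Sort by name so the output order is stable across platforms.
    return sorted(entries, key=lambda e: e["name"])


if __name__ == "__main__":
    for row in list_directory("."):
        print(row["name"], row["size"], row["modified"].isoformat())
```

Scraping the browser's rendered listing would instead require inspecting the page source first, which may differ between Firefox versions.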

See also