Re: Piggy bank ports from David Huynh on 2005-12-19 (stdin)

From: David Huynh <dfhuynh_at_csail.mit.edu>
Date: Mon, 19 Dec 2005 18:26:10 -0500

That's quite odd, since Piggy Bank seems to be able to parse the
scrapers' metadata, and that also requires going through your
firewall/proxy. It's the step of retrieving the JavaScript code that
fails. Do you have Piggy Bank's and Longwell's code checked out? If so,
could you take a look at the method getCode(...) in the class
edu.mit.simile.piggyBank.GRDDLModel? I think the call to
new URL(url).openStream() there fails. The function that loads the
scrapers' metadata, which succeeds, is
edu.mit.simile.SimileUtilities.loadDataFromURL(...) (in the Longwell
codebase).
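
To narrow this down outside Piggy Bank, a small standalone check along
these lines might help. It is only a sketch: the class name FetchCheck
and its two methods are made up for illustration and are not part of
Piggy Bank or Longwell. It prints the JVM's standard http.proxyHost and
http.proxyPort system properties and then tries the same
new URL(url).openStream() call pattern on a URL you pass in. Whether
this reproduces what getCode(...) sees depends on it running with the
same JVM settings as Piggy Bank, which is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic class; not part of Piggy Bank or Longwell.
final class FetchCheck {

    // Reports the standard JVM proxy properties for plain HTTP.
    // A value of "null" means the property is not set in this JVM.
    static String proxySummary() {
        return "http.proxyHost=" + System.getProperty("http.proxyHost")
             + ", http.proxyPort=" + System.getProperty("http.proxyPort");
    }

    // Reads the whole resource using the same call pattern as the
    // suspected failing line: new URL(url).openStream().
    static String fetch(String url) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java FetchCheck <url>");
            return;
        }
        System.out.println(proxySummary());
        try {
            System.out.println("fetched " + fetch(args[0]).length() + " characters");
        } catch (IOException e) {
            // The exception type (timeout, connection refused, unknown
            // host, ...) is the interesting part of the output here.
            System.out.println("fetch failed: " + e);
        }
    }
}
```

Running it once as is and once with -Dhttp.proxyHost=... and
-Dhttp.proxyPort=... set to your proxy, against the scraper's .js URL,
should show whether the proxy settings make the difference.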

David

Prokopp, Christian wrote:
> Hi David,
>
> Thank you for your answer. As I mentioned in another post, I already
> checked http://simile.mit.edu/issues/browse/PIGGYBANK-51 and it did not
> resolve my problem. I do not think it is a Java/Piggy Bank problem as
> such, but rather the restrictive firewall/proxy I have to deal with. I
> would like to investigate the problem and perhaps find a workaround.
>
> Example:
> 1.) After I installed the scraper for jobsearch.monster.com I open the
> website:
> http://jobsearch.monster.com/jobsearch.asp?&q=java&re=112&refine=1
> 2.) I then click on the Piggy Bank icon to scrape the page and get a
> message like "Piggy bank will need to retrieve code from
> http://people.csail.mit.edu/people/dfhuynh/research/download/screen-scrapers/monster-com-search-scraper.js
> This might take a bit of time."
> 3.) It follows a redirect to my localhost piggy bank with a screen:
> "Monster - Search Jobs
> Collected Information
>
> No typed data found."
>
> This happens with basically every scraper I have used from the SIMILE
> site. My question therefore is: how does Piggy Bank retrieve the JS
> code and any other data it needs? I would have guessed plain HTTP, but
> maybe not?
> I also add the relevant output from the java console at the end of this
> email. Please excuse the long post.
>
> Cheers,
> Christian
>
Received on Mon Dec 19 2005 - 23:19:17 EST
