Dear all,
I finally got the absolute minimum JavaScript screen scraper to work. I'm
finding the code for more complex scrapers pretty daunting, and in the
absence of a more extensive tutorial for Solvent, I thought I'd try an
XSLT scraper, since I'm a good deal more familiar with that language. To
have a look at an example, I downloaded David's CSAIL directory scraper
and activated it. When I try to scrape the CSAIL directory, I get a long
string of errors, all reporting
Caused by: java.io.IOException: Pipe not connected
I'm accustomed to running Saxon 8 from the command line or within Oxygen.
From the stack trace, I gather that Piggy Bank uses a different XSLT
processor (the Xalan XSLTC bundled with Java, by the look of it) and that
its output needs a pipe connected to it. Have I surmised the problem
accurately? Can anyone tell me what I have to fiddle with to get this to
work? Is this an Apache thing?
Jon
Java console output follows:
11:30:22.033 [...orpus.Corpus] Warning: Internal error: java.io.FileNotFoundException: /Users/jjc/Library/Application Support/Firefox/Profiles/n7f6chgo.PBtesting/piggy-bank/temporary-sources/model1145987983563/database/namespaces.dat (No such file or directory) on null at [-1,-1] (936831ms)
11:30:22.062 [...oraryProfile] java.io.IOException: java.io.FileNotFoundException: /Users/jjc/Library/Application Support/Firefox/Profiles/n7f6chgo.PBtesting/piggy-bank/temporary-sources/model1145987983563/database/namespaces.dat (No such file or directory) (29ms)
network: Connecting http://www.csail.mit.edu/directory/directory.php with proxy=DIRECT
network: Connecting http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd with proxy=DIRECT
network: Connecting http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent with proxy=DIRECT
network: Connecting http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent with proxy=DIRECT
network: Connecting http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent with proxy=DIRECT
ERROR: 'Pipe not connected'
11:30:52.154 [...oraryProfile] javax.xml.transform.TransformerException: java.io.IOException: Pipe not connected (30092ms)
javax.xml.transform.TransformerException: java.io.IOException: Pipe not connected
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:650)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:279)
    at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithXSLT(TemporaryProfile.java:690)
    at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithGRDDL(TemporaryProfile.java:617)
    at edu.mit.simile.piggyBank.TemporaryProfile.scrape(TemporaryProfile.java:519)
    at edu.mit.simile.piggyBank.TemporaryProfile.load(TemporaryProfile.java:363)
    at edu.mit.simile.piggyBank.TemporaryProfile$LoadingThread.run(TemporaryProfile.java:143)
Caused by: java.io.IOException: Pipe not connected
    at com.sun.org.apache.xml.internal.serializer.ToStream.endElement(ToStream.java:2011)
    at com.sun.org.apache.xml.internal.serializer.ToXMLStream.endElement(ToXMLStream.java:468)
    at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.endElement(ToUnknownStream.java:331)
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$1()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$0()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.transform()
    at com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.transform(AbstractTranslet.java:594)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:640)
    ... 6 more
---------
java.io.IOException: Pipe not connected
    at com.sun.org.apache.xml.internal.serializer.ToStream.endElement(ToStream.java:2011)
    at com.sun.org.apache.xml.internal.serializer.ToXMLStream.endElement(ToXMLStream.java:468)
    at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.endElement(ToUnknownStream.java:331)
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$1()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$0()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.transform()
    at com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.transform(AbstractTranslet.java:594)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:640)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:279)
    at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithXSLT(TemporaryProfile.java:690)
    at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithGRDDL(TemporaryProfile.java:617)
    at edu.mit.simile.piggyBank.TemporaryProfile.scrape(TemporaryProfile.java:519)
    at edu.mit.simile.piggyBank.TemporaryProfile.load(TemporaryProfile.java:363)
    at edu.mit.simile.piggyBank.TemporaryProfile$LoadingThread.run(TemporaryProfile.java:143)
---------
java.io.IOException: Pipe not connected
    at java.io.PipedOutputStream.write(PipedOutputStream.java:120)
    at com.sun.org.apache.xml.internal.serializer.WriterToUTF8Buffered.flushBuffer(WriterToUTF8Buffered.java:382)
    at com.sun.org.apache.xml.internal.serializer.WriterToUTF8Buffered.write(WriterToUTF8Buffered.java:309)
    at com.sun.org.apache.xml.internal.serializer.ToStream.endElement(ToStream.java:2005)
    at com.sun.org.apache.xml.internal.serializer.ToXMLStream.endElement(ToXMLStream.java:468)
    at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.endElement(ToUnknownStream.java:331)
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$1()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$0()
    at GregorSamsa.applyTemplates()
    at GregorSamsa.transform()
    at com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.transform(AbstractTranslet.java:594)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:640)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:279)
    at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithXSLT(TemporaryProfile.java:690)
    at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithGRDDL(TemporaryProfile.java:617)
    at edu.mit.simile.piggyBank.TemporaryProfile.scrape(TemporaryProfile.java:519)
    at edu.mit.simile.piggyBank.TemporaryProfile.load(TemporaryProfile.java:363)
    at edu.mit.simile.piggyBank.TemporaryProfile$LoadingThread.run(TemporaryProfile.java:143)
Received on Wed Apr 26 2006 - 19:17:48 EDT