XSLT scraper help?

From: Jon Crump <jjcrump_at_myuw.net>
Date: Wed, 26 Apr 2006 12:18:51 -0700 (PDT)

Dear all,

I finally got a bare-minimum JavaScript screen scraper to work. I'm
finding the code for more complex scrapers pretty daunting, and in the
absence of a more extensive tutorial for Solvent, I thought I'd try an
XSLT scraper, since I'm a good deal more familiar with that language. To
see an example, I downloaded David's CSAIL directory scraper and
activated it. When I try to scrape the CSAIL directory, I get a long
string of errors, all ending in:

Caused by: java.io.IOException: Pipe not connected
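
The innermost frame in the last trace of the console output is java.io.PipedOutputStream.write. As a minimal sketch that uses no Piggy Bank code, writing to a PipedOutputStream that was never connected to a PipedInputStream raises exactly this message in the JDK. The class name PipeNotConnectedDemo is made up for illustration; the sketch only shows one way the message arises, not what Piggy Bank did wrong.

```java
import java.io.IOException;
import java.io.PipedOutputStream;

public class PipeNotConnectedDemo {
    // Writes one byte to a PipedOutputStream that has no PipedInputStream
    // attached and returns the message of the resulting IOException.
    static String writeToUnconnectedPipe() {
        PipedOutputStream out = new PipedOutputStream(); // no sink connected
        try {
            out.write('x');
            return null; // not reached: write() needs a connected sink
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(writeToUnconnectedPipe()); // prints: Pipe not connected
    }
}
```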

I'm accustomed to using Saxon 8 at the command line or within Oxygen. I
gather that Piggy Bank (PB) wants to use a different XSLT processor and
that I need a pipe connected to it. Have I diagnosed the problem
correctly? Can anyone tell me what I have to change to get this to work?
Is this an Apache thing?
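
On the question of which processor is in use: the package names in the trace (com.sun.org.apache.xalan.internal.xsltc) belong to the copy of Apache Xalan's XSLTC that ships inside the Sun JDK, not to Saxon. The hypothetical helper below, which is not part of Piggy Bank, prints the TransformerFactory implementation that JAXP resolves in a given JVM. JAXP checks the javax.xml.transform.TransformerFactory system property before its other lookup steps, so that property is the usual way to select another processor such as Saxon's net.sf.saxon.TransformerFactoryImpl; whether Piggy Bank's own code path honours it is an unverified assumption.

```java
import javax.xml.transform.TransformerFactory;

public class WhichXsltProcessor {
    // Returns the class name of the TransformerFactory that JAXP resolves
    // in this JVM (system property first, then jaxp.properties, then
    // service providers on the classpath, then the JDK's built-in default).
    static String resolvedFactoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // Prints "null" for the property when it has not been set.
        System.out.println("javax.xml.transform.TransformerFactory property: "
                + System.getProperty("javax.xml.transform.TransformerFactory"));
        System.out.println("Resolved implementation: " + resolvedFactoryClass());
    }
}
```

Running it with -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl and the Saxon 8 jar on the classpath (assumed here to be saxon8.jar) should report Saxon instead of the JDK default.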

Jon

Java console output follows:

11:30:22.033 [...orpus.Corpus] Warning: Internal error: java.io.FileNotFoundException: /Users/jjc/Library/Application Support/Firefox/Profiles/n7f6chgo.PBtesting/piggy-bank/temporary-sources/model1145987983563/database/namespaces.dat (No such file or directory) on null at [-1,-1] (936831ms)
11:30:22.062 [...oraryProfile] java.io.IOException: java.io.FileNotFoundException: /Users/jjc/Library/Application Support/Firefox/Profiles/n7f6chgo.PBtesting/piggy-bank/temporary-sources/model1145987983563/database/namespaces.dat (No such file or directory) (29ms)
network: Connecting http://www.csail.mit.edu/directory/directory.php with proxy=DIRECT
network: Connecting http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd with proxy=DIRECT
network: Connecting http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent with proxy=DIRECT
network: Connecting http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent with proxy=DIRECT
network: Connecting http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent with proxy=DIRECT
ERROR: 'Pipe not connected'

11:30:52.154 [...oraryProfile] javax.xml.transform.TransformerException: java.io.IOException: Pipe not connected (30092ms)
javax.xml.transform.TransformerException: java.io.IOException: Pipe not connected
         at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:650)
         at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:279)
         at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithXSLT(TemporaryProfile.java:690)
         at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithGRDDL(TemporaryProfile.java:617)
         at edu.mit.simile.piggyBank.TemporaryProfile.scrape(TemporaryProfile.java:519)
         at edu.mit.simile.piggyBank.TemporaryProfile.load(TemporaryProfile.java:363)
         at edu.mit.simile.piggyBank.TemporaryProfile$LoadingThread.run(TemporaryProfile.java:143)
Caused by: java.io.IOException: Pipe not connected
         at com.sun.org.apache.xml.internal.serializer.ToStream.endElement(ToStream.java:2011)
         at com.sun.org.apache.xml.internal.serializer.ToXMLStream.endElement(ToXMLStream.java:468)
         at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.endElement(ToUnknownStream.java:331)
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$1()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$0()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.transform()
         at com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.transform(AbstractTranslet.java:594)
         at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:640)
         ... 6 more
---------
java.io.IOException: Pipe not connected
         at com.sun.org.apache.xml.internal.serializer.ToStream.endElement(ToStream.java:2011)
         at com.sun.org.apache.xml.internal.serializer.ToXMLStream.endElement(ToXMLStream.java:468)
         at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.endElement(ToUnknownStream.java:331)
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$1()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$0()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.transform()
         at com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.transform(AbstractTranslet.java:594)
         at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:640)
         at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:279)
         at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithXSLT(TemporaryProfile.java:690)
         at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithGRDDL(TemporaryProfile.java:617)
         at edu.mit.simile.piggyBank.TemporaryProfile.scrape(TemporaryProfile.java:519)
         at edu.mit.simile.piggyBank.TemporaryProfile.load(TemporaryProfile.java:363)
         at edu.mit.simile.piggyBank.TemporaryProfile$LoadingThread.run(TemporaryProfile.java:143)
---------
java.io.IOException: Pipe not connected
         at java.io.PipedOutputStream.write(PipedOutputStream.java:120)
         at com.sun.org.apache.xml.internal.serializer.WriterToUTF8Buffered.flushBuffer(WriterToUTF8Buffered.java:382)
         at com.sun.org.apache.xml.internal.serializer.WriterToUTF8Buffered.write(WriterToUTF8Buffered.java:309)
         at com.sun.org.apache.xml.internal.serializer.ToStream.endElement(ToStream.java:2005)
         at com.sun.org.apache.xml.internal.serializer.ToXMLStream.endElement(ToXMLStream.java:468)
         at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.endElement(ToUnknownStream.java:331)
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$1()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$3()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.http$colon$$slash$$slash$www$dot$w3$dot$org$slash$1999$slash$xhtml$colon$template$dot$0()
         at GregorSamsa.applyTemplates()
         at GregorSamsa.transform()
         at com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.transform(AbstractTranslet.java:594)
         at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:640)
         at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:279)
         at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithXSLT(TemporaryProfile.java:690)
         at edu.mit.simile.piggyBank.TemporaryProfile.scrapeWithGRDDL(TemporaryProfile.java:617)
         at edu.mit.simile.piggyBank.TemporaryProfile.scrape(TemporaryProfile.java:519)
         at edu.mit.simile.piggyBank.TemporaryProfile.load(TemporaryProfile.java:363)
         at edu.mit.simile.piggyBank.TemporaryProfile$LoadingThread.run(TemporaryProfile.java:143)
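
The traces show the stylesheet's output being serialized into a PipedOutputStream from inside TemporaryProfile.scrapeWithXSLT; the GregorSamsa frames are the stylesheet compiled by XSLTC, which uses that name as its default translet class. For comparison, here is a minimal sketch of the usual connected-pipe pattern with plain JAXP: the PipedInputStream is attached before anything is written, the transform runs on its own thread, and the calling thread drains the other end. The class, method, and stylesheet are hypothetical, the input is namespace-free rather than XHTML, and none of this is Piggy Bank's actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PipedXsltSketch {
    // Hypothetical stylesheet: turns every <li> into an <item> element.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><items><xsl:for-each select='//li'>"
        + "<item><xsl:value-of select='.'/></item>"
        + "</xsl:for-each></items></xsl:template>"
        + "</xsl:stylesheet>";

    static String transformThroughPipe(String xml) throws Exception {
        PipedInputStream in = new PipedInputStream();
        // Connect both ends BEFORE anything is written; writing to an
        // unconnected PipedOutputStream fails with "Pipe not connected".
        PipedOutputStream out = new PipedOutputStream(in);

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));

        // Producer thread: serialize the transform result into the pipe,
        // then close it so the reader sees end-of-stream.
        AtomicReference<Exception> failure = new AtomicReference<>();
        Thread producer = new Thread(() -> {
            try (PipedOutputStream o = out) {
                t.transform(new StreamSource(new StringReader(xml)), new StreamResult(o));
            } catch (Exception e) {
                failure.set(e);
            }
        });
        producer.start();

        // Consumer (this thread): drain the pipe until the producer closes it.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (InputStream i = in) {
            i.transferTo(buf);
        }
        producer.join();
        if (failure.get() != null) {
            throw failure.get();
        }
        return buf.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformThroughPipe("<ul><li>a</li><li>b</li></ul>"));
    }
}
```

Running the transform on a separate thread matters: a PipedInputStream has a small fixed buffer (1024 bytes by default), so a single thread that writes a large result before reading any of it would block.
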
Received on Wed Apr 26 2006 - 19:17:48 EDT

This archive was generated by hypermail 2.3.0 : Thu Aug 09 2012 - 16:39:18 EDT