RE: Piggy Bank, timeline visualization, and scrapers

From: Patrik Holmer <NNTP.psbh_at_telia.com>
Date: Wed, 19 Oct 2005 14:10:47 +0200

Rickard, regarding your remark about the Semantic Bank server:

Have you managed to get an installation of a Semantic Bank server running
without errors in your own OS environment? If so, could you post your setup
to this forum as a reference, i.e. the OS platform, OS version, Java
version, and so on?

I need a known-good starting point to narrow down the possible causes of
the errors I am seeing: "Browse Data by Tag" and "Starting Points" are
missing in the default Semantic Bank installation, set up as described in
the Semantic Bank install section.

// Patrik

-----Original Message-----
From: Rickard Öberg [mailto:rickard.oberg_at_senselogic.se]
Sent: Wednesday, 19 October 2005 09:47
To: general_at_simile.mit.edu
Subject: Re: Piggy Bank, timeline visualization, and scrapers

Hi!

I managed to figure out the flow of multi-page scrapers, but now that I try
it on a slightly larger test I run into memory problems. First of all, if
you are scraping many pages (more than 1000), it appears to be a good idea
to call model.getRepository() for every page in order to flush the
transaction. Otherwise the process grinds to a halt for lack of memory.
Second, the repository that the transaction is flushed to is itself an
in-memory model (see WorkingModel.java), so that also limits how much data
can be scraped in one go. For example, I want to write a scraper that gets
data from Wikipedia.org, and then we're not talking about 1000 pages or so,
but far more. To get that to work, the data has to be flushed down into a
persistent store once in a while. Is there any way to do that?
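
To make the idea concrete, here is a rough sketch of the kind of periodic
flush I have in mind. It is not Piggy Bank code: SpillingBuffer, add(),
pageDone() and the one-statement-per-line file format are all made up for
illustration, and it only assumes that scraped statements can be serialized
as text lines (e.g. N-Triples). It buffers statements in memory and appends
them to a file every N pages, so memory use stays bounded.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Piggy Bank): buffers statements and spills them to disk. */
final class SpillingBuffer {
    private final Path store;          // file acting as the persistent store (assumption)
    private final int pagesPerFlush;   // how many scraped pages to buffer before writing
    private final List<String> pending = new ArrayList<>();
    private int pagesSinceFlush = 0;

    public SpillingBuffer(Path store, int pagesPerFlush) {
        if (pagesPerFlush < 1) {
            throw new IllegalArgumentException("pagesPerFlush must be >= 1");
        }
        this.store = store;
        this.pagesPerFlush = pagesPerFlush;
    }

    /** Buffer one serialized statement, e.g. a single N-Triples line. */
    public void add(String statementLine) {
        pending.add(statementLine);
    }

    /** Call once per scraped page; writes to disk every pagesPerFlush pages. */
    public void pageDone() throws IOException {
        pagesSinceFlush++;
        if (pagesSinceFlush >= pagesPerFlush) {
            flush();
        }
    }

    /** Append all buffered statements to the store and release the memory. */
    public void flush() throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(store, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : pending) {
                w.write(line);
                w.newLine();
            }
        }
        pending.clear();
        pagesSinceFlush = 0;
    }

    /** Number of statements currently held in memory. */
    public int pendingCount() {
        return pending.size();
    }
}
```

A scraper loop would call add() for each statement, pageDone() after each
page, and flush() once at the very end to write whatever is left. Whether
something like this can be hooked into the working model is exactly what I
am asking.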

Other than that I'm happy for now. It all works reasonably well, except
that I can't get Google Maps to work on the Semantic Bank server. Do I need
an API key for that as well? If so, where do I put it?

regards,
  Rickard
Received on Wed Oct 19 2005 - 12:05:24 EDT