Re: a bibtex file that I cannot load into citeline from Stefano Mazzocchi on 2008-08-21 (stdin)

From: Stefano Mazzocchi <stefanom_at_MIT.EDU>
Date: Thu, 21 Aug 2008 13:15:26 -0400

Stefano Mazzocchi wrote:
> Enrico Silterra wrote:
>> I don't actually use bibtex -- so this problem is probably from
>> something mucked up
>> in the bib file produced by refworks.
>> Anyway, I don't actually get any errors, but I cannot load this file
>> of about 680 references.
>> http://libdev.library.cornell.edu/~es287/refshare/c2.bib
>> this is a file where every character over 127 has been deleted --
>> it was created from
>> http://libdev.library.cornell.edu/~es287/refshare/candl.bib --
>> which I assume could not be loaded because it contains ISO-LATIN1
>> instead of utf-8.
>>
>> Anyway, I would appreciate some guidance as to how to get this cite-lined.
>> Rick
>
> Enrico,
>
> congrats, you managed to hit 3 bugs with one single attempt :-)
>
> First of all, you're correct in assuming that non-ASCII chars cannot be in
> BibTeX (they have to be escaped using TeX escape codes) and the
> parser will fail if they are present. Since this is a very common error
> among BibTeX exporters, we did think of creating a 'pre-processor' in
> citeline that would do this kind of transformation for you... but this
> works only if the encoding used in the various strings
> is uniform, and this is hardly ever the case.
>
> Second, the BibTeX entries generated here appear ill-formed, for example:
>
> @book{RefWorks:1775,
> author={Thomas M. Achenbach},
> year={1991},
> title={Manual for the Child Behavior Checklist/4-18 and 1991 Profile},
> publisher={University of Vermont, Department of Psychiatry},
> address={Burlington, VT},
> note={Actual Instrument Common citation}
> }
> }
>
> Note how the curly brackets need to match, yet there is an additional one
> at the end of the item! Our bibtex parser doesn't really like that.
>
> So I went ahead and replaced all instances of "}\n}" with "}\n" and I got it
> to parse.

Scratch that, I spoke too soon.

It turns out that our parser doesn't like that, but our "spurious text
cleanup" pre-processor doesn't mind: it was designed to strip out any
text between BibTeX entries, and it does! It considers those additional
} as comments and removes them automatically.
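
For illustration only, here is a minimal Python sketch of that kind of
cleanup. It is not citeline's actual pre-processor; the function name
strip_text_between_entries is hypothetical, and the sketch assumes that
every entry starts with "@" and is delimited by balanced curly brackets.

```python
def strip_text_between_entries(bibtex: str) -> str:
    """Keep only brace-balanced @type{...} entries; drop the text between them."""
    entries = []
    i = 0
    n = len(bibtex)
    while i < n:
        start = bibtex.find("@", i)          # next entry begins at '@'
        if start == -1:
            break
        open_brace = bibtex.find("{", start)
        if open_brace == -1:
            break
        depth = 0
        end = None
        for j in range(open_brace, n):       # walk until the braces balance
            ch = bibtex[j]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    end = j
                    break
        if end is None:
            break                            # unbalanced entry: stop here
        entries.append(bibtex[start:end + 1])
        i = end + 1                          # anything before the next '@' is skipped
    return "\n\n".join(entries) + ("\n" if entries else "")


# Shortened version of the RefWorks entry quoted above, with its extra '}'.
sample = """@book{RefWorks:1775,
author={Thomas M. Achenbach},
year={1991}
}
}
"""
cleaned = strip_text_between_entries(sample)
print(cleaned)
```

Because the stray } sits after the entry has already closed, it counts as
text between entries and is dropped along with anything else found there.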

So, just change all occurrences of the "comment" string into something
else (I used "_comment") and it works fine (although a bit slowly: 680
publications won't load quickly in Exhibit).
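
As a rough sketch, the substitution could be scripted like this in
Python. The function name rename_comment is hypothetical, and the match
is assumed to be a plain case-sensitive replacement of every occurrence,
which may be broader than strictly necessary.

```python
def rename_comment(bibtex: str, replacement: str = "_comment") -> str:
    """Replace every occurrence of the string 'comment' (case-sensitive)."""
    return bibtex.replace("comment", replacement)


# Made-up field value, used only to show the effect of the substitution.
entry = "note={See the comment by the editor}"
print(rename_comment(entry))  # prints: note={See the _comment by the editor}
```

Run it once on the original file: a second pass would turn "_comment"
into "__comment".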

Hope this helps.

-- 
Stefano Mazzocchi
Digital Libraries Research Group                 Research Scientist
Massachusetts Institute of Technology
E25-131, 77 Massachusetts Ave               skype: stefanomazzocchi
Cambridge, MA  02139-4307, USA         email: stefanom at mit . edu
-------------------------------------------------------------------

Received on Thu Aug 21 2008 - 13:15:26 EDT