Haystack group meeting notes, September 14, 2004

Progress reports and introductions

Jaime Teevan (PhD student) is working on HCI issues for how people return to
information (and on being a new mom). She is also organizing the HCI seminar for
the Fall.

David Huynh (PhD student) is finishing up a paper on his work at Microsoft last
summer (on visualization tools for large collections of photographs). This term,
he is looking at possible topics for his PhD dissertation. One possible topic
concerns how to make security more visible and understandable in web
applications (e.g., by detecting bogus sites that fish for personal information
by masquerading as other sites).

Marios Assiotis (new UROP) worked at Apple last summer on the user interface for
an internal Apple application.

Ryan Manuel (MEng student) is working on wrapper induction (finding patterns on
a web page). He is extending previous work by Andrew Hogue to consider links,
and not just text, on a page. He has also ported the wrapper induction code from
Internet Explorer to Mozilla, although there are still a few glitches when the
code is run under Linux.

Yuan Shen (PhD student) is working on how to recognize and extract records from
web pages (such as search results and directory listings) using algorithms that
look for similarities and repeated subtrees on a page. His canonical test page
is the CSAIL directory page. A current glitch in his algorithm causes it to
detect navigation bars as records.

Nick Matsakis (PhD student) is writing a proposal for his PhD thesis on
identifying object references. He has talked with members of his thesis
committee (Leslie Kaelbling and Randy Davis) and is reading about conditional
Markov random fields. He will give a short talk about his work at next week's
group meeting.

Zane Tian and Sumudu Watugala (new UROPs) worked with Dennis Quan last summer at
IBM to create a prototype for connecting semantically tagged web services. They
developed an ontology for workflows by modifying the Haystack UI for
collections. It is not clear whether their code will be contributed to the open
source code base, or remain IBM internal.

Artem Gleyzer (UROP) worked this past summer on tools that users can employ to
annotate and export parts of the RDF graph. Instead of exporting a default
subgraph for each type of resource, his tools are designed to export
task-related subgraphs. David Karger suggests that the ability to save data from
Haystack (by exporting it to a file) makes it possible (and desirable) for group
members to begin using Haystack regularly for one or more of their normal
information tasks.

Harr Chen (new PhD student) has worked at Microsoft on extending web searches to
topic-specific searches and on providing server-based help (using classifiers
trained on click-through data).

Steve Garland (Principal Research Scientist) worked this past summer on
extracting a minimal (more accurately, a moderately sized) subset of Haystack to
serve as a more manageable base for application development. He will talk about
this subset at next week's group meeting.

Amanda Smith (UROP) worked this past summer on a help ontology and system for
Haystack. In her system, every object has a separate help view. Objects can
currently be annotated using the code editor; a later extension will make it
possible to annotate objects using the Haystack UI. This term, Amanda may leave
the Haystack group to work with Regina Barzilay.

Vineet Sinha (PhD student) is currently out of town.


Weekly group meetings will be held over lunch (provided) each Tuesday at noon in
32-G531. This year, the agenda for group meetings will be organized around two
or three short presentations by group members on what they have been doing, on
interesting papers they have read in the literature, etc. Steve and Nick will
talk at next week's meeting.

Instead of going around the table with status reports at each group meeting, we
will be using the Haystack Wiki, http://haystack-wiki.csail.mit.edu, to track
progress. Time will still be available at meetings, however, for people to
describe problems they have been encountering in their work and to solicit

The following weekly individual meeting times were scheduled with David Karger:
(Mo 4pm Harr, Tu 2pm Jaime, Tu 3pm Nick, We 4pm Yuan, Th 1pm David H, Th 2pm
