TREC 2007 Legal Track
Interactive Challenge Task
March 18, 2007
Send comments to the mailing list or to oard@umd.edu
The principal goal of the TREC legal track is to support experimental
investigation of alternative designs for systems to support
"E-Discovery", the process by which evidence stored in digital form is
made available for use in litigation. Additional details on the
track are available at http://trec-legal.umiacs.umd.edu/.
In 2006, six research teams submitted "automatic runs," experiment
results that were created without human intervention. This involved
automatically indexing the collection, automatically generating
queries from the "topic descriptions" that were provided with the
collection, and automatically generating result sets (usually ranked
in an order approximating decreasing probability of relevance, as
estimated by the system). This process yields repeatable comparisons
between alternate system designs, but three factors limit the degree
to which experiment results are representative of real applications.
First, the process of automatic query generation is at best an
imperfect approximation of what a real person would do. Indeed,
approximating human behavior at this task is so difficult that it is
common practice for researchers who are interested principally in
system design to simply take all of the words from one or more fields
of the topic description as if all of those words had been typed by
the user as the query. Such an approach can yield useful comparisons
of system capabilities, although with some risk that compensating
behavior by real users, which might tend to minimize differences in
practice, remains unmodeled. A second important limitation of fully
automatic experiments is that they do not attempt to model query
refinement behavior, which both simulation studies and actual user
studies have repeatedly identified as an important factor in the
effective use of information retrieval systems. A third potential
limitation, less often remarked upon but potentially of greater
importance early in the development of new technology, is that the
form and content of the topic description reflects a set of
assumptions about system capabilities that may constrain the design
space that can be explored in this way. In the TREC-2006 legal track,
for example, the topic descriptions contained only natural language
terms. That decision, taken early in the design of the track,
naturally made it easier for teams to automate the generation of
queries containing natural language terms than to generate queries
containing the metadata terms that are also present in the document
collection.
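To make the "words as query" practice concrete, here is a minimal
sketch of fully automatic query generation. The topic field name, the
stopword list, and the example request text are invented for
illustration; they are not the track's actual topic schema.

    # Minimal sketch: treat every word of a chosen topic field as if the
    # user had typed it as the query. Field name and stopword list are
    # illustrative assumptions, not the track's actual schema.
    import re

    STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "all"}

    def words_as_query(topic, fields=("request_text",)):
        text = " ".join(topic.get(f, "") for f in fields)
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        return [t for t in tokens if t not in STOPWORDS]

    example = {"request_text": "All documents discussing marketing of "
                               "menthol cigarettes to young adults."}
    print(words_as_query(example))
    # -> ['documents', 'discussing', 'marketing', 'menthol',
    #     'cigarettes', 'young', 'adults']

Such a query would then be handed unmodified to the retrieval system,
which is exactly the behavior an interactive searcher is free to
improve upon.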
For 2007, we are proposing an "interactive challenge task" in an
effort to begin to explore these issues. We have patterned this task
on a pilot effort that was conducted in 2006 in which a single
professional searcher sought to identify relevant documents for each
topic that automated systems would be unlikely to find. This effort
was successful, identifying an average of 35 relevant documents per
topic (over 39 topics) that were not highly ranked by any automatic
run. In addition to the (unsurprising) confirmation that people and
machines together can achieve more than machines alone could do,
identifying these additional relevant documents can help system
designers focus some of their efforts on documents that have proven
particularly challenging for present search technologies.
Our design for the TREC 2007 Legal Track Interactive Challenge Task
differs from the 2006 pilot in three important ways: (1) searchers
will focus on overall recall (as in a real e-discovery task) rather
than just on documents that they expect automated systems would be
unlikely to find, (2) we will focus on a small number of topics so that
we can compare the results of different searchers who apply different
search strategies to the same task, and (3) we'll add an element of
competition so that we can have some fun with this!
Here's how things will work (all dates 2007):
- Teams can form in any way they like. We expect that some teams
will consist of students in legal informatics courses (perhaps as
part of a structured assignment or a more open-ended class project),
some teams might consist of search professionals at an e-discovery
firm, and some may be system development groups who want to try out
query designs that are difficult to automate. The minimum team size
is one person who can devote two hours to the task; there is no
maximum team size.
- There are 12 topics available, and teams may choose to work
with as many or as few of those as they like. All we ask is that
each team select topics in priority order. That way every team will
work on the highest priority topic, most teams will work on the two
highest priority topics, etc. This facilitates comparisons across
teams that can reveal the effect of different strategies, and it
gives us a basis for selecting a winning team. The topics are
available in two files. Note that there are more topics in the zip
file than are needed for the interactive task; the topic priority
list tells you which ones to use.
- Teams can organize their efforts in any way they wish. For
example, some teams may assign different topics to different
searchers. Others may have searchers work in pairs, sharing ideas (as
might occur during training of new employees in an e-discovery firm,
for example). Others might employ a quality assurance review process
in which experienced searchers review the query logs and results from
novices and augment them with additional searches when needed. Still
others might try having many people search the same topic and then
automatically vote on which documents to submit (a simple voting
sketch appears after this item). Searching can be performed using any
system, and it can be done in any location (e.g., all in one lab, or
as a homework assignment).
Guidance on how relevance will be judged is provided in the 2006
Assessor Guide (Word). Further questions on interpretation should be
sent to Doug Oard (oard@umd.edu).
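As a concrete example of the "many searchers, then vote" organization
mentioned above, the sketch below keeps a document only if at least
two searchers selected it. The threshold, the cap, and the identifiers
are illustrative assumptions.

    # Sketch of majority-style voting over several searchers' document
    # lists for one topic. Threshold and cap are illustrative choices.
    from collections import Counter

    def vote(searcher_lists, min_votes=2, cap=100):
        counts = Counter()
        for docs in searcher_lists:
            counts.update(set(docs))      # each searcher votes once per document
        agreed = [doc for doc, n in counts.most_common() if n >= min_votes]
        return agreed[:cap]               # respect the 100-document limit

    # Three searchers; only the two documents chosen by at least two of
    # them would be submitted (identifiers are made up).
    print(vote([["2021234567", "2029876543"],
                ["2021234567", "2020001111"],
                ["2029876543", "2020002222"]]))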
- Interactive task participants can team with system developers
and use their system if they wish, but we expect that most
interactive task participants will prefer to choose an existing
Web-accessible system. Three search systems are available:
- Legacy Tobacco Documents Library Full Text Search (beta): a new
search engine from the University of California San Francisco Library
that searches both the OCR and metadata fields.
- Tobacco Documents Online: the system used by the TREC Legal Track
relevance assessors. It searches both the OCR and metadata fields.
- Legacy Tobacco Documents Library Metadata Search: the original
search engine from the University of California San Francisco
Library. It searches only the metadata fields (no OCR search).
Each system has a different search interface. These systems are all
operated by third parties, so there is some chance that a search
service you planned to use might change in some way or (in extreme
cases) even become unavailable. If you notice a change in the
system you are using in the midst of your searches, just note that
on the questionnaire so that we can bear that in mind when
interpreting results.
- By August 1, each team will be asked to submit no more than 100
documents for each topic that they choose to work on. This deadline
was selected to permit participation by students in summer courses.
The 100-document limit was chosen to make it possible to perform the
task within two hours. Submissions will take the form of one list of
document identifiers (specifically, Bates numbers) for each topic the
team worked on, in an ordinary text file; a format sketch appears
after this item. A
questionnaire describing search
strategies and recording (anonymized) demographic characteristics
for each searcher and relevant details of their search (system used,
time invested, etc.) will be requested. Institutional Review Board
approval for this data collection will be completed and made
available on the TREC Legal Track Web site (this should simplify
local IRB approval at other institutions, but IRB requirements vary
and participants are responsible for obtaining any necessary local
approval for human subjects research). Runs will be accepted with
incomplete questionnaires if necessary, but only teams with complete
questionnaires can compete for the "best results" honor.
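The track specifies only that each topic's submission is an ordinary
text file listing document identifiers, so the file naming convention
and one-identifier-per-line layout in the sketch below are
assumptions, not a prescribed format.

    # Sketch: write one submission file per topic, one Bates number per
    # line, enforcing the 100-document limit. The naming convention is a
    # hypothetical choice, not part of the track guidelines.
    def write_submission(team, topic_id, bates_numbers, limit=100):
        if len(bates_numbers) > limit:
            raise ValueError(f"at most {limit} documents may be submitted per topic")
        path = f"{team}_topic{topic_id}.txt"   # hypothetical naming convention
        with open(path, "w") as f:
            for bates in bates_numbers:
                f.write(bates + "\n")
        return path

    # Example with made-up identifiers:
    write_submission("teamA", 7, ["2021234567", "2029876543"])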
- Submissions will be evaluated by a team of experienced legal
professionals for relevance to the topic description, and their
judgments will be used to assign a score for each team based on
their success (in terms of relevant and non-relevant documents
submitted) for however many topics all comparable teams actually
submit. For example, if one team submits two topics but does poorly
on both, while three other teams each submit five topics, scores will
be compared over the five topics that are common to the three more
competitive teams. Teams will receive one point for each relevant
document submitted and will lose half a point for each non-relevant
document submitted (a small scoring sketch appears after this item).
There may be fewer than 100 relevant documents for some topics, so the
minimum possible score for one topic is -50, while the maximum may be
less than +100.
These results will be provided to participating teams by October 1.
Earlier "preview" results based on last year's relevance judgments can
be provided upon request; for the preview results, documents for
which no relevance judgments are available will receive zero points
(i.e., no benefit and no penalty).
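The scoring rule above is simple enough to state in a few lines of
code. The sketch below assumes the judgments are available as sets of
document identifiers; unjudged documents contribute nothing, as in the
preview scoring.

    # Sketch of the scoring rule: +1 per relevant document submitted,
    # -0.5 per non-relevant document submitted, 0 for unjudged documents
    # (as in the preview scoring based on last year's judgments).
    def topic_score(submitted, relevant, judged):
        score = 0.0
        for doc in submitted:
            if doc in relevant:
                score += 1.0
            elif doc in judged:           # judged but found non-relevant
                score -= 0.5
            # unjudged documents neither help nor hurt
        return score

    # Example with made-up identifiers: one relevant hit, one judged
    # non-relevant document, one unjudged document -> 0.5 points.
    print(topic_score(["d1", "d2", "d3"], relevant={"d1"}, judged={"d1", "d2"}))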
- An award ceremony for the winning team will be conducted in
Gaithersburg, Maryland, in early November 2007 at the Text Retrieval
Conference. Participation in this ceremony is optional, and travel
and registration costs will be the responsibility of the
participants or their sponsoring organization. The prize for the
winning team will be a secret right up to the time of the award!
Track participants will be invited to submit a paper describing
their work prior to the conference (which will be distributed only
to conference attendees) and to revise the paper for inclusion on
the TREC Web site following the conference. Participation in the
TREC conference, which includes presentation of results on many
tasks and planning for the subsequent year, is limited to TREC
participants, and participation in the interactive challenge task
satisfies that requirement.