TREC 2007 Legal Track
Interactive Challenge Task
March 18, 2007
Send comments to the mailing list or to oard@umd.edu
The principal goal of the TREC legal track is to support experimental
investigation of alternative designs for systems to support
"E-Discovery", the process by which evidence stored in digital form is
made available for use in litigation. Additional details on the
track are available at http://trec-legal.umiacs.umd.edu/.
In 2006, six research teams submitted "automatic runs," experiment
results that were created without human intervention. This involved
automatically indexing the collection, automatically generating
queries from the "topic descriptions" that were provided with the
collection, and automatically generating result sets (usually ranked
in an order approximating decreasing probability of relevance, as
estimated by the system). This process yields repeatable comparisons
between alternate system designs, but three factors limit the degree
to which experiment results are representative of real applications.
First, the process of automatic query generation is at best an
imperfect approximation of what a real person would do. Indeed,
approximating human behavior at this task is so difficult that it is
common practice for researchers who are interested principally in
system design to simply take all of the words from one or more fields
of the topic description as if all of those words had been typed by
the user as the query. Such an approach can yield useful comparisons
of system capabilities, although with some risk that compensating
behavior by real users, which might tend to minimize differences in
practice, remains unmodeled. A second important limitation of fully
automatic experiments is that they do not attempt to model query
refinement behavior, which both simulation studies and actual user
studies have repeatedly identified as an important factor in the
effective use of information retrieval systems. A third potential
limitation, less often remarked upon but potentially of greater
importance early in the development of new technology, is that the
form and content of the topic description reflects a set of
assumptions about system capabilities that may constrain the design
space that can be explored in this way. In the TREC-2006 legal track,
for example, the topic descriptions contained only natural language
terms. That decision, taken early in the design of the track,
naturally made it easier for teams to automate the generation of
queries containing natural language terms than to generate queries
containing the metadata terms that are also present in the document
collection.
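To make the "words as query" practice concrete, here is a minimal
sketch of fully automatic query generation. The topic field name, the
stopword list, and the example request text are invented for
illustration; they are not the track's actual topic schema.

    # Minimal sketch: treat every word of a chosen topic field as if the
    # user had typed it as the query. Field name and stopword list are
    # illustrative assumptions, not the track's actual schema.
    import re

    STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "all"}

    def words_as_query(topic, fields=("request_text",)):
        text = " ".join(topic.get(f, "") for f in fields)
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        return [t for t in tokens if t not in STOPWORDS]

    example = {"request_text": "All documents discussing marketing of "
                               "menthol cigarettes to young adults."}
    print(words_as_query(example))
    # -> ['documents', 'discussing', 'marketing', 'menthol',
    #     'cigarettes', 'young', 'adults']

Such a query would then be handed unmodified to the retrieval system,
which is exactly the behavior an interactive searcher is free to
improve upon.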
For 2007, we are proposing an "interactive challenge task" in an
effort to begin to explore these issues. We have patterned this task
on a pilot effort that was conducted in 2006 in which a single
professional searcher sought to identify relevant documents for each
topic that automated systems would be unlikely to find. This effort
was successful, identifying an average of 35 relevant documents per
topic (over 39 topics) that were not highly ranked by any automatic
run. In addition to the (unsurprising) confirmation that people and
machines together can achieve more than machines alone could do,
identifying these additional relevant documents can help system
designers focus some of their efforts on documents that have proven
particularly challenging for present search technologies.
Our design for the TREC 2007 Legal Track Interactive Challenge Task
differs from the 2006 pilot in three important ways: (1) searchers
will focus on overall recall (as in a real e-discovery task) rather
than just on documents that they expect automated systems would be
unlikely to find, (2) we will focus on a small number of topics so that
we can compare the results of different searchers who apply different
search strategies to the same task, and (3) we'll add an element of
competition so that we can have some fun with this!
Here's how things will work (all dates 2007):
- Teams can form in any way they like. We expect that some teams
will consist of students in legal informatics courses (perhaps as
part of a structured assignment or a more open-ended class project),
some teams might consist of search professionals at an e-discovery
firm, and some may be system development groups who want to try out
query designs that are difficult to automate. The minimum team size
is one person who can devote two hours to the task; there is no
maximum team size.
- There are 12 topics available, and teams may choose to work
with as many or as few of those as they like. All we ask is that
each team select topics in priority order. That way every team will
work on the highest priority topic, most teams will work on the two
highest priority topics, etc. This facilitates comparisons across
teams that can reveal the effect of different strategies, and it
gives us a basis for selecting a winning team. The topics are
available in two files. Note that there are more topics in the zip
file than are needed for the interactive task; the topic priority
list tells you which ones to use.
- Teams can organize their efforts in any way they wish. For
example, some teams may assign different topics to different
searchers. Others may have searchers work in pairs, sharing ideas (as
might occur during training of new employees in an e-discovery firm,
for example). Others might employ a quality assurance review process
in which experienced searchers review the query logs and results from
novices and augment them with additional searches when needed. Still
others might try having many people search the same topic and then
automatically vote on which documents to submit (a simple voting
sketch appears after this item). Searching can be performed using any
system, and it can be done in any location (e.g., all in one lab, or
as a homework assignment).
Guidance on how relevance will be judged is provided in the 2006
Assessor Guide (Word). Further questions on interpretation should be
sent to Doug Oard (oard@umd.edu).
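As a concrete example of the "many searchers, then vote" organization
mentioned above, the sketch below keeps a document only if at least
two searchers selected it. The threshold, the cap, and the identifiers
are illustrative assumptions.

    # Sketch of majority-style voting over several searchers' document
    # lists for one topic. Threshold and cap are illustrative choices.
    from collections import Counter

    def vote(searcher_lists, min_votes=2, cap=100):
        counts = Counter()
        for docs in searcher_lists:
            counts.update(set(docs))      # each searcher votes once per document
        agreed = [doc for doc, n in counts.most_common() if n >= min_votes]
        return agreed[:cap]               # respect the 100-document limit

    # Three searchers; only the two documents chosen by at least two of
    # them would be submitted (identifiers are made up).
    print(vote([["2021234567", "2029876543"],
                ["2021234567", "2020001111"],
                ["2029876543", "2020002222"]]))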
- Interactive task participants can team with system developers
and use their system if they wish, but we expect that most
interactive task participants will prefer to choose an existing
Web-accessible system. Three search systems are available:
- Legacy Tobacco Documents Library Full Text Search (beta): a new
search engine from the University of California San Francisco Library
that searches both the OCR and metadata fields.
- Tobacco Documents Online: the system used by the TREC Legal Track
relevance assessors. It searches both the OCR and metadata fields.
- Legacy Tobacco Documents Library Metadata Search: the original
search engine from the University of California San Francisco
Library. It searches only the metadata fields (no OCR search).
Each system has a different search interface. These systems are all
operated by third parties, so there is some chance that a search
service you planned to use might change in some way or (in extreme
cases) even become unavailable. If you notice a change in the
system you are using in the midst of your searches, just note that
on the questionnaire so that we can bear that in mind when
interpreting results.
- By August 1, each team will be asked to submit no more than 100
documents for each topic that they choose to work on. This deadline
was selected to permit participation by students in summer courses.
The 100-document limit was chosen to make it possible to perform the
task within two hours. Submissions will take the form of one list of
document identifiers (specifically, Bates numbers) for each topic the
team worked on, in an ordinary text file; a format sketch appears
after this item. A
questionnaire describing search
strategies and recording (anonymized) demographic characteristics
for each searcher and relevant details of their search (system used,
time invested, etc.) will be requested. Institutional Review Board
approval for this data collection will be completed and made
available on the TREC Legal Track Web site (this should simplify
local IRB approval at other institutions, but IRB requirements vary
and participants are responsible for obtaining any necessary local
approval for human subjects research). Runs will be accepted with
incomplete questionnaires if necessary, but only teams with complete
questionnaires can compete for the "best results" honor.
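The track specifies only that each topic's submission is an ordinary
text file listing document identifiers, so the file naming convention
and one-identifier-per-line layout in the sketch below are
assumptions, not a prescribed format.

    # Sketch: write one submission file per topic, one Bates number per
    # line, enforcing the 100-document limit. The naming convention is a
    # hypothetical choice, not part of the track guidelines.
    def write_submission(team, topic_id, bates_numbers, limit=100):
        if len(bates_numbers) > limit:
            raise ValueError(f"at most {limit} documents may be submitted per topic")
        path = f"{team}_topic{topic_id}.txt"   # hypothetical naming convention
        with open(path, "w") as f:
            for bates in bates_numbers:
                f.write(bates + "\n")
        return path

    # Example with made-up identifiers:
    write_submission("teamA", 7, ["2021234567", "2029876543"])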
- Submissions will be evaluated by a team of experienced legal
professionals for relevance to the topic description, and their
judgments will be used to assign a score for each team based on
their success (in terms of relevant and non-relevant documents
submitted) for however many topics all comparable teams actually
submit. For example, if one team submits two topics but does poorly
on both, while three other teams each submit five topics, scores will
be compared over the five topics that are common to the three more
competitive teams. Teams will receive one point for each relevant
document submitted and will lose half a point for each non-relevant
document submitted (a small scoring sketch appears after this item).
There may be fewer than 100 relevant documents for some topics, so the
minimum possible score for one topic is -50, while the maximum may be
less than +100.
These results will be provided to participating teams by October 1.
Earlier "preview" results based on last year's relevance judgments can
be provided upon request; for the preview results, documents for
which no relevance judgments are available will receive zero points
(i.e., no benefit and no penalty).
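The scoring rule above is simple enough to state in a few lines of
code. The sketch below assumes the judgments are available as sets of
document identifiers; unjudged documents contribute nothing, as in the
preview scoring.

    # Sketch of the scoring rule: +1 per relevant document submitted,
    # -0.5 per non-relevant document submitted, 0 for unjudged documents
    # (as in the preview scoring based on last year's judgments).
    def topic_score(submitted, relevant, judged):
        score = 0.0
        for doc in submitted:
            if doc in relevant:
                score += 1.0
            elif doc in judged:           # judged but found non-relevant
                score -= 0.5
            # unjudged documents neither help nor hurt
        return score

    # Example with made-up identifiers: one relevant hit, one judged
    # non-relevant document, one unjudged document -> 0.5 points.
    print(topic_score(["d1", "d2", "d3"], relevant={"d1"}, judged={"d1", "d2"}))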
- An award ceremony for the winning team will be conducted in
Gaithersburg, Maryland, in early November 2007 at the Text Retrieval
Conference. Participation in this ceremony is optional, and travel
and registration costs will be the responsibility of the
participants or their sponsoring organization. The prize for the
winning team will be a secret right up to the time of the award!
Track participants will be invited to submit a paper describing
their work prior to the conference (which will be distributed only
to conference attendees) and to revise the paper for inclusion on
the TREC Web site following the conference. Participation in the
TREC conference, which includes presentation of results on many
tasks and planning for the subsequent year, is limited to TREC
participants, and participation in the interactive challenge task
satisfies that requirement.