Name | Last Modified (UTC) | Size | Description |
---|---|---|---|
alldoc-sdoc.csv.bz2 | 2010-06-23 19:49:53 | 30M | Mapping from EDRM docid to SDOC number (stored in email headers of PST format collection) for all emails |
msg-sdoc.csv.bz2 | 2010-06-23 12:26:00 | 11M | Mapping from EDRM docid to SDOC number (stored in email headers of PST format collection) for canonical versions of deduplicated emails |
docids-v2.csv.bz2 | 2010-07-05 15:03:38 | 20M | Official list of docids that are candidates for production |
msg-uniqmsg.csv.bz2 | 2010-06-23 12:22:43 | 38M | Deduplication mapping from message docid to canonical version of message |
uniqmsg.csv.bz2 | 2010-06-23 12:15:27 | 10M | List of canonical IDs of unique messages |
DownloadPST.bat | 2010-07-12 14:17:23 | 11K | Script to download PST version of EDRM Enron collection (obsolete) |
DownloadXML.bat | 2010-06-22 15:55:03 | 11K | Script to download XML version of EDRM Enron collection (obsolete) |
edrmv2txt-v2.tar.bz2 | 2010-07-06 12:03:13 | 596M | Deduplicated text rendering of the EDRM Enron collection, containing only the canonical versions of emails and their attachments |
edrmv2nativeattach.tar.bz2 | 2013-03-27 14:35:01 | 8G | The deduplicated attachments in native format |
seed.csv | 2010-06-23 10:26:44 | 898K | Seed sets for the TREC 2010 Legal Track Learning Task |
These tools and data sets may help you download and work with the EDRM Enron Dataset v2 used by the TREC 2010 Legal Track.
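The `.csv.bz2` mappings can be read without first decompressing them to disk. The following is a minimal Python sketch, assuming each file is a headerless two-column CSV with the source docid in the first column and the mapped value (canonical docid or SDOC number) in the second. That layout is an assumption, not documented here, so inspect a few lines (for example with `bzcat msg-uniqmsg.csv.bz2 | head`) before relying on it. The function names `read_pairs`, `load_mapping` and `canonical_sdoc` are hypothetical. `canonical_sdoc` resolves a message docid to its canonical version via `msg-uniqmsg.csv.bz2`, then looks that canonical docid up in `msg-sdoc.csv.bz2`.

```python
import bz2
import csv
from typing import Dict, Iterable, Iterator, Optional, Tuple


def read_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) pairs from two-column CSV text.

    ASSUMPTION: rows are `key,value` with no header line.
    Blank or short rows are skipped.
    """
    for row in csv.reader(lines):
        if len(row) >= 2:
            yield row[0].strip(), row[1].strip()


def load_mapping(source) -> Dict[str, str]:
    """Stream a .csv.bz2 file (path or binary file object) into a dict.

    bz2.open in text mode decompresses on the fly, so the archive is
    never unpacked to disk. Note that the whole mapping is held in memory.
    """
    with bz2.open(source, mode="rt", encoding="utf-8", newline="") as fh:
        return dict(read_pairs(fh))


def canonical_sdoc(
    docid: str,
    msg_to_canon: Dict[str, str],
    canon_to_sdoc: Dict[str, str],
) -> Optional[str]:
    """Map a message docid to the SDOC number of its canonical version.

    Docids missing from the dedup mapping are treated as already canonical.
    Returns None if no SDOC number is known.
    """
    canon = msg_to_canon.get(docid, docid)
    return canon_to_sdoc.get(canon)
```

Typical use would be `msg_to_canon = load_mapping("msg-uniqmsg.csv.bz2")` and `canon_to_sdoc = load_mapping("msg-sdoc.csv.bz2")`. For a direct lookup over all emails, `alldoc-sdoc.csv.bz2` can be loaded the same way without the deduplication step.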