Identification and download helpers for the EDRM Enron v2 dataset


Name                        Last Modified (UTC)  Size  Description
alldoc-sdoc.csv.bz2         2010-06-23 19:49:53  30M   Mapping from EDRM docid to SDOC number (stored in email headers of PST format collection) for all emails
msg-sdoc.csv.bz2            2010-06-23 12:26:00  11M   Mapping from EDRM docid to SDOC number (stored in email headers of PST format collection) for canonical versions of deduplicated emails
docids-v2.csv.bz2           2010-07-05 15:03:38  20M   Official list of docids that are candidates for production
msg-uniqmsg.csv.bz2         2010-06-23 12:22:43  38M   Deduplication mapping from message docid to canonical version of message
uniqmsg.csv.bz2             2010-06-23 12:15:27  10M   List of canonical IDs of unique messages
DownloadPST.bat             2010-07-12 14:17:23  11K   Script to download PST version of EDRM Enron collection (obsolete)
DownloadXML.bat             2010-06-22 15:55:03  11K   Script to download XML version of EDRM Enron collection (obsolete)
edrmv2txt-v2.tar.bz2        2010-07-06 12:03:13  596M  A de-duplicated version of the text rendering of the EDRM Enron collection, containing only the canonical versions of emails and their attachments
edrmv2nativeattach.tar.bz2  2013-03-27 14:35:01  8G    The deduplicated attachments in native format
seed.csv                    2010-06-23 10:26:44  898K  Seed sets for the TREC 2010 Legal Track Learning Task

These tools and data sets may help you identify and download the EDRM Enron Dataset v2 used by the TREC 2010 Legal Track.
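
The mapping files listed above are bz2-compressed CSV files, so they can be read without decompressing them to disk first. The following is a minimal Python sketch, not an official tool. It assumes each row holds two comma-separated fields (key first, value second) and that there is no header row; the column layout is not documented on this page, so check a few lines of the real file before relying on it. The function names load_mapping and canonical_docids are hypothetical and exist only for illustration.

```python
import bz2
import csv


def load_mapping(path, key_col=0, value_col=1):
    """Load a two-column mapping from a bz2-compressed CSV file.

    Assumption: rows are comma-separated with no header row, and the key
    and value sit in the given columns. Adjust if the real layout differs.
    """
    mapping = {}
    # bz2.open in text mode decompresses on the fly while csv parses rows.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) <= max(key_col, value_col):
                continue  # skip blank or short rows
            mapping[row[key_col]] = row[value_col]
    return mapping


def canonical_docids(docids, dedup_map):
    """Return the sorted set of canonical ids for the given docids.

    Ids that do not appear in the mapping are kept unchanged.
    """
    return sorted({dedup_map.get(docid, docid) for docid in docids})
```

Under those assumptions, calling load_mapping("msg-uniqmsg.csv.bz2") would give a dictionary from message docid to canonical docid, and canonical_docids could then collapse a list of docids to their deduplicated versions. The larger files hold many rows, so loading one fully into memory may take a noticeable amount of RAM.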

For PST File Format Users ...