"Mapping from EDRM docid to SDOC number (stored in email headers of PST format collection) for all emails", "msg-sdoc.csv.bz2" => "Mapping from EDRM docid to SDOC number (stored in email headers of PST format collection) for canonical versions of deduplicated emails", "docids-v2.csv.bz2" => "Official list of docids that are candidates for production", "msg-uniqmsg.csv.bz2" => "Deduplication mapping from message docid to canonical version of message", "uniqmsg.csv.bz2" => "List of canonical IDs of unique messages", "DownloadPST.bat" => "Script to download PST version of EDRM Enron collection (obsolete)", "DownloadXML.bat" => "Script to download XML version of EDRM Enron collection (obsolete)", "edrmv2txt-v2.tar.bz2" => "A de-duplicated version of the text rendering of the EDRM Enron collection, containing only the canonical versions of emails and their attachments", "edrmv2nativeattach.tar.bz2" => "The deduplicated attachments in native format", "seed.csv" => "Seeds sets for the TREC 2010 Legal Track Learning Task" ) ?> Identification and download helpers for EDRM Enron v2 dataset




These tools and data sets may help you to download the EDRM Enron Dataset v2 used by the TREC 2010 Legal Track.
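As a rough illustration of how a mapping file such as msg-uniqmsg.csv.bz2 might be read, here is a minimal Python sketch. It assumes the file is a bz2-compressed CSV with no header row, with the message docid in the first column and the canonical docid in the second; that layout is an assumption and is not stated on this page, so check the actual file before relying on it. The function name load_mapping is hypothetical.

```python
import bz2
import csv


def load_mapping(path):
    """Read a bz2-compressed two-column CSV into a dict.

    Assumed layout (not confirmed by this page): no header row,
    first column = message docid, second column = canonical docid.
    """
    mapping = {}
    # bz2.open in text mode decompresses on the fly; newline="" is what
    # the csv module expects from its input file object.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            mapping[row[0]] = row[1]
    return mapping
```

Calling load_mapping("msg-uniqmsg.csv.bz2") would then give a dictionary in which each message docid looks up its canonical version, provided the column assumption holds. If the file turns out to have a header row, that row would need to be skipped first.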

For PST File Format Users ...