"Mapping from EDRM docid to SDOC number (stored in email headers of PST format collection) for all emails", "msg-sdoc.csv.bz2" => "Mapping from EDRM docid to SDOC number (stored in email headers of PST format collection) for canonical versions of deduplicated emails", "docids-v2.csv.bz2" => "Official list of docids that are candidates for production", "msg-uniqmsg.csv.bz2" => "Deduplication mapping from message docid to canonical version of message", "uniqmsg.csv.bz2" => "List of canonical IDs of unique messages", "DownloadPST.bat" => "Script to download PST version of EDRM Enron collection (obsolete)", "DownloadXML.bat" => "Script to download XML version of EDRM Enron collection (obsolete)", "edrmv2txt-v2.tar.bz2" => "A de-duplicated version of the text rendering of the EDRM Enron collection, containing only the canonical versions of emails and their attachments", "edrmv2nativeattach.tar.bz2" => "The deduplicated attachments in native format", "seed.csv" => "Seeds sets for the TREC 2010 Legal Track Learning Task" ) ?>
Name | Last Modified (UTC) | Size | Description |
---|---|---|---|
%s | \n", $file, $file); printf("%s | \n", date('Y-m-d\&\n\b\s\p\;H:i:s', filemtime($file))); $fsz = filesize($file); $gb = 1024 * 1024 * 1024; $mb = 1024 * 1024; $kb = 1024; if ($fsz > $gb) { $fsz_s = sprintf("%dG", $fsz/$gb); } else if ($fsz > $mb) { $fsz_s = sprintf("%dM", $fsz/$mb); } else if ($fsz > $kb) { $fsz_s = sprintf("%dK", $fsz/$kb); } else { $fsz_s = sprintf("%dB", $fsz); } printf("%s | \n", $fsz_s); printf("%s | \n", $desc); } ?>
These tools and data sets may help you to download the EDRM Enron Dataset v2 used by the TREC 2010 Legal Track.
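
The deduplication mapping file (msg-uniqmsg.csv.bz2) relates each message docid to the canonical version of that message. The following is a minimal sketch of how such a mapping might be loaded and applied. It assumes, without confirmation from this page, that the file is a UTF-8, two-column CSV with no header row (message docid first, canonical docid second). The function names `load_dedup_map` and `canonical_ids` are hypothetical and are not part of the distributed tools.

```python
import bz2
import csv


def load_dedup_map(path):
    """Return {message docid: canonical docid} from a bz2-compressed CSV.

    Assumed layout (not confirmed by the source page): two columns,
    no header row, message docid first and canonical docid second.
    """
    mapping = {}
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def canonical_ids(docids, mapping):
    """Map docids to canonical docids, dropping repeats and keeping order.

    A docid that is absent from the mapping is treated as its own
    canonical version (an assumption made for this sketch).
    """
    seen = set()
    result = []
    for docid in docids:
        canon = mapping.get(docid, docid)
        if canon not in seen:
            seen.add(canon)
            result.append(canon)
    return result
```

If the real file has a header row or a different column order, the row handling in `load_dedup_map` would need to be adjusted before use.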