
Enron Dataset - Individual .PST Datasets
About this Resource
The resources available from this site constitute some of the practical research completed for a dissertation project at the
Downloads
The 148 individual mailboxes from the Enron CALO dataset can be downloaded (in zip format).
The URL format for each mailbox is as follows:
http://enrondata.blob.core.windows.net/pst/pst/<mailbox_name>.zip
where <mailbox_name> is the custodian name of the user.
For example:
http://enrondata.blob.core.windows.net/pst/pst/allen-p.zip
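The URL pattern above can be sketched in Python as follows. This is a minimal illustration only: the function name `mailbox_url` and the constant `BASE_URL` are hypothetical names introduced here, and the code only builds the URL string; it does not check that the custodian exists or that the server is still reachable.

```python
from urllib.parse import quote

# Base path taken from the URL format described above.
BASE_URL = "http://enrondata.blob.core.windows.net/pst/pst/"


def mailbox_url(mailbox_name: str) -> str:
    """Build the download URL for one custodian's zipped mailbox.

    `mailbox_name` is the custodian name, e.g. "allen-p".
    """
    name = mailbox_name.strip()
    if not name:
        raise ValueError("mailbox_name must not be empty")
    # Percent-encode the name so unexpected characters cannot break the URL.
    return f"{BASE_URL}{quote(name, safe='')}.zip"


if __name__ == "__main__":
    print(mailbox_url("allen-p"))
```

Calling `mailbox_url("allen-p")` reproduces the example URL given above.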
To view a complete list of Enron custodians please click here.
Note that mailbox size varies by user: some downloads are very small (<1MB), whilst others are much larger (150+MB).
Alternatively, the complete dataset belonging to all 148 users may be downloaded from here (1.73GB).
Related Publications
Neil Cooke and Lee Gillam (2008) "Distributions and Distributional Lexical Semantics for Stop Words". Workshop on Corpus Profiling for Information Retrieval and Natural Language Processing, in conjunction with Information Interaction in Context (IIiX) 2008.
Lee Gillam and Neil Cooke (2008) "Intellectual property escaped with the email? Press F1 for help". Journal of Information Assurance and Security 3(1): 16-26, March 2008. Download paper from this link (PDF)
Neil Cooke, Lee Gillam, and Ahmet Kondoz (2007) "IP Protection: Detecting Email-based Breaches of Confidence". 3rd International Symposium on Information Assurance and Security,
Neil Cooke, Lee Gillam, and Ahmet Kondoz (2007) "The Best Kept Secrets with Corpus Linguistics". 4th Corpus Linguistics Conference 2007,
External Links
Department of Computing - University of Surrey