• Welcome! The TrekBBS is the number one place to chat about Star Trek with like-minded fans.
    If you are not already a member then please register an account and join in the discussion!

Gmail Help

acappellasaurus

Fleet Captain
So I have a weird scenario that I need some advice on from those more knowledgeable than I am. I work for a company that has been subpoenaed, and we are to provide all emails available for the last 10 years that contain any word from a specific list. Right now, the only option I can see is to log on to each user's email, search for each requested word, and then individually print to PDF each email that comes up. I have a very short time to do this, and have to do it on every available user account - approx 85 people. I just pulled up a sample user, and on the first search word alone he has 485 emails. Is there any better way to accomplish this? I am desperate.
 
Does your company pay Google for their email service? If so, they should have additional access methods and export capabilities.

If nothing else, you can configure Gmail for POP access and use a standard email client (Outlook, Thunderbird, etc.) to download all the emails and search them programmatically.
 
Our company's scenario is that we were grandfathered in to the free company email setup because we adopted it before they charged for it. So we can use it business-user style, but don't get any of the extra features that would help with this.

Here's what I did do - tell me if you think this is a workable solution... Gmail does have a way for you to download your entire email history, presumably for moving it to a different service. It archives everything you have and then spits the email out into a format called mbox. So I am doing that on all 85 users in the company. Next, I found a utility that converts mbox files to PDF format. We'll run every user's archived Gmail mbox file through that to produce a massive number of PDF files... basically one PDF for every email they have. Those PDFs will be collected in a folder. Finally, we'll run a crawler program over that folder to search for the terms listed in the subpoena. This should produce only the relevant emails.

Does this sound like a way to go?
 
Converting to PDF seems unnecessary. The mbox file should be plain text. If not, it may be compressed (most likely gzipped). From there, you can programmatically search it. Each message in an mbox file begins with a separator line starting with "From ", so it is easy to tell where one message ends and the next begins. This is a pretty straightforward text analysis problem. A scripting language like Python, Ruby, or Perl would be perfectly suited to extracting the messages you want from it.
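To make that concrete, here is a rough Python sketch of the kind of script I mean. It uses the standard library's mailbox module to walk an mbox file and keep any message whose subject or text body contains one of the search terms. The function names (message_text, find_matches, export_matches) and the file names in the usage note are made up for illustration. It decodes base64 and quoted-printable bodies before searching, since a raw text search over the file can miss encoded messages. It does not look inside attachments such as PDFs or Word documents, so treat it as a starting point and check with your counsel on what the subpoena actually requires.

```python
import mailbox
from email.header import decode_header, make_header


def message_text(msg):
    """Return the decoded subject plus all text/* parts of a message as one string."""
    chunks = []
    subject = msg.get("Subject", "")
    try:
        # Subjects may be RFC 2047 encoded (e.g. =?UTF-8?B?...?=).
        chunks.append(str(make_header(decode_header(subject))))
    except Exception:
        chunks.append(str(subject))
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and binary attachments
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset name in the message; fall back to UTF-8.
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def find_matches(mbox_path, terms):
    """Yield (index, matched_terms, message) for each message containing any term."""
    lowered = [t.lower() for t in terms]
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, msg in enumerate(box):
            text = message_text(msg).lower()
            hits = [t for t in lowered if t in text]  # case-insensitive substring match
            if hits:
                yield index, hits, msg
    finally:
        box.close()


def export_matches(mbox_path, terms, out_path):
    """Copy every matching message into a new mbox file and return the count."""
    out = mailbox.mbox(out_path)
    count = 0
    try:
        for _, _, msg in find_matches(mbox_path, terms):
            out.add(msg)
            count += 1
        out.flush()
    finally:
        out.close()
    return count
```

You would call something like export_matches("user1.mbox", ["term one", "term two"], "user1_hits.mbox") once per user. The result is a much smaller mbox per user containing only the matching messages, which you can then open in Thunderbird or run through your PDF converter if the final deliverable has to be PDF.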
 