Too many email addresses redirecting to your Gmail account? Tired of watching your Spam folder get bigger (and more boring) every time you look? I was.
So I decided to write some scripts (thanks to libgmailer) and now I can spend my time reading a newspaper or going out ;)
Over the last few years I've collected a lot of email addresses: one for each company I've worked for, one for MSN, one for Gmail, one for Yahoo, and the list goes on. I finally decided to forward all my email addresses to a Gmail box. There's something about the conversation view that no other client has yet... Of course this move had the side effect of multiplying the spam I receive in the Gmail box. The spam filter does a good job, but I think some reports can be useful.
Several years ago I came up with the idea of using a domain name I own to create a separate email address for each website I sign up to and don't trust much. So every address I create is something like thesiteurl@mydomain.com. I have all the domain's addresses redirected to my account. When I see that some address is receiving too much spam I simply block that address and that's all.
Anyway, it is still a pain in the a** to go through my spam folder every day to check this stuff. So I decided that a script should be doing it for me.
Basically the script connects to Gmail, reads the spam folder, and saves certain data from each email to a database table, which I can later review from a webpage with some cool effects. On the right you can see a screenshot of the reports. I've added some graphics to make it a little more interesting ;)
The script is written in PHP. It uses libgmailer (from the gmail-lite project). It scans the folder for unread conversations, and when it finds one it reads each email in it and saves the recipient, the sender and the subject for later analysis. The data is inserted into a database table using ADOdb.
Let's see a bit of the collect.php code.
This script should be run periodically from a cron job (every hour should be fine). First, we have to create the database table where the script will save the data it collects:
CREATE TABLE spam_occurance (
    message_id VARCHAR(30) NOT NULL,
    ts INT(11) NOT NULL,
    recv_email VARCHAR(60) NOT NULL,
    subj VARCHAR(200) NOT NULL,
    from_email VARCHAR(60) NOT NULL,
    PRIMARY KEY (message_id),
    INDEX (recv_email, ts),
    INDEX (ts, recv_email)
);
Then we edit config.inc.php and specify a database connection URI that can access that table.
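To make the "connection URI" idea concrete, here is a minimal sketch of what such a value looks like and which parts it carries. The variable name $dbUri and the credentials are made up for illustration; the real setting name in config.inc.php is not shown in this post. The sketch only uses PHP's built-in parse_url() to split the URI, it does not connect to anything.

```php
<?php
// Hypothetical value: the actual setting name used in config.inc.php is not shown here.
$dbUri = 'mysql://spamuser:secret@localhost/spamdb';

// Split the URI into driver, credentials, host and database name.
$parts = parse_url($dbUri);
$dbName = ltrim($parts['path'], '/');

printf("driver=%s host=%s user=%s database=%s\n",
    $parts['scheme'], $parts['host'], $parts['user'], $dbName);
```

The database user in the URI needs at least SELECT and INSERT rights on the spam_occurance table, since the collector does both.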
Now let's review some interesting parts of the collect code:
First we connect to Gmail. Apparently the library saves the login cookies in a session variable so that it doesn't have to re-authenticate every time it runs, although I don't think this works when running from the command line:
$gm = new GMailer();
$gm->setLoginInfo($myemail, $pwd, $tz);
Next, we fetch the conversations in the spam box and loop over the unread ones to get the actual emails:
$gm->fetchBox(GM_STANDARD, "spam", 0);
$snapshot = $gm->getSnapshot(GM_STANDARD);
if ($snapshot) {
    debug('Total # of conversations in Spam folder = ' . $snapshot->box_total.$nl.$nl);
    foreach ($snapshot->box as $conv) {
        // we will only inspect unread messages for better performance
        if ($conv['is_read']==1) {
            debug('Conversation "'.strip_tags($conv['subj']).'" (id: '.$conv['id'].')'.$nl);
            // get the messages in the conversation
            $q = "search=spam&view=cv&th=".$conv['id'];
            $gm->fetch($q);
            $snapshot2 = $gm->getSnapshot(GM_STANDARD | GM_LABEL| GM_QUERY| GM_CONVERSATION);
            ....
So now we have the actual emails of the conversation in the $snapshot2 object. We only have to collect the info from each email, verify that it wasn't already analyzed and inserted (this can happen when a new mail arrives in an already read conversation) and finally insert the data in the table:
foreach ($snapshot2->conv as $msg) {
    debug(' From: '.$msg['sender_email'].' ID: '.$msg['id'].' was sent to ');
    foreach ($msg['recv_email'] as $recv_email){
        // extract the bare address from the recipient string
        // (note: eregi() was removed in PHP 7; preg_match() with the i flag replaces it there)
        $email = '';
        eregi($regex, $recv_email, $email);
        debug($email[0].' ');
        // skip messages that were already stored by a previous run
        $sql = 'select 1 from spam_occurance where message_id = ?';
        $rs = $db->execute($sql,array($msg['id']));
        if ($rs->EOF) {
            $sql = 'insert into spam_occurance '.
                   '(message_id,ts,recv_email,subj,from_email) '.
                   ' values (?,?,?,?,?)';
            $db->execute($sql, array( $msg['id'],time(),$email[0],
                         strip_tags($conv['subj']),$msg['sender_email']));
            debug('INSERTED');
        } else {
            debug('SKIPPED');
        }
    }
    debug($nl);
}
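The address extraction step deserves a closer look, since the $regex used above is defined elsewhere in collect.php and is not shown here. As a rough sketch of the same idea for newer PHP versions, the helper below uses preg_match() instead of eregi(). The function name extractEmail and the pattern are my assumptions for illustration, not the ones from the script, and the pattern is deliberately loose rather than a full address validator.

```php
<?php
// Hypothetical helper: pull the first email address out of a recipient string
// such as '"Some Shop" <someshop@mydomain.com>'.
function extractEmail(string $recipient): ?string
{
    // Loose pattern (an assumption, not the script's own $regex).
    $pattern = '/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i';
    if (preg_match($pattern, $recipient, $matches) === 1) {
        // Normalize case so the same address always groups together in reports.
        return strtolower($matches[0]);
    }
    return null; // no address found in the string
}

var_dump(extractEmail('"Some Shop" <SomeShop@mydomain.com>'));
var_dump(extractEmail('undisclosed-recipients:;'));
```

Returning null when nothing matches makes it easy to skip recipients without a usable address instead of inserting an empty recv_email.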
Data is now in the database and ready to be seen by the report script.
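To give an idea of the kind of question the collected rows can answer, here is a small sketch that counts spam messages per receiving address. It is not taken from reports.php; the function name countByRecipient and the sample rows are made up, and the rows simply mirror the columns of the spam_occurance table.

```php
<?php
// Hypothetical report helper: count collected spam messages per recipient address.
function countByRecipient(array $rows): array
{
    $counts = [];
    foreach ($rows as $row) {
        $addr = $row['recv_email'];
        $counts[$addr] = ($counts[$addr] ?? 0) + 1;
    }
    arsort($counts); // most-spammed address first, keys preserved
    return $counts;
}

// Made-up sample rows shaped like the spam_occurance table.
$rows = [
    ['message_id' => 'a1', 'ts' => 1160000000, 'recv_email' => 'shop@mydomain.com',
     'subj' => 'Cheap watches', 'from_email' => 'x@example.net'],
    ['message_id' => 'a2', 'ts' => 1160003600, 'recv_email' => 'forum@mydomain.com',
     'subj' => 'You won', 'from_email' => 'y@example.net'],
    ['message_id' => 'a3', 'ts' => 1160007200, 'recv_email' => 'shop@mydomain.com',
     'subj' => 'Last chance', 'from_email' => 'z@example.net'],
];

$counts = countByRecipient($rows);
print_r($counts);
```

In practice the same count can be done in the database with a GROUP BY on recv_email, which is what the (recv_email, ts) index is there to help with.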
The reports.php script is a little more complicated (with the AJAX stuff), so just go ahead and download it to see it. Anyway, the powerful part was the use of libgmailer. It's a very good library that I can imagine using in a lot of ways!
You can get the source code here: 4TM Open Source Tools