Tuesday, 1 January 2013

HAVP PhishTank and Adserver Blacklist

For basic virus protection I'm running a proxy with HAVP and ClamAV.
For some time I had been using HAVP's blacklist feature to block ads (by blacklisting *.doubleclick.net and *.ivwbox.de). Since a manually maintained blacklist is not very efficient, I wanted an auto-updating list of ad servers, so I wrote the shell script below, which generates an up-to-date blacklist from the ad server list at pgl.yoyo.org.

Shortly afterwards I extended the script to also include a phishing blacklist based on data from PhishTank.
I'm currently using the version below, which runs as a cron job every two hours and keeps the HAVP blacklist up to date. Please note that you need to insert your own free PhishTank API key when using this script.
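
For the two-hour schedule, a crontab entry along the following lines should work. This is only a sketch: the script path /usr/local/sbin/havp-blacklist.sh is a made-up name, so substitute wherever you saved the script, and put the line into the crontab of a user that may write to /etc/havp.

```
# min  hour  day-of-month  month  day-of-week  command
0      */2   *             *      *            /usr/local/sbin/havp-blacklist.sh
```

This starts the script at minute 0 of every second hour.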

#!/bin/sh

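# Work inside the HAVP config directory; stop if it is missing so that
# wget does not drop its downloads somewhere else.
cd /etc/havp || exit 1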

OUTFILE=/etc/havp/blacklist

ADSERVERLIST=/etc/havp/adserverlist
PHISHTANK=/etc/havp/phishtank
MYBLACKLIST=/etc/havp/myblacklist

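# Fetch the ad server list; -N skips the download if the local copy is current.
wget -q -N "http://pgl.yoyo.org/adservers/serverlist.php?hostformat=webwasher;showintro=0;mimetype=plaintext"
# wget keeps the query string in the local file name, hence the glob.
# Replace a leading // with #, then sort and de-duplicate.
sed -e 's_^//_#_' serverlist.php* | sort -u > "$ADSERVERLIST"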

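# Fetch the PhishTank dump; replace <PhishTank API key> with your own key.
wget -q -N "http://data.phishtank.com/data/<PhishTank API key>/online-valid.csv.bz2"
# Reduce each CSV row to a HAVP pattern: drop the phish_id column, drop
# everything from the detail URL onwards, strip surrounding quotes and the
# scheme, append a wildcard to directory and bare-host URLs, and replace any
# query string with *. Then remove the CSV header line and non-ASCII characters.
bzcat online-valid.csv.bz2 | sed \
	-e 's/^[0-9]*,//' \
	-e 's@,http://www.phishtank.com/phish_detail.php?phish_id=[0-9]*,.*$@@' \
	-e 's/^"\(.*\)"$/\1/'  \
	-e 's_^https\?://__' \
	-e 's_/$_/*_' \
	-e 's_^\([^/]*\)$_\1/*_' \
	-e 's/?.*/*/' | \
grep -vF 'phish_id,url,phish_detail_url,submission_time,verified,verification_time,online,target' | \
iconv -f utf8 -t ascii -c - | sort -u > "$PHISHTANK"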


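# Assemble the final blacklist. printf is used for the section headers
# because not every /bin/sh echo expands "\n".
printf '# blacklist file generated by %s, %s\n' "$0" "$(date)" > "$OUTFILE"

printf '\n# MYBLACKLIST:\n' >> "$OUTFILE"
cat "$MYBLACKLIST" >> "$OUTFILE"

printf '\n# ADSERVERLIST:\n' >> "$OUTFILE"
cat "$ADSERVERLIST" >> "$OUTFILE"

printf '\n# PHISHTANK:\n' >> "$OUTFILE"
cat "$PHISHTANK" >> "$OUTFILE"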
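
To see what the sed stages of the PhishTank part do, it can help to run them on a few rows by hand. The snippet below is a small sketch for that purpose: the function name phish_to_pattern and the sample rows are made up for illustration, the rows merely follow the column order given by the CSV header, and `https*` is used as a portable stand-in for the GNU-specific `https\?`.

```sh
#!/bin/sh
# phish_to_pattern: hypothetical helper wrapping the sed stages of the script.
# Reads PhishTank CSV rows on stdin and writes one blacklist pattern per row.
phish_to_pattern() {
	sed \
		-e 's/^[0-9]*,//' \
		-e 's@,http://www.phishtank.com/phish_detail.php?phish_id=[0-9]*,.*$@@' \
		-e 's/^"\(.*\)"$/\1/' \
		-e 's_^https*://__' \
		-e 's_/$_/*_' \
		-e 's_^\([^/]*\)$_\1/*_' \
		-e 's/?.*/*/'
}

# Made-up sample rows: a URL with a query string, a directory URL, a bare host.
printf '%s\n' \
	'1,http://example.com/login.php?id=1,http://www.phishtank.com/phish_detail.php?phish_id=1,2013-01-01T00:00:00+00:00,yes,2013-01-01T00:10:00+00:00,yes,Other' \
	'2,https://phish.example.net/,http://www.phishtank.com/phish_detail.php?phish_id=2,2013-01-01T00:00:00+00:00,yes,2013-01-01T00:10:00+00:00,yes,Other' \
	'3,http://bare.example.com,http://www.phishtank.com/phish_detail.php?phish_id=3,2013-01-01T00:00:00+00:00,yes,2013-01-01T00:10:00+00:00,yes,Other' |
phish_to_pattern
```

With these rows the output is:

```
example.com/login.php*
phish.example.net/*
bare.example.com/*
```

The query string is replaced by a wildcard, so the pattern matches the phishing page regardless of its parameters, and URLs without a path block the whole host.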