Last active
October 20, 2016 16:39
bash one-liner to dump all URLs from an email newsletter
#!/bin/bash
# urls-from-email
#
# bash script to dump all URLs from an email
#
# To use this, save the "Raw Message Source" of an email to a file (e.g. file.txt) and then run
#
# ./urls-from-email.sh file.txt
#
# It will then output every link in the email, one per line, after following redirects, excluding links to twitter.com, getrevue.co, facebook.com, list-manage.com, list-manage2.com and campaign-archive1.com / campaign-archive2.com
#
# You can have Google Chrome open all of these links in one shot to do with as you will, e.g.
#
# ./urls-from-email.sh file.txt > links.txt
# while IFS= read -r line; do /usr/bin/open -a "/Applications/Google Chrome.app" "$line"; done < links.txt
#
cat "$1" | perl -MMIME::QuotedPrint -pe '$_=MIME::QuotedPrint::decode($_);' | grep -Eo 'href="[^\"]+"' | grep -Eo 'https?://[^"]+' | while IFS= read -r line; do { curl -Ls -o /dev/null -w '%{url_effective}' "$line"; printf '\n'; }; done | sed -E '/twitter\.com|getrevue\.co|facebook\.com|list-manage\.com|list-manage2\.com|campaign-archive(1|2)\.com/d'
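
The extraction and filtering stages of the pipeline can be tried on their own, without an email file or network access. The following is a minimal sketch under some assumptions: the helper names `extract_urls` and `drop_unwanted` and the sample HTML are made up for illustration, the quoted-printable decoding and the `curl` redirect-following steps are left out, and only three of the excluded domains are listed.

```shell
#!/bin/bash
# Sketch of two stages of the one-liner, runnable offline.
# extract_urls and drop_unwanted are hypothetical helper names.

extract_urls() {
  # Pull every href="..." attribute from stdin, then keep only the
  # http(s) URL inside it (mailto: and relative links are dropped).
  grep -Eo 'href="[^"]+"' | grep -Eo 'https?://[^"]+'
}

drop_unwanted() {
  # Delete any line that mentions one of the excluded domains.
  sed -E '/twitter\.com|getrevue\.co|facebook\.com/d'
}

# Made-up sample input standing in for a decoded email body.
sample='<a href="https://example.com/post">Post</a> <a href="https://twitter.com/someone">Tweet</a>
<a href="mailto:me@example.com">Mail</a> <a href="http://example.org/a?b=1">Other</a>'

result=$(printf '%s\n' "$sample" | extract_urls | drop_unwanted)
printf '%s\n' "$result"
```

With this sample, the two `example` links are printed and the twitter.com and mailto: links are not. Note that the `sed` pattern matches anywhere in the line, so a URL that merely contains an excluded domain in its path or query string is dropped as well.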