@fanyeren
Created September 26, 2013 14:11
is_robot.pm borrowed from fayland
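# Return 1 if the User-Agent string looks like a crawler or an HTTP library,
# otherwise return nothing (false).
sub is_robot {
    my ($agent) = @_;
    return unless defined $agent;    # no User-Agent: do not flag, avoid warnings
    return 1 if is_site_robot($agent);
    return 1 if is_program_robot($agent);
    return;
}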

# Search-engine and social-media crawlers, matched case-insensitively
# anywhere in the User-Agent string.
sub is_site_robot {
    my ($agent) = @_;
    return 1 if $agent =~ /Googlebot|Baiduspider|Yahoo! Slurp|Bingbot|MSNbot|altavista|lycos|infoseek|webcrawler|lecodechecker|Ask Jeeves|facebookexternalhit|adsbot-google|ia_archive|FatBot|Xenu Link Sleuth|BlitzBOT|btbot|CatchBot|Charlotte|Discobot|FAST-WebCrawler|FurlBot|Gaisbot|iaskspider|Mediapartners-Google|Seekbot|SEOChat|SeznamBot|Sitebot|sogou spider|Sosospider|TweetedTimes|YahooSeeker|YandexBot|Yeti|YodaoBot|YoudaoBot|ZyBorg|Twitterbot|AhrefsBot|TweetedTimes Bot|TweetmemeBot|bitlybot|ShowyouBot|UnwindFetchor|MetaURI API|PaperLiBot|LinkedInBot|AddThis\.com robot|FriendFeedBot|MnoGoSearch|sistrix|MJ12bot|EZooms|UnisterBot|SiteExplorer|Exabot|Infohelfer|AcoonBot|Pixray-Seeker|emefgebot|Snipebot|Dataprovider Site Explorer|iBusiness Shopcrawler|pmoz\.info|Toplistbot|findlinks|netEstate NE Crawler|Crawler for Netopian|msnbot|webalta|suchen\.de|depspid|gigabot|3GSE bot|IRLbot|cuil\.com|Gigameme\.bot|BotOnParade|Crawly|infometrics-bot|Kaloogabot|Speedy Spider|iCcrawler|WebDataCentreBot|LinkWalker|Tagoobot|searchme\.com|Jyxobot|Purebot|Yanga WorldSearch|MSRBOT|VEDENSBOT|Fastsearch|Twiceler|Linguee Bot|ScoutJet/i;
    # Also flag any agent whose string starts with "silk" (case-insensitive).
    return 1 if $agent =~ /^silk/i;
    return;
}
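
# HTTP client libraries and benchmarking tools.
sub is_program_robot {
    my ($agent) = @_;
    # Library names; this pattern has no /i flag, so it is case-sensitive.
    return 1 if $agent =~ /libwww-perl|PycURL|EventMachine HttpClient|Apache-HttpClient|ApacheBench/;
    # Anything of the form "Python-<name>/", e.g. "Python-urllib/".
    return 1 if $agent =~ m{Python-(\w+)/}i;
    # Agent strings that start with "Java/".
    return 1 if $agent =~ m{^Java/};
    # An agent string that is exactly "Ruby".
    return 1 if $agent eq 'Ruby';
    return;
}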
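
The subs above can be tried on a few sample User-Agent strings. The following is a minimal, self-contained sketch and not part of the original gist: the crawler list in `is_site_robot` is cut down to three names for brevity, the checks in `is_program_robot` are copied unchanged, and the sample strings in `@samples` are illustrative assumptions.

```perl
use strict;
use warnings;

# Abridged stand-in: only three crawler names from the full list.
sub is_site_robot {
    my ($agent) = @_;
    return 1 if $agent =~ /Googlebot|Baiduspider|Bingbot/i;
    return 1 if $agent =~ /^silk/i;
    return;
}

# Same four checks as in the gist.
sub is_program_robot {
    my ($agent) = @_;
    return 1 if $agent =~ /libwww-perl|PycURL|EventMachine HttpClient|Apache-HttpClient|ApacheBench/;
    return 1 if $agent =~ m{Python-(\w+)/}i;
    return 1 if $agent =~ m{^Java/};
    return 1 if $agent eq 'Ruby';
    return;
}

sub is_robot {
    my ($agent) = @_;
    return 1 if is_site_robot($agent);
    return 1 if is_program_robot($agent);
    return;
}

# Sample User-Agent strings chosen for illustration only.
my @samples = (
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Python-urllib/2.7',
    'Java/1.7.0_25',
    'Mozilla/5.0 (X11; Linux x86_64) Firefox/24.0',
);

for my $agent (@samples) {
    # The bare "return" yields an empty list, which is false in the ternary test.
    my $verdict = is_robot($agent) ? 'robot' : 'human';
    printf "%-6s %s\n", $verdict, $agent;
}
```

With these samples, the first three strings are flagged as robots and the last one is not. Because the first pattern in `is_program_robot` lacks the `/i` flag, an upper-cased agent such as `LIBWWW-PERL/6.0` would not be flagged, whereas the crawler names match in any case.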