@moorbrook
Created July 26, 2017 00:45
Find unique referrers in apache/nginx access log
# regexps found here http://www.seehuhn.de/blog/52
import re
parts = [
r'(?P<host>\S+)', # host %h
r'\S+', # ident %l (unused)
r'(?P<user>\S+)', # user %u
r'\[(?P<time>.+)\]', # time %t
r'"(?P<request>.+)"', # request "%r"
r'(?P<status>[0-9]+)', # status %>s
r'(?P<size>\S+)', # size %b (careful, can be '-')
r'"(?P<referer>.*)"', # referer "%{Referer}i"
r'"(?P<agent>.*)"', # user agent "%{User-agent}i"
]
pattern = re.compile(r'\s+'.join(parts)+r'\s*\Z')
# example from blog
# line = ... a line from the log file ...
# m = pattern.match(line)
# res = m.groupdict()
#-------------------------------
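# sorted unique referers; lines that do not match the log format are skipped, and the file is closed afterwards
with open('/var/log/nginx/access.log') as log:
    matches = (pattern.match(line) for line in log)
    refs = sorted({m.group('referer') for m in matches if m})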
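# A minimal sketch of the same idea as a reusable function, so it can be tried
# without a real log file. The names PARTS, PATTERN, unique_referers and SAMPLE
# are hypothetical, and the sample lines are made up. Dropping the '-' placeholder
# (logged when a request has no referer) is an assumption about what is wanted,
# not something the snippet above does.

```python
import re

# Same combined-log-format pattern as above, repeated so this sketch runs on its own.
PARTS = [
    r'(?P<host>\S+)',         # host %h
    r'\S+',                   # ident %l (unused)
    r'(?P<user>\S+)',         # user %u
    r'\[(?P<time>.+)\]',      # time %t
    r'"(?P<request>.+)"',     # request "%r"
    r'(?P<status>[0-9]+)',    # status %>s
    r'(?P<size>\S+)',         # size %b (can be '-')
    r'"(?P<referer>.*)"',     # referer "%{Referer}i"
    r'"(?P<agent>.*)"',       # user agent "%{User-agent}i"
]
PATTERN = re.compile(r'\s+'.join(PARTS) + r'\s*\Z')


def unique_referers(lines):
    """Return sorted unique referers, skipping unparsable lines and '-'."""
    found = set()
    for line in lines:
        m = PATTERN.match(line)
        if m is None:
            continue  # not in the expected log format
        referer = m.group('referer')
        if referer != '-':  # assumption: '-' means "no referer sent"
            found.add(referer)
    return sorted(found)


# Made-up sample lines, not taken from a real log.
SAMPLE = [
    '127.0.0.1 - - [26/Jul/2017:00:45:00 +0000] "GET / HTTP/1.1" 200 612 "http://example.com/a" "Mozilla/5.0"',
    '127.0.0.1 - - [26/Jul/2017:00:45:01 +0000] "GET /x HTTP/1.1" 404 - "-" "curl/7.50"',
    '127.0.0.1 - - [26/Jul/2017:00:45:02 +0000] "GET /y HTTP/1.1" 200 10 "http://example.com/a" "Mozilla/5.0"',
    'not a log line',
]

print(unique_referers(SAMPLE))
```

# Because unique_referers accepts any iterable of strings, an open file object
# can be passed in place of SAMPLE.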