@pogpog
Last active January 7, 2022 19:14
Process a WordPress log file (debug.log) into a tally of how frequently each entry appears
import re
from pandas import DataFrame

# Replace with your file names
read_file = 'debug.log'
write_file = 'output.csv'

# Utility vars
count = []
message = []
percent = []
total = 0
d = {}  # Dictionary for tally

# Populate tally of log entries
with open(read_file) as f1:
    for line in f1:
        total += 1
        # Remove the bracketed date from the start of the line only
        line = re.sub(r"^\[[^\]]+\]\s?", "", line)
        # Remove trailing \n
        line = line.rstrip("\n")
        if line in d:
            d[line] += 1
        else:
            d[line] = 1

# Populate lists for DataFrame columns
for key in d:
    message.append(key)
    count.append(d[key])
    # Stored as a fraction of all lines (0 to 1), not multiplied by 100
    percent.append(round(d[key]/total, 5))

# Create DataFrame with headings and data
df = DataFrame({'count': count, 'percent': percent, 'message': message})
df.sort_values(['count'], inplace=True, ascending=False)
df.to_csv(write_file, index=False)
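
The same tally can also be sketched with only the standard library, which may be handy when pandas is not installed. This is a minimal sketch, not part of the original script: the function name `tally_log_lines`, the `TIMESTAMP_PREFIX` constant, and the sample log lines are all hypothetical and made up for illustration. It assumes each entry starts with a bracketed timestamp, as in the script above.

```python
import re
from collections import Counter

# Assumption: entries start with a "[timestamp] " prefix, as in the script above
TIMESTAMP_PREFIX = re.compile(r"^\[[^\]]+\]\s?")


def tally_log_lines(lines):
    """Return (count, fraction, message) rows, most frequent first."""
    # Strip the trailing newline and the leading timestamp from every line
    messages = [TIMESTAMP_PREFIX.sub("", line.rstrip("\n")) for line in lines]
    total = len(messages)
    counts = Counter(messages)
    return [
        (count, round(count / total, 5), message)
        for message, count in counts.most_common()
    ]


# Hypothetical sample lines, invented for this example only
sample = [
    "[07-Jan-2022 19:10:01 UTC] PHP Notice:  Undefined index: foo in /var/www/example.php on line 3\n",
    "[07-Jan-2022 19:10:05 UTC] PHP Notice:  Undefined index: foo in /var/www/example.php on line 3\n",
    "[07-Jan-2022 19:11:42 UTC] PHP Warning:  Division by zero in /var/www/example.php on line 9\n",
]

for row in tally_log_lines(sample):
    print(row)
```

`Counter.most_common()` already returns entries ordered by descending count, so no separate sort step is needed. To process a real file, the lines of an open file object could be passed in place of `sample`.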