Gist justone/a76c7bfbd8ffea8f08f3, last active September 4, 2015 13:12
Using Planck to dedupe a WeeChat log file.
```clojure
#!/usr/bin/env planck
(ns dedupe.core
  (:require [planck.io :refer [read-line]]
            [clojure.string :refer [split]]))

(defn line-seq
  "Create a lazy sequence of stdin, from http://blog.fikesfarm.com/posts/2015-08-01-planck-scripting.html."
  []
  (take-while identity
              (repeatedly read-line)))

(defn third
  "Just like built-in first and second, but for the lonely third element.
  Returns nil instead of throwing when there are fewer than three elements."
  [c]
  (nth c 2 nil))

(defn dedupe-input!
  "De-dupe lines on stdin based on equality of the second and third
  tab-delimited fields of each line. The lines are WeeChat log lines,
  formatted like this:

  2015-08-26 12:30:38 nate something witty

  Sometimes a line comes through duplicated, with timestamps one second
  apart, like this:

  2015-08-26 12:30:38 nate something witty
  2015-08-26 12:30:39 nate something witty

  This loop drops such duplicates, keeping only the first one. Each line
  is compared only with the line immediately before it."
  []
  (loop [lines (line-seq)
         prev-user ""
         prev-words ""]
    (when-let [line (first lines)]
      (let [parts (split line #"\t")
            user (second parts)
            words (third parts)]
        ;; print the line unless it repeats the previous line's user and message
        (when (or (not= user prev-user) (not= words prev-words))
          (println line))
        (recur (rest lines) user words)))))

(dedupe-input!)
```
```
$ cat log
2015-08-24 17:11:58 nate hello
2015-08-24 17:11:59 nate hello
2015-08-26 12:30:13 bob hi there
2015-08-26 12:30:37 nate funny weather we've been having.
2015-08-26 12:30:38 nate funny weather we've been having.
$ cat log | ./dedupe.cljs
2015-08-24 17:11:58 nate hello
2015-08-26 12:30:13 bob hi there
2015-08-26 12:30:37 nate funny weather we've been having.
```
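The loop in `dedupe-input!` can also be expressed as a pure sequence transformation, which makes the behavior in the transcript above easy to check at a REPL without piping anything through stdin. The sketch below is not the gist author's code: the namespace `dedupe.sketch` and the functions `line-key` and `dedupe-lines` are hypothetical names, and the sample data assumes the fields are tab-separated (the transcript as displayed shows spaces). It relies only on `clojure.string/split` and `partition-by`, so it should run in both Clojure and ClojureScript/Planck.

```clojure
(ns dedupe.sketch
  (:require [clojure.string :as str]))

(defn line-key
  "Hypothetical helper: the [user message] pair a line is compared on,
  i.e. its second and third tab-delimited fields (nil when missing)."
  [line]
  (let [parts (str/split line #"\t")]
    [(get parts 1) (get parts 2)]))

(defn dedupe-lines
  "Hypothetical pure variant of dedupe-input!: splits the lines into runs
  of consecutive lines that share the same line-key, then keeps the first
  line of each run. Both steps are lazy."
  [lines]
  (->> lines
       (partition-by line-key)
       (map first)))

;; Sample data from the transcript, assuming tab-separated fields.
(def sample-log
  ["2015-08-24 17:11:58\tnate\thello"
   "2015-08-24 17:11:59\tnate\thello"
   "2015-08-26 12:30:13\tbob\thi there"
   "2015-08-26 12:30:37\tnate\tfunny weather we've been having."
   "2015-08-26 12:30:38\tnate\tfunny weather we've been having."])

(dedupe-lines sample-log)
;; => ("2015-08-24 17:11:58\tnate\thello"
;;     "2015-08-26 12:30:13\tbob\thi there"
;;     "2015-08-26 12:30:37\tnate\tfunny weather we've been having.")
```

Like the original loop, this only collapses consecutive repeats: a message that a user repeats later, after someone else has spoken, is kept. Inside the Planck script, the loop could then be replaced by something like `(doseq [line (dedupe-lines (line-seq))] (println line))`, using the script's own `line-seq`.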