@justone
Last active September 4, 2015 13:12
Using planck to dedupe a weechat log file.
#!/usr/bin/env planck
(ns dedupe.core
  (:require [planck.io :refer [read-line]]
            [clojure.string :refer [split]]))

(defn line-seq
  "Create a lazy sequence of lines from stdin, from http://blog.fikesfarm.com/posts/2015-08-01-planck-scripting.html."
  []
  (take-while identity
              (repeatedly read-line)))

(defn third
  "Just like built-in first and second, but for the lonely third element.
  Returns nil when there is no third element."
  [c]
  (nth c 2 nil))

(defn dedupe-input!
  "De-dupe lines on stdin by comparing the second and third tab-delimited
  fields (user and message) of each line with those of the line before it.
  Lines are weechat log lines, formatted like this:

    2015-08-26 12:30:38 nate something witty

  Sometimes a line comes through twice, with timestamps one second apart,
  like this:

    2015-08-26 12:30:38 nate something witty
    2015-08-26 12:30:39 nate something witty

  This loop drops such consecutive duplicates, keeping only the first one."
  []
  (loop [lines (line-seq)
         prev-user ""
         prev-words ""]
    (when-let [line (first lines)]
      (let [parts (split line #"\t")
            user (second parts)
            words (third parts)]
        ;; Print the line unless both user and message match the previous line.
        (when (or (not= user prev-user) (not= words prev-words))
          (println line))
        (recur (rest lines) user words)))))

(dedupe-input!)
$ cat log
2015-08-24 17:11:58 nate hello
2015-08-24 17:11:59 nate hello
2015-08-26 12:30:13 bob hi there
2015-08-26 12:30:37 nate funny weather we've been having.
2015-08-26 12:30:38 nate funny weather we've been having.
$ cat log | ./dedupe.cljs
2015-08-24 17:11:58 nate hello
2015-08-26 12:30:13 bob hi there
2015-08-26 12:30:37 nate funny weather we've been having.
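
The loop in dedupe-input! compares each line's user and message with those of the line before it. As a rough sketch, not part of the original script, the same comparison can be written as a pure function over a sequence of lines, which makes it easy to try at a REPL without piping stdin. The namespace `dedupe.sketch` and the names `line-key` and `dedupe-lines` are made up for this illustration. It assumes tab-delimited lines as described in the docstring above.

```clojure
(ns dedupe.sketch
  (:require [clojure.string :as string]))

(defn line-key
  "Return [user words] for a tab-delimited log line (hypothetical helper).
  Fields that are missing come back as nil."
  [line]
  (let [[_ user words] (string/split line #"\t")]
    [user words]))

(defn dedupe-lines
  "Keep only the first line of each run of consecutive lines that share
  the same [user words] key (hypothetical helper)."
  [lines]
  (->> lines
       (partition-by line-key) ; group consecutive lines with equal keys
       (map first)))           ; keep the first line of each group

;; Example: the second line repeats the first one's user and message.
(dedupe-lines ["2015-08-24 17:11:58\tnate\thello"
               "2015-08-24 17:11:59\tnate\thello"
               "2015-08-26 12:30:13\tbob\thi there"])
;; => ("2015-08-24 17:11:58\tnate\thello" "2015-08-26 12:30:13\tbob\thi there")
```

Like the loop, this only drops duplicates that are adjacent, and it ignores the timestamps entirely. A repeated message separated by another user's line is kept.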