Created
May 16, 2012 02:13
Custom Cascalog tap from a db query
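;; Assumes the enclosing namespace uses cascalog.api (for ?<- and stdout),
;; requires cascalog.ops as c and clojure.java.jdbc as sql,
;; and that db is a JDBC connection spec defined elsewhere.

(defn map-values
  "Returns the values of record r for each key in ks, in order."
  [r ks]
  (map #(% r) ks))

(defn convert-rs
  "Converts a result set (a seq of maps) into a vector of tuples,
  fully realized so it can be used after the connection closes."
  [rs ks]
  (vec (doall
        (map #(vec (map-values % ks)) rs))))

(defn users-query
  []
  (sql/with-connection db
    (sql/with-query-results rs ["select id, username from users"]
      ;; rs is a sequence of maps, one for each record in the result set.
      (convert-rs rs [:id :username]))))

;; Bind the query result to a var so that Cascalog lets us use it as a source.
;; Cascading has extra libraries for proper db sources, which might help speed up Cascalog runs.
(def users (users-query))

;; Bind ?username from the source and filter on it, so it can appear in the output.
(?<- (stdout) [?username ?cnt]
     (users ?user_id ?username)
     (= ?username "foo")
     (c/count ?cnt))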
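To make the shape of the source concrete, here is a minimal sketch of what the conversion step produces, using plain Clojure and no database. The two sample records are made up for illustration; the helper functions are repeated so the snippet can be pasted into a REPL on its own.

```clojure
;; Same helpers as above, repeated so this runs standalone.
(defn map-values [r ks]
  (map #(% r) ks))

(defn convert-rs [rs ks]
  (vec (doall (map #(vec (map-values % ks)) rs))))

;; Hypothetical records, shaped like the maps that a JDBC result set yields.
(def sample-rs
  [{:id 1 :username "foo"}
   {:id 2 :username "bar"}])

(convert-rs sample-rs [:id :username])
;; => [[1 "foo"] [2 "bar"]]
```

The result is an ordinary in-memory vector of tuples, one per row, with fields in the order of the key list.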
What are the problems with using this type of source versus https://github.com/cwensel/cascading.jdbc/ or https://github.com/cascading/cascading-dbmigrate?
Is it that this query will be run across all mappers, while the others will be run once and their results put into HFS for the mappers to read in chunks?