huasanyelao / gist:91a4e83e6fc290e4e384c44fb568e104
Created December 15, 2017 07:17 — forked from cvogt/gist:9193220
Slick: Dynamic query conditions using the **MaybeFilter** (Updated to support nullable columns)
import scala.slick.lifted.CanBeQueryCondition

// optionally filter on a column with a supplied predicate
case class MaybeFilter[X, Y](val query: scala.slick.lifted.Query[X, Y]) {
  // apply the predicate only when `data` is defined; otherwise return the query unchanged
  def filter[T, R: CanBeQueryCondition](data: Option[T])(f: T => X => R) = {
    data.map(v => MaybeFilter(query.filter(f(v)))).getOrElse(this)
  }
}
// example use case
import java.sql.Date
import numpy as np
from matplotlib import pylab as plt
#from mpltools import style # uncomment for prettier plots
#style.use(['ggplot'])
'''
function definitions
'''
# generate all bernoulli rewards ahead of time
def generate_bernoulli_bandit_data(num_samples,K):
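The preview cuts off at the signature; a minimal sketch of a body consistent with the comment above (variable names are assumptions, not necessarily the gist's own):

def generate_bernoulli_bandit_data(num_samples, K):
    # draw one fixed success probability (CTR) per arm, then pre-generate a
    # Bernoulli reward for every (sample, arm) pair
    CTRs = np.random.rand(K)
    true_rewards = np.random.rand(num_samples, K) < CTRs  # boolean reward matrix
    return true_rewards, CTRs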
huasanyelao / zook_grow.md
Created January 21, 2016 08:25 — forked from miketheman/zook_grow.md
Adding nodes to a ZooKeeper ensemble

Adding 2 nodes to an existing 3-node ZooKeeper ensemble without losing the Quorum

Many deployments start out with 3 nodes, and little is documented about how to grow a cluster from 3 members to 5 without losing the existing quorum, so here is an example of how this might be achieved.

In this example, all 5 nodes run on the same Vagrant host for illustration, each with its own configuration (ports and data directories) and without any actual client load.

YMMV. Caveat usufructuarius.

Step 1: Have a healthy 3-node ensemble
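A quick way to verify that step, sketched below: ZooKeeper answers its built-in "ruok" four-letter command with "imok" on a healthy node. The client ports used here are assumptions for the three original members on the single Vagrant host.

import socket

def zk_ok(host, port, timeout=5.0):
    # send ZooKeeper's 'ruok' four-letter command; a healthy node replies 'imok'
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(b"ruok")
        return s.recv(16) == b"imok"

for port in (2181, 2182, 2183):  # assumed client ports of the 3 original members
    print(port, zk_ok("127.0.0.1", port))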

huasanyelao / Demo2.R
Created December 31, 2015 09:21 — forked from zachmayer/Demo2.R
#Setup
rm(list = ls(all = TRUE))
gc(reset=TRUE)
set.seed(1234) #From random.org
#Libraries
library(caret)
library(devtools)
install_github('caretEnsemble', 'zachmayer') #Install zach's caretEnsemble package
library(caretEnsemble)
huasanyelao / rank_metrics.py
Created December 17, 2015 12:53 — forked from bwhite/rank_metrics.py
Ranking Metrics
"""Information Retrieval metrics
Useful Resources:
http://www.cs.utexas.edu/~mooney/ir-course/slides/Evaluation.ppt
http://www.nii.ac.jp/TechReports/05-014E.pdf
http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
http://hal.archives-ouvertes.fr/docs/00/72/67/60/PDF/07-busa-fekete.pdf
Learning to Rank for Information Retrieval (Tie-Yan Liu)
"""
import numpy as np
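For context, a minimal sketch of one metric of this kind, DCG/nDCG over a list of graded relevance scores; this is the common formulation, not necessarily the gist's exact implementation:

def dcg_at_k(relevances, k):
    # discounted cumulative gain over the top-k results, log2 discount
    r = np.asarray(relevances, dtype=float)[:k]
    if r.size == 0:
        return 0.0
    return float(np.sum(r / np.log2(np.arange(2, r.size + 2))))

def ndcg_at_k(relevances, k):
    # DCG normalized by the DCG of the ideal (descending) ordering
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0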
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=com.uebercomputing.mailrecord.MailRecordRegistrator
spark.kryoserializer.buffer.mb=128
spark.kryoserializer.buffer.max.mb=512
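These properties would normally go in spark-defaults.conf or be passed via --conf; an equivalent programmatic sketch in PySpark (the app name is a hypothetical placeholder):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("mailrecord-kryo")  # hypothetical app name
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrator", "com.uebercomputing.mailrecord.MailRecordRegistrator")
        .set("spark.kryoserializer.buffer.mb", "128")
        .set("spark.kryoserializer.buffer.max.mb", "512"))
sc = SparkContext(conf=conf)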
huasanyelao / spark_gzip.py
Created December 11, 2015 09:29 — forked from msukmanowsky/spark_gzip.py
Example of how to save Spark RDDs to disk using GZip compression in response to https://twitter.com/rjurney/status/533061960128929793.
from pyspark import SparkContext

def main():
    sc = SparkContext(appName="Test Compression")
    # RDD has to be key, value pairs
    data = sc.parallelize([
        ("key1", "value1"),
        ("key2", "value2"),
        ("key3", "value3"),
package org.apache.spark.examples
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import java.util.Random
import scala.collection.mutable
import org.apache.spark.serializer.KryoRegistrator
import com.esotericsoftware.kryo.Kryo
huasanyelao / schema.xml
Last active August 29, 2015 14:25 — forked from leoh/schema.xml
config for uuid field in solr 4.6
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
package com.github.juanrh.spark_kafka
import org.apache.spark.SparkConf
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder