@chrisdubois
chrisdubois / submission.py
Created March 23, 2015 17:19
Starter code for Otto Group Product Classification
import graphlab as gl
import math
import random
train = gl.SFrame.read_csv('data/train.csv')
test = gl.SFrame.read_csv('data/test.csv')
del train['id']
def make_submission(m, test, filename):
    preds = m.predict_topk(test, output_type='probability', k=9)
@cdimascio
cdimascio / Play_WS_Standalone_HowTo.md
Last active December 6, 2017 17:39
Example: Using Play Framework's WS library standalone

This gist provides a simple example of how to use Play's WS library in a standalone application (external to Play).

  • build.sbt - SBT build file that includes the WS library
  • WSStandaloneTest.scala - A simple example that utilizes WS to invoke an HTTP GET request

@chaotic3quilibrium
chaotic3quilibrium / DIY Scala Enumeration - README.txt
Last active September 13, 2020 22:20
DIY Scala Enumeration (closest possible Java Enum equivalent with guaranteed pattern matching exhaustiveness checking)
README.txt - DIY Scala Enumeration
Copyright (C) 2014-2016 Jim O'Flaherty
Overview:
Provide in Scala the closest equivalent to Java Enum
- includes decorating each declared Enum member with extended information
- guarantees pattern matching exhaustiveness checking
- this is not available with scala.Enumeration
ScalaOlio library (GPLv3) which contains more up-to-date versions of both `org.scalaolio.util.Enumeration` and `org.scalaolio.util.EnumerationDecorated`:
@prb
prb / maven_spark_magic.xml
Created May 12, 2014 16:47
Fragment of a pom.xml file for packaging separate worker and driver JARs for Spark.
<!-- Fragment of pom.xml -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.2</version>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
import numpy as np
from matplotlib import pylab as plt
#from mpltools import style # uncomment for prettier plots
#style.use(['ggplot'])
'''
function definitions
'''
# generate all bernoulli rewards ahead of time
def generate_bernoulli_bandit_data(num_samples,K):
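
The preview above stops at the function signature, so its body is not shown. As a rough illustration of what "generating all Bernoulli rewards ahead of time" can mean, here is a minimal stdlib-only sketch. It is not the gist author's implementation: the name `sketch_bernoulli_rewards`, the explicit `arm_means` argument, and the `seed` parameter are all hypothetical, and `random` is used in place of NumPy to keep the example dependency-free.

```python
import random


def sketch_bernoulli_rewards(num_samples, arm_means, seed=None):
    """Pre-draw a num_samples x K table of 0/1 rewards.

    Hypothetical helper: arm_means[k] is the assumed success
    probability of arm k, so K = len(arm_means).
    """
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so the comparison is true
    # with probability p for each arm.
    return [
        [1 if rng.random() < p else 0 for p in arm_means]
        for _ in range(num_samples)
    ]


rewards = sketch_bernoulli_rewards(num_samples=5, arm_means=[0.2, 0.5, 0.8], seed=0)
for row in rewards:
    print(row)
```

Drawing the whole table up front means every bandit algorithm under comparison sees the same reward sequence, which makes their regret curves directly comparable.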
@devoncrouse
devoncrouse / MessageConsumer.java
Created July 25, 2013 16:52
Simple class to consume from a Kafka topic
import java.util.List;
import java.nio.ByteBuffer;
import java.io.IOException;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
@luw2007
luw2007 / 词性标记.md
Last active December 30, 2024 12:48
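Part-of-speech tags: covers the ICTPOS3.0 part-of-speech tag set, the ICTCLAS Chinese part-of-speech tagging set, the tags that appear in the jieba dictionary, and some tags that can be ignored in simhash

Word classes

  • Content words: nouns, verbs, adjectives, stative words, distinguishing words, numerals, measure words, pronouns
  • Function words: adverbs, prepositions, conjunctions, particles, onomatopoeia, interjections.

ICTPOS3.0 part-of-speech tag set

n noun

nr personal name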

@waleking
waleking / SparkGibbsLDA.scala
Last active January 31, 2020 11:15
We implement Gibbs sampling for LDA on Spark. This version performs much better than the alpha version and can now handle 3196204 words, 100 topics, and 1000 sampling iterations on a server in 161.7 minutes. To fix the time-consuming collect() step in the alpha version, we use the cache() method at line 261 and line 262. We also solve a pile o…
package topic
import spark.broadcast._
import spark.SparkContext
import spark.SparkContext._
import spark.RDD
import spark.storage.StorageLevel
import scala.util.Random
import scala.math.{ sqrt, log, pow, abs, exp, min, max }
import scala.collection.mutable.HashMap
@stelcheck
stelcheck / hbase.rest.scanner.filters.md
Created October 30, 2012 10:00
HBase Stargate REST API Scanner Filter Examples

Stargate Scanner Filter Examples

Introduction

So yeah... there is no documentation for the HBase REST API on what a filter should look like...

So I installed Eclipse, got the library, and took some time to find some of the (seemingly) most useful filters you could use. I'm very green at anything regarding HBase, and I hope this will help anyone trying to get started with it.

What I discovered is that, basically, the attributes of the filter object follow the same naming as in the documentation. For this reason, I have made each link clickable and pointed it to the matching HBase class documentation; check the constructor argument names there, and you will have your attribute list (more or less).
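
To make that concrete, here is a minimal sketch of building one filter and wrapping it in a scanner request body. It assumes that a `PrefixFilter` is described by `type` and `value` attributes, that byte values are base64-encoded as elsewhere in the REST API, and that the filter JSON travels inside a `<filter>` element of the `<Scanner>` XML. The helper names and the `batch` value are hypothetical, so verify the attribute names against the class documentation for your HBase version.

```python
import base64
import json
from xml.sax.saxutils import escape


def prefix_filter(prefix: bytes) -> dict:
    """Describe a PrefixFilter; 'type' and 'value' are assumed attribute names."""
    return {
        "type": "PrefixFilter",
        # Binary values are sent base64-encoded.
        "value": base64.b64encode(prefix).decode("ascii"),
    }


def scanner_body(filter_spec: dict, batch: int = 100) -> str:
    """Wrap a filter description in a scanner-creation XML body (assumed layout)."""
    # escape() protects any &, <, or > that might appear in the JSON text.
    filter_json = escape(json.dumps(filter_spec))
    return '<Scanner batch="%d"><filter>%s</filter></Scanner>' % (batch, filter_json)


print(scanner_body(prefix_filter(b"row"), batch=10))
```

The resulting string would be sent as the body of the scanner-creation request for a table; other filters should follow the same pattern, with attributes named after their constructor arguments.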

@ambroff
ambroff / bench.py
Created June 30, 2012 06:32
comparing JSON and Thrift serialization speed / data size
import jsonlib
import random
import timeit
import lz4
from thrift.protocol.TBinaryProtocol import TBinaryProtocol
from thrift.protocol.TCompactProtocol import TCompactProtocol
from thrift.transport.TTransport import TMemoryBuffer
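
The preview ends at the imports, so the benchmark itself is not shown. As a rough sketch of the measurement pattern the description names (serialization speed and data size), the following stdlib-only example times an encoder with `timeit` and reports raw and compressed sizes. It is not the gist's code: the standard `json` and `zlib` modules stand in for `jsonlib`, `lz4`, and the Thrift protocols, and `measure` and the sample records are hypothetical.

```python
import json
import timeit
import zlib


def measure(payload, repeats=20):
    """Time JSON encoding and report raw and compressed payload sizes."""
    encoded = json.dumps(payload).encode("utf-8")
    # timeit returns the total seconds for `repeats` calls.
    seconds = timeit.timeit(lambda: json.dumps(payload), number=repeats)
    return {
        "seconds_per_dump": seconds / repeats,
        "raw_bytes": len(encoded),
        "compressed_bytes": len(zlib.compress(encoded)),
    }


# Hypothetical sample data: 200 small records with repeated keys.
records = [{"id": i, "name": "user%d" % i, "score": i * 0.5} for i in range(200)]
result = measure(records)
print(result)
```

To compare formats, the same function shape can be reused with a different encoder in place of `json.dumps`, keeping the payload and repeat count fixed so the numbers stay comparable.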