@saiwaiyanyu
saiwaiyanyu / 词性标记.md
Created January 8, 2016 06:22 — forked from luw2007/词性标记.md
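Part-of-speech (POS) tags: covers the ICTPOS3.0 POS tag set, the ICTCLAS Chinese POS tag set, the POS tags that appear in the jieba dictionary, and some POS tags that can be ignored in simhash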

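Word classes

  • Content words: nouns, verbs, adjectives, stative words, distinguishing words, numerals, measure words, pronouns.
  • Function words: adverbs, prepositions, conjunctions, particles, onomatopoeia, interjections.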
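
The split between content words and function words can be sketched in Python as a small lookup. This is a minimal illustration, not code from the gist: the single-letter tag codes below follow the common ICTCLAS/jieba convention and are an assumption here, since this excerpt only lists `n` and `nr`. The helper names `word_class` and `keep_content_words` are hypothetical. Subtags are classified by their first letter, which is a simplification: a tag such as jieba's `eng` is not a subtag of `e` and would need special handling.

```python
# Assumed tag codes (ICTCLAS/jieba convention); only 'n' and 'nr' appear in this excerpt.
CONTENT_TAGS = {
    "n": "noun", "v": "verb", "a": "adjective", "z": "stative word",
    "b": "distinguishing word", "m": "numeral", "q": "measure word", "r": "pronoun",
}
FUNCTION_TAGS = {
    "d": "adverb", "p": "preposition", "c": "conjunction",
    "u": "particle", "o": "onomatopoeia", "e": "interjection",
}


def word_class(tag):
    """Return 'content', 'function', or 'other' for a POS tag.

    Simplification: a subtag such as 'nr' (personal name) is classified
    by its first letter, here 'n' (noun).
    """
    head = tag[:1].lower()
    if head in CONTENT_TAGS:
        return "content"
    if head in FUNCTION_TAGS:
        return "function"
    return "other"


def keep_content_words(tagged_words):
    """Keep only the words whose tag falls in a content-word class."""
    return [word for word, tag in tagged_words if word_class(tag) == "content"]


# Hand-tagged (word, tag) pairs, made up for illustration.
sample = [("我", "r"), ("在", "p"), ("北京", "ns"), ("工作", "v")]
print(keep_content_words(sample))
```

A filter of this kind is one way to drop low-information words before computing a text fingerprint; which tags are actually safe to ignore depends on the application.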

ICTPOS3.0 POS tag set

n noun

nr personal name

saiwaiyanyu / spark_parallel_boost.py
Created December 10, 2015 08:55 — forked from wpm/spark_parallel_boost.py
A simple example of how to integrate the Spark parallel computing framework and the scikit-learn machine learning toolkit. This script randomly generates test and train data sets, trains an ensemble of decision trees using boosting, and applies the ensemble to the test set. The ensemble training is done in parallel.
from pyspark import SparkContext
import numpy as np
from sklearn.cross_validation import train_test_split, Bootstrap
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
def run(sc):
saiwaiyanyu / cntrade.r
Created November 20, 2015 08:14 — forked from yanping/cntrade.r
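Download stock price data for listed companies using the NetEase stock data API
# cntrade, R version
# Author: Chen Yanping (陈堰平; Xinhua Index Co., Ltd., [email protected])
# Uses the NetEase stock data API. The authors of the original Stata version are:
# Li Chuntao (李春涛; Zhongnan University of Economics and Law, [email protected])
# Zhang Xuan (张璇; Zhongnan University of Economics and Law, [email protected])
# Example: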
# cntrade(c('600000', '000008'), path ='D:/stockprice', start = '20010104', end = '20120124')
cntrade <- function(tickers, path = "", start = "19910101", end = "") {
saiwaiyanyu / server.r
Created November 19, 2015 09:39 — forked from wch/app.r
Shiny example app with dynamic number of plots
max_plots <- 5
shinyServer(function(input, output) {
  # Insert the right number of plot output objects into the web page
  output$plots <- renderUI({
    plot_output_list <- lapply(1:input$n, function(i) {
      plotname <- paste("plot", i, sep="")
      plotOutput(plotname, height = 280, width = 250)
    })