| sudo apt install python3 | |
| sudo apt install python3-pip | |
| sudo add-apt-repository "deb [arch=amd64] https://packages.microsoft.com/repos/vscode stable main" | |
| sudo apt update | |
| sudo apt install code | |
| sudo apt-get install -y cmake libfreetype6-dev libfontconfig1-dev xclip |
| # Do this on every node of the cluster | |
| curl -O http://www.congiu.net/hive-json-serde/1.3.8/hdp23/json-serde-1.3.8-jar-with-dependencies.jar | |
| sudo cp json-serde-1.3.8-jar-with-dependencies.jar /usr/lib/presto/plugin/hive-hadoop2/ | |
| sudo chown presto:presto /usr/lib/presto/plugin/hive-hadoop2/json-serde-1.3.8-jar-with-dependencies.jar | |
| # Restart Presto so it picks up the new SerDe JAR | |
| sudo restart presto-server |
| #!/usr/bin/python | |
| from tweepy import Stream, OAuthHandler | |
| from tweepy.streaming import StreamListener | |
| from progressbar import ProgressBar, Percentage, Bar | |
| import json | |
| import sys | |
| #Twitter app information | |
| consumer_secret='Your consumer secret' |
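| # Import <name>.csv into table <name> of <name>.db, then open the database in the sqlite3 shell. | |
| function csv2db() { | |
| printf '.mode csv\n.import %s.csv %s\n' "$1" "$1" | sqlite3 "$1.db" && \ | |
| sqlite3 -header -column "$1.db" | |
| } |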
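For comparison, the same import can be sketched in Python with only the standard library. This is an illustration, not part of the original gist: the names `csv_to_db` and `quote_ident` are hypothetical, and the sketch assumes the CSV has a header row and that every data row has as many fields as the header. Like `.import` into a table that does not exist yet, it takes the column names from the first row and stores every value as text.

```python
import csv
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier so spaces and keywords are accepted."""
    return '"' + name.replace('"', '""') + '"'


def csv_to_db(csv_path, db_path, table):
    """Load a CSV file with a header row into a SQLite table.

    Returns the number of data rows inserted.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # the first row supplies the column names
        rows = list(reader)

    columns = ", ".join(quote_ident(name) for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({columns})"
            )
            conn.executemany(
                f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})",
                rows,
            )
    finally:
        conn.close()
    return len(rows)
```

Calling `csv_to_db("people.csv", "people.db", "people")` would correspond to `csv2db people`, minus the interactive shell that the Bash function opens afterwards.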
| val docs = sc.textFile("/opt/dataset/don-quijote.txt.gz") | |
| val lower = docs.map(line => line.toLowerCase) | |
| val words = lower.flatMap(line => line.split("\\s+")) | |
| val counts = words.map(word => (word, 1)) | |
| val freq = counts.reduceByKey(_ + _) |
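To see what the pipeline above computes without a Spark cluster, here is a small single-machine sketch in plain Python. It is only an analogy: `word_frequencies` is a hypothetical helper, `collections.Counter` stands in for `reduceByKey(_ + _)`, and unlike the Scala snippet it drops empty tokens produced by leading whitespace.

```python
import re
from collections import Counter


def word_frequencies(lines):
    """Count lowercase, whitespace-separated words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()             # mirrors docs.map(_.toLowerCase)
        words = re.split(r"\s+", lowered)  # mirrors lower.flatMap(_.split("\\s+"))
        # Counting each word once per occurrence plays the role of
        # map(word => (word, 1)) followed by reduceByKey(_ + _).
        counts.update(word for word in words if word)
    return dict(counts)
```

For example, `word_frequencies(["En un lugar", "de la Mancha en"])` maps `"en"` to 2 and every other word to 1.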
| yarn logs -applicationId <application_id> | |
| e.g. | |
| yarn logs -applicationId application_1424284032717_0066 |
| # Set everything to be logged to the console | |
| log4j.rootCategory=WARN, console | |
| log4j.appender.console=org.apache.log4j.ConsoleAppender | |
| log4j.appender.console.target=System.err | |
| log4j.appender.console.layout=org.apache.log4j.PatternLayout | |
| log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n | |
| # Settings to quiet third party logs that are too verbose | |
| log4j.logger.org.eclipse.jetty=WARN | |
| log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=WARN |
| import com.google.gson.Gson | |
| import org.apache.spark.streaming.twitter.TwitterUtils | |
| import org.apache.spark.streaming._ | |
| import org.apache.spark.streaming.twitter._ | |
| import org.apache.spark.storage.StorageLevel | |
| import scala.io.Source | |
| import scala.collection.mutable.HashMap | |
| import java.io.File | |
| import org.apache.log4j.Logger | |
| import org.apache.log4j.Level |
| #!/usr/bin/env bash | |
| USER_NAME=hbd | |
| USER_HOME="/home/$USER_NAME" | |
| cd "$USER_HOME" | |
| mkdir -p "$USER_HOME/twitter4j" | |
| cd "$USER_HOME/twitter4j" | |
| # Get the Spark Streaming JAR. | |
| curl -O "https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-twitter_2.10/1.5.0/spark-streaming-twitter_2.10-1.5.0.jar"