@wilmeragsgh
Last active May 8, 2020 21:24
Bash script for setting up Spark standalone (Ubuntu)
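# Run as root (for example with sudo): the script installs packages and writes to /opt
apt-get update
apt-get install -y openjdk-8-jdk-headless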
SPARK_RELEASE=spark-3.0.0-preview2 # Update as required; releases are listed at https://spark.apache.org/downloads.html
HADOOP_VERSION=hadoop2.7 # Update as required; must match a build published for the chosen release
SPARK_DOWNLOADER_FILENAME=$SPARK_RELEASE-bin-$HADOOP_VERSION
# Download from the Apache archive, which serves the tarball directly and keeps old releases
wget -q https://archive.apache.org/dist/spark/$SPARK_RELEASE/$SPARK_DOWNLOADER_FILENAME.tgz
tar -xzf $SPARK_DOWNLOADER_FILENAME.tgz
mv $SPARK_DOWNLOADER_FILENAME /opt/$SPARK_RELEASE
ln -sfn /opt/$SPARK_RELEASE /opt/spark # -sfn replaces an existing /opt/spark link when upgrading
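# These exports only last for the current shell session; source this script or add them to your shell profile to keep them
export SPARK_HOME=/opt/spark
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 # Path on amd64 Ubuntu; adjust for other architectures
export PATH=$SPARK_HOME/bin:$PATH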
# Install PySpark. Activate the Python environment you want to use Spark from first.
pip install pyspark
pip install -q findspark
# See https://gist.github.com/wilmeragsgh/b2277a8ec41d833e319c513f2c2aa064 for how to test the installation in python
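
The `export` lines in the script only affect the shell that runs them, so a new terminal will not see `SPARK_HOME` or the updated `PATH`. Below is a minimal sketch of one way to persist them, assuming bash and a profile file such as `~/.bashrc`. The helper names `append_once` and `persist_spark_env` are hypothetical and not part of the original script; the paths are the ones used above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so repeated runs do not duplicate entries.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: write the three exports from the script into a
# profile file. The default of ~/.bashrc is an assumption; pass another
# path as the first argument if your shell reads a different file.
persist_spark_env() {
  local profile="${1:-$HOME/.bashrc}"
  append_once 'export SPARK_HOME=/opt/spark' "$profile"
  append_once 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' "$profile"
  # Single quotes keep $SPARK_HOME and $PATH unexpanded until the profile is read.
  append_once 'export PATH=$SPARK_HOME/bin:$PATH' "$profile"
}
```

After defining these functions, calling `persist_spark_env` (optionally with a profile path) and opening a new shell should make `spark-submit --version` available without re-exporting anything. Running it more than once leaves the profile unchanged.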