@aolwas
Last active April 15, 2023 15:07
Pyspark env setup
#!/bin/bash
# This snippet installs pyspark and the necessary jar dependencies inside an active Python environment.
#
# Using conda:
# [Optional] install mambaforge (https://github.com/conda-forge/miniforge)
# [Optional] mamba create -n MY_ENV python=3.10 pyspark=3.3.2
# conda activate MY_ENV
#
# Using venv:
# [Optional] python -m venv venv
# source ./venv/bin/activate
# [Optional] pip install pyspark==3.3.2
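# Directory holding the jars bundled with pyspark; extra jars placed here end up on Spark's classpath.
SPARK_EXTRAS_PREFIX=$(python -c 'import site; print(site.getsitepackages().pop())')/pyspark/jars/
# Versions of the extra jars to download.
# hadoop-aws should match the Hadoop version bundled with the installed pyspark.
HADOOP_AWS_VERSION=3.3.1
JODA_TIME_VERSION=2.12.0
AWS_SDK_VERSION=1.11.999
DELTA_IO_VERSION=2.3.0
# Scala version that the Delta Lake artifact was built for.
DELTA_SCALA_VERSION=2.12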
# Download each jar from Maven Central into the pyspark jars directory.
# The prefix is quoted so that a site-packages path containing spaces still works.
wget -P "${SPARK_EXTRAS_PREFIX}" https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/${HADOOP_AWS_VERSION}/hadoop-aws-${HADOOP_AWS_VERSION}.jar
wget -P "${SPARK_EXTRAS_PREFIX}" https://repo1.maven.org/maven2/joda-time/joda-time/${JODA_TIME_VERSION}/joda-time-${JODA_TIME_VERSION}.jar
wget -P "${SPARK_EXTRAS_PREFIX}" https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/${AWS_SDK_VERSION}/aws-java-sdk-${AWS_SDK_VERSION}.jar
wget -P "${SPARK_EXTRAS_PREFIX}" https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-core/${AWS_SDK_VERSION}/aws-java-sdk-core-${AWS_SDK_VERSION}.jar
wget -P "${SPARK_EXTRAS_PREFIX}" https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/${AWS_SDK_VERSION}/aws-java-sdk-s3-${AWS_SDK_VERSION}.jar
wget -P "${SPARK_EXTRAS_PREFIX}" https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-kms/${AWS_SDK_VERSION}/aws-java-sdk-kms-${AWS_SDK_VERSION}.jar
wget -P "${SPARK_EXTRAS_PREFIX}" https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-dynamodb/${AWS_SDK_VERSION}/aws-java-sdk-dynamodb-${AWS_SDK_VERSION}.jar
wget -P "${SPARK_EXTRAS_PREFIX}" https://repo1.maven.org/maven2/io/delta/delta-core_${DELTA_SCALA_VERSION}/${DELTA_IO_VERSION}/delta-core_${DELTA_SCALA_VERSION}-${DELTA_IO_VERSION}.jar
wget -P "${SPARK_EXTRAS_PREFIX}" https://repo1.maven.org/maven2/io/delta/delta-storage/${DELTA_IO_VERSION}/delta-storage-${DELTA_IO_VERSION}.jar
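
The download URLs above all follow the same Maven Central layout: the group id with dots turned into slashes, then the artifact id, the version, and a file named `<artifact>-<version>.jar`. As a minimal sketch of that pattern, the helper below builds such a URL from its three parts. The function name `maven_jar_url` is hypothetical and is not part of the original script; it only prints the URL and downloads nothing.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of the original script):
# prints the Maven Central URL of a jar given its group id, artifact id and version.
maven_jar_url() {
  local group_id=$1 artifact_id=$2 version=$3
  # Maven Central stores groups as nested directories: io.delta -> io/delta
  local group_path=${group_id//.//}
  echo "https://repo1.maven.org/maven2/${group_path}/${artifact_id}/${version}/${artifact_id}-${version}.jar"
}

# Example: rebuild the URL used above for delta-storage 2.3.0.
maven_jar_url io.delta delta-storage 2.3.0
```

With such a helper, a download line could be written as `wget -P "${SPARK_EXTRAS_PREFIX}" "$(maven_jar_url io.delta delta-storage "${DELTA_IO_VERSION}")"`, which keeps the group, artifact and version visible in one place.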