The following is a list of Hadoop properties that let Spark use HDFS (and other Hadoop-compatible file systems) more effectively.
spark.hadoop.-prefixed Spark properties are used to configure the Hadoop Configuration that Spark broadcasts to tasks (the spark.hadoop. prefix is stripped off before the property is applied). Use spark.sparkContext.hadoopConfiguration to review the resulting properties.
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
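This property selects version 2 of the FileOutputCommitter commit algorithm, which moves task output to the final destination at task commit and so avoids the slow job-commit renames of version 1 (at the cost of weaker atomicity). As a minimal sketch of the spark.hadoop. mechanism described above (assuming a local SparkSession; names here are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// A spark.hadoop.-prefixed property is copied, with the prefix stripped,
// into the Hadoop Configuration that Spark broadcasts to tasks.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("hadoop-conf-review")
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  .getOrCreate()

// Review the resulting Hadoop Configuration on the driver.
val committerVersion = spark.sparkContext
  .hadoopConfiguration
  .get("mapreduce.fileoutputcommitter.algorithm.version")
println(committerVersion) // prints: 2

spark.stop()
```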
Read Google Cloud Storage Connector for Spark and Hadoop
Read Hadoop-AWS module
fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
fs.s3a.multiobjectdelete.enable=false
fs.s3a.fast.upload=true
fs.s3a.endpoint
fs.s3a.access.key
fs.s3a.secret.key
fs.s3a.path.style.access=true
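A hedged sketch of wiring these S3A properties together, assuming the hadoop-aws module (and its matching AWS SDK) is on the classpath; the endpoint value and the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables are placeholders introduced here, not part of the original:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: the endpoint and credential sources below are placeholders.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("s3a-config")
  .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .config("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false")
  .config("spark.hadoop.fs.s3a.fast.upload", "true")
  .config("spark.hadoop.fs.s3a.endpoint", "s3.eu-west-1.amazonaws.com")      // placeholder endpoint
  .config("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))     // assumed env variable
  .config("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY")) // assumed env variable
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .getOrCreate()

// With S3A configured, s3a:// URIs work like any other Hadoop file system:
// spark.read.text("s3a://some-bucket/path") // hypothetical bucket
```

Disabling fs.s3a.multiobjectdelete.enable and enabling fs.s3a.path.style.access are typically needed for S3-compatible stores that lack bulk-delete support or virtual-hosted-style addressing.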