
@yarax
Created August 22, 2022 16:02
PySparkDFBasics
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.master("local[1]").appName("AWSpark").getOrCreate()

# Read the CSV with a header row; without inferSchema every column is a string.
df = spark.read.option("header", "true").csv("s3a://aws-stocks-dataset/AAL.csv")
df.show()

# Parse the Date string into a proper date column (assumes dd-MM-yyyy input).
df = df.withColumn("date_dt", to_date(col("Date"), "dd-MM-yyyy"))

# Convert dollars to cents; the string column is implicitly cast to double.
df = df.withColumn("high_cents", col("High") * 100)

# Cast before sorting: ordering the raw string column would sort lexicographically.
df = df.orderBy(col("High").cast("double").desc()).limit(10)

# Parquet column names cannot contain spaces, so rename before writing.
df = df.withColumnRenamed("Adjusted Close", "Adjusted_close")
df.write.mode("overwrite").parquet("stocks.parquet")

df = spark.read.parquet("stocks.parquet")
df.show()