@sllynn
Created April 27, 2022 17:50
Unzip a lot of zip files from DBFS
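import org.apache.spark.sql.functions._
import spark.implicits._
import sys.process._

// List the upload directory and keep only the .zip archives.
dbutils.fs.ls("/FileStore/shared_uploads/[email protected]/shapefiles/").toDF
  .select("path", "name")
  .where(col("path").endsWith(".zip"))
  // Rewrite the dbfs:/ URI as a path under the local /dbfs/ mount so unzip can read it.
  .withColumn("path", regexp_replace($"path", "dbfs:/", "/dbfs/"))
  // Strip the file name (used here as a regex pattern) to get the containing directory.
  .withColumn("root", regexp_replace($"path", $"name", lit("")))
  .drop("name")
  // Runs on the executors: extract each archive into its own directory, overwriting (-o).
  .foreach(r => {
    // Passing the arguments as a Seq keeps paths containing spaces intact.
    Seq("unzip", "-o", r.getString(0), "-d", r.getString(1)).!
    ()
  })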
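
The path handling in the snippet above can be checked without a cluster. The following is a minimal sketch in plain Scala and is not part of the original gist: `DbfsPaths`, `toLocalPath` and `parentDir` are hypothetical names, and it assumes the same `dbfs:/` to `/dbfs/` mapping that the gist uses. Unlike `regexp_replace`, it only rewrites the leading prefix and does not treat the file name as a regular expression.

```scala
// Hypothetical helpers mirroring the two regexp_replace steps in the gist.
object DbfsPaths {
  // Map a dbfs:/ URI to the equivalent path under the local /dbfs/ mount.
  // Paths without the dbfs:/ prefix are returned unchanged.
  def toLocalPath(uri: String): String =
    if (uri.startsWith("dbfs:/")) "/dbfs/" + uri.stripPrefix("dbfs:/") else uri

  // Directory that contains the file, with a trailing slash,
  // matching the "root" column built in the gist.
  def parentDir(localPath: String): String =
    localPath.substring(0, localPath.lastIndexOf('/') + 1)
}
```

For example, `DbfsPaths.parentDir(DbfsPaths.toLocalPath("dbfs:/FileStore/shapes/a.zip"))` yields the directory that `unzip -d` would extract into under these assumptions.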