@nomoa
Created September 4, 2020 12:37
Extract N-Triples from an RDD
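import org.apache.hadoop.io.compress.BZip2Codec
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

// Returns a function that writes an RDD of (subject, predicate, object) rows to
// outputPath as bzip2-compressed N-Triples text, one part file per partition.
// Each column is expected to already hold an N-Triples-encoded term.
def getDirectoryWriter(outputPath: String, partitions: Int)(implicit spark: SparkSession): RDD[Row] => Unit = {
  (rdd: RDD[Row]) =>
    rdd
      .repartition(partitions)
      // Explicit type parameter so getAs is not inferred as getAs[Nothing]
      .map(r => s"${r.getAs[String]("subject")} ${r.getAs[String]("predicate")} ${r.getAs[String]("object")} .")
      .saveAsTextFile(outputPath, classOf[BZip2Codec])
}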
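The map step simply joins the three columns with spaces and appends " .", so it only yields valid N-Triples if each column already holds an encoded term (an IRI in angle brackets, or a quoted and escaped literal). If the rows carry raw values instead, a helper along these lines could do the encoding first. This is a minimal sketch, not part of the original gist: the `NTriples` object, its methods, and the example IRIs are hypothetical, and it only handles IRIs and plain string literals (no language tags or datatypes).

```scala
// Hypothetical helper (not from the original gist): encode raw values as
// N-Triples terms and assemble one "subject predicate object ." line.
object NTriples {
  // IRIs are written between angle brackets.
  def iri(value: String): String = s"<$value>"

  // Plain string literal: escape backslash, double quote, LF and CR, which the
  // N-Triples grammar does not allow unescaped inside a quoted literal.
  def literal(value: String): String = {
    val sb = new StringBuilder
    value.foreach {
      case '\\' => sb.append("\\\\")
      case '"'  => sb.append("\\\"")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case c    => sb.append(c)
    }
    "\"" + sb.toString + "\""
  }

  // Same line shape as the map step in getDirectoryWriter.
  def line(subject: String, predicate: String, obj: String): String =
    s"$subject $predicate $obj ."
}

object NTriplesExample extends App {
  val l = NTriples.line(
    NTriples.iri("http://www.wikidata.org/entity/Q42"),
    NTriples.iri("http://www.w3.org/2000/01/rdf-schema#label"),
    NTriples.literal("Douglas \"DNA\" Adams"))
  // prints: <http://www.wikidata.org/entity/Q42> <http://www.w3.org/2000/01/rdf-schema#label> "Douglas \"DNA\" Adams" .
  println(l)
}
```

Inside Spark, the same helpers could be called from the `.map` before `saveAsTextFile`; keeping them as plain functions on `String` means they can be checked without a running cluster.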