Last active
December 20, 2022 01:24
-
-
Save tecmaverick/1e23a1130c8c326ccbf641d402f28f11 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
//Set local checkpoint directory. This is unreliable incase of driver restarts | |
sc.setCheckpointDir("file:///tmp/sparkcheckpoints") | |
//View the checkpoint dir | |
sc.getCheckpointDir.get | |
val rdd = sc.parallelize(Seq.range(0,100)) | |
val filteredRdd = rdd.filter(x=> x>50).map(x=> x * 2) | |
// View the lineage | |
filteredRdd.toDebugString | |
// The lineage BEFORE checkpointing | |
// (8) MapPartitionsRDD[9] at map at <console>:23 [] | |
// | MapPartitionsRDD[8] at filter at <console>:23 [] | |
// | ParallelCollectionRDD[5] at parallelize at <console>:23 [] | |
//Checkpoint the RDD | |
filteredRdd.checkpoint | |
// The lineage AFTER checkpointing | |
// (8) MapPartitionsRDD[9] at map at <console>:23 [] | |
// | ReliableCheckpointRDD[10] at count at <console>:24 [] | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment