
@RajaShyam
Created October 9, 2018 11:01
Spark Measure
Apache Spark Performance Troubleshooting at Scale: Challenges, Tools, and Methodologies (from CERN)
sparkMeasure GitHub link - https://github.com/LucaCanali/sparkMeasure
- Can be used to measure the metrics of a Spark job
- Easy to start: add it with --packages, e.g. bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.11:0.13
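The --packages value above is a Maven coordinate of the form groupId:artifactId:version, where the artifact name carries a Scala binary-version suffix (_2.11) that has to match the Scala version your Spark build uses. The following minimal sketch assembles that coordinate and the spark-shell command line. The helper names and the spark_home parameter are hypothetical; only the coordinate string itself comes from the note.

```python
import shlex
from typing import List

# Group and artifact names taken from the coordinate in the note above.
GROUP_ID = "ch.cern.sparkmeasure"
ARTIFACT = "spark-measure"


def spark_measure_coordinate(scala_binary: str = "2.11", version: str = "0.13") -> str:
    """Build groupId:artifactId_<scala binary version>:version (hypothetical helper)."""
    return f"{GROUP_ID}:{ARTIFACT}_{scala_binary}:{version}"


def spark_shell_command(spark_home: str = ".", scala_binary: str = "2.11",
                        version: str = "0.13") -> List[str]:
    """Return the argv list for launching spark-shell with sparkMeasure on the classpath."""
    launcher = f"{spark_home.rstrip('/')}/bin/spark-shell"
    return [launcher, "--packages", spark_measure_coordinate(scala_binary, version)]


if __name__ == "__main__":
    # Print a copy-pasteable shell command.
    print(shlex.join(spark_shell_command()))
```

If your Spark build uses a different Scala version, change scala_binary so that the suffix matches it. Returning an argv list rather than a single string also makes the result safe to pass to subprocess without shell quoting.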
Ways to measure Spark:
1. Web UI
2. Execution plans and DAGs
3. Web UI event timeline - shows what each task is doing over time
4. REST API for Spark metrics - history server URL + /api/v1/applications
5. Event log - persists the Web UI data so the history server can show it after the application finishes
Config:
   spark.eventLog.enabled=true
   spark.eventLog.dir=<path>
- Stored as JSON, one event per line. It contains the details displayed by the Spark history server
- More details - https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_EventLog.md
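The REST API entry above (history server URL + /api/v1/applications) returns a JSON list of applications, each with fields such as "id" and "name". The following minimal sketch uses only the Python standard library. The helper names are hypothetical, and the example URL assumes a history server running on its default port 18080.

```python
import json
from typing import List, Tuple
from urllib.request import urlopen


def applications_url(history_server: str) -> str:
    """History server URL + /api/v1/applications."""
    return history_server.rstrip("/") + "/api/v1/applications"


def parse_applications(payload: str) -> List[Tuple[str, str]]:
    """Extract (id, name) pairs from the JSON list returned by the endpoint."""
    return [(app["id"], app["name"]) for app in json.loads(payload)]


def list_applications(history_server: str, timeout: float = 10.0) -> List[Tuple[str, str]]:
    """Fetch and parse the application list (needs a reachable history server)."""
    with urlopen(applications_url(history_server), timeout=timeout) as resp:
        return parse_applications(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumption: a local history server on the default port.
    for app_id, name in list_applications("http://localhost:18080"):
        print(app_id, name)
```

The application ids returned here can be used in deeper endpoints under the same /api/v1/applications/ prefix to get per-application details.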
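Because the event log is plain JSON with one event per line, you can analyse it without the UI. The following minimal sketch sums executor run time (milliseconds) per stage from task-end events. It assumes Spark's JSON event layout: an "Event" field, "SparkListenerTaskEnd" records carrying "Stage ID", and a "Task Metrics" object with "Executor Run Time". It also assumes the log was written uncompressed. The function name is hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def executor_run_time_by_stage(lines: Iterable[str]) -> Dict[int, int]:
    """Sum 'Executor Run Time' (ms) of finished tasks, grouped by stage ID."""
    totals: Dict[int, int] = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue  # only task-end events carry per-task metrics
        metrics = event.get("Task Metrics")
        if metrics is None:
            continue  # some failed tasks are logged without metrics
        totals[event["Stage ID"]] += metrics.get("Executor Run Time", 0)
    return dict(totals)


if __name__ == "__main__":
    import sys
    # Usage: python this_script.py <event log file under spark.eventLog.dir>
    with open(sys.argv[1], encoding="utf-8") as f:
        for stage, ms in sorted(executor_run_time_by_stage(f).items()):
            print(f"stage {stage}: {ms} ms")
```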