Created
October 9, 2018 11:01
Spark Measure
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Methodologies - from CERN
sparkMeasure GitHub link - https://github.com/LucaCanali/sparkMeasure
- Can be used to measure the metrics of a Spark job
- Can be loaded simply by passing it via --packages: bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.11:0.13
Measuring Spark:
1. WebUI
2. Execution plans and DAGs
3. WebUI event timeline - see what each task is doing
4. REST API Spark metrics - History Server URL + /api/v1/applications
5. Event log - stores the WebUI history
   Config: spark.eventLog.enabled=true
           spark.eventLog.dir=<path>
   - Stored as JSON; it contains the details displayed by the Spark History Server
   - Help link - https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_EventLog.md
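
Items 4 and 5 in the list above can be explored with a few lines of Python. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, the History Server address (http://localhost:18080) is an assumed default, and the sample records are made up for illustration. It relies only on the facts that the application list lives under /api/v1/applications and that the event log holds one JSON object per line with an "Event" field.

```python
import json
from collections import Counter
from urllib.request import urlopen


def applications_url(history_server: str) -> str:
    """Build the REST endpoint that lists applications (hypothetical helper)."""
    return history_server.rstrip("/") + "/api/v1/applications"


def list_applications(history_server: str) -> list:
    """Fetch the application list as parsed JSON; needs a reachable server."""
    with urlopen(applications_url(history_server)) as response:
        return json.load(response)


def count_events(lines) -> Counter:
    """Count event-log records by their "Event" field, one JSON object per line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            counts[json.loads(line).get("Event", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Made-up excerpt; real event-log records carry many more fields.
    sample = [
        '{"Event": "SparkListenerJobStart", "Job ID": 0}',
        '{"Event": "SparkListenerTaskEnd", "Stage ID": 0}',
        '{"Event": "SparkListenerTaskEnd", "Stage ID": 0}',
    ]
    # Assumed address; replace with your own History Server URL.
    print(applications_url("http://localhost:18080/"))
    print(count_events(sample).most_common())
```

To run count_events against a real log, pass it an open file handle for a file under the directory configured in spark.eventLog.dir. list_applications is only useful when a History Server is actually running at the given address.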