Extreme Computing Exam Review (December 2018)

Papers

| # | Topic | Article | Type |
|---|-------|---------|------|
| 1 | Map Reduce | MapReduce: Simplified Data Processing on Large Clusters | Required |
| 2 | Pig | Pig Latin: A Not-So-Foreign Language for Data Processing | Required |
|   |     | Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience | Recommended |
| 3 | Google File System | The Google File System | Required |
| 4 | BigTable | Bigtable: A Distributed Storage System for Structured Data | Required |
|   |          | Spanner: Google’s Globally-Distributed Database | Recommended |
| 5 | Zookeeper | ZooKeeper: Wait-free coordination for Internet-scale systems | Required |
|   |           | The Chubby lock service for loosely-coupled distributed systems | Recommended |
|   |           | Zab: High-performance broadcast for primary-backup systems | Recommended |
|   |           | Wait-Free Synchronization | Recommended |
| 6 | Pregel | Pregel: A System for Large-Scale Graph Processing | Required |
|   |        | GraphX: Graph Processing in a Distributed Dataflow Framework | Recommended |
|   |        | PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs | Recommended |
| 7 | Virtualization + Containers | Xen and the Art of Virtualization | Recommended |
|   |                             | An Updated Performance Comparison of Virtual Machines and Linux Containers | Recommended |
|   |                             | kvm: the Linux Virtual Machine Monitor | Recommended |
|   |                             | Docker ecosystem – Vulnerability Analysis | Recommended |

Notes:

Topics sourced from the review lecture; papers sourced from slides and resource list.

Recommended papers about Spark omitted since it's not examinable (I think).

Also of interest is this paper on lease management, as suggested by Pramod on Piazza