@vinayakvanarse
Created February 10, 2022 18:25
Data Engineering Course Structure
/******************************
Big Data for Data Engineering
******************************/
Lesson 1
1.1 Introduction to Big Data 02:40
1.2 Welcome 01:09
Lesson 2 What is Big Data 09:56
What is Big data 05:49
Big data in Business 04:07
Big Data and Business Analytics comes of age
Lesson 3 Beyond the Hype 05:57
Beyond the Hype 05:57
Facebook joins Google in HPC Computing Architectures for Big Data
Lesson 4 Big Data and Data Science 05:57
Big data and data science 05:57
Climate Change and Big Data (Dec 2012)
Lesson 5 Big Data Use Cases 05:36
Big data use cases 05:36
Big Data and Sensors (Jan 2013)
Lesson 6 Processing Big Data 05:55
Processing big data 05:55
Hadoop and Lustre - Some Thoughts
/******************************
HADOOP
******************************/
Lesson 2 Introduction to Hadoop 08:10
What is Hadoop Part-A 03:53
What is Hadoop Part-B 04:17
Lesson 3 Hadoop Architecture and HDFS 15:07
Hadoop Architecture Part-A 07:03
Hadoop Architecture Part-B 04:49
HDFS Command Line 03:15
Lesson 4 Hadoop administration 05:52
Hadoop Administration 05:52
Lesson 5 Hadoop Components 12:17
MapReduce 04:30
Pig and Hive 03:56
Flume, Sqoop, and Oozie 03:51
/******************************
Data Engineering with Scala
******************************/
Lesson 2 Introduction 27:36
2.1 Learning Objectives
2.2 Introduction to Scala 03:47
2.3 Getting Started with Scala 05:02
2.4 Creating a Scala Project 06:49
2.5 The Scala REPL 05:49
2.6 Scala Documentation 06:09
2.7 Introduction
Lesson 3 Basic Object-Oriented Programming 23:57
3.1 Learning Objectives
3.2 Classes 05:20
3.3 Immutable and Mutable Fields 05:12
3.4 Methods 05:12
3.5 Default and Named Arguments 03:39
3.6 Objects 04:34
Classes
Lesson 4 Case Objects and Classes 24:11
4.1 Learning Objectives
4.2 Companion Objects 03:44
4.3 Case Classes and Case Objects 04:55
4.4 Apply and Unapply 04:43
4.5 Synthetic Methods 05:16
4.6 Immutability and Thread Safety 05:33
Case Objects and Classes
Lesson 5 Collections 31:20
5.1 Learning Objectives
5.2 Collections Overview 05:16
5.3 Sequences and Sets 08:09
5.4 Options 03:29
5.5 Tuples and Maps 06:05
5.6 Higher Order Functions 08:21
Collections
Lesson 6 Idiomatic Scala 25:34
6.1 Learning Objectives
6.2 For Expressions 06:01
6.3 Pattern Matching 04:49
6.4 Handling Options 03:55
6.5 Handling Failures 05:06
6.6 Handling Futures 05:43
Idiomatic Scala
/******************************
Big Data Hadoop and Spark Developer
******************************/
Lesson 1 Course Introduction 08:51
1.1 Course Introduction 05:52
1.2 Accessing Practice Lab 02:59
Lesson 2 Introduction to Big Data and Hadoop 43:59
Lesson 3 Hadoop Architecture, Distributed Storage (HDFS) and YARN 57:50
Lesson 4 Data Ingestion into Big Data Systems and ETL 01:04:02
Lesson 5 Distributed Processing - MapReduce Framework and Pig 01:01:09
Lesson 6 Apache Hive 57:45
Lesson 7 NoSQL Databases - HBase 21:41
Lesson 8 Basics of Functional Programming and Scala 44:59
Lesson 9 Apache Spark - Next-Generation Big Data Framework 36:54
Lesson 10 Spark Core Processing RDD 01:16:31
Lesson 11 Spark SQL - Processing DataFrames 26:50
Lesson 12 Spark MLlib - Modelling Big Data with Spark 32:54
Lesson 13 Stream Processing Frameworks and Spark Streaming 01:13:16
Lesson 14 Spark GraphX
Linux Training
Lesson 01 - Course Introduction 05:15
Lesson 02 - Introduction to Linux 04:35
Lesson 03 - Ubuntu 16:24
Lesson 04 - Ubuntu Dashboard 17:53
Lesson 05 - File System Organization 31:22
Lesson 06 - Introduction to CLI 01:15:45
Lesson 07 - Editing Text Files and Search Patterns 27:19
Lesson 08 - Package Management
/******************************
Apache Kafka
******************************/
Section 01 - Introduction to Apache Kafka
Lesson 01 - Course Introduction 07:16
Course Introduction 07:16
Lesson 02 - Big Data Overview 03:07
Lesson 03 - Big Data Analytics 02:55
Lesson 04 - Messaging System 05:48
Lesson 05 - Kafka Overview 08:33
Lesson 06 - Kafka Components and Architecture 09:16
Lesson 07 - Kafka Clusters 01:27
Lesson 08 - Kafka Industry Use Cases 02:27
Lesson 09 - Demo: Install Kafka and ZooKeeper 04:58
Lesson 10 - Demo: Single Node Single-Multi Broker Cluster 05:38
Lesson 11 - Key Takeaways
Section 02 - Kafka Producer
Lesson 01 - Overview of Producer and Its Architecture 04:51
Lesson 02 - Kafka Producer Configuration 14:33
Lesson 03 - Send Messages 04:50
Lesson 04 - Serializers 13:51
Lesson 05 - Partitions 08:50
Lesson 06 - Key Takeaways
Section 03 - Kafka Consumer
Lesson 01 - Kafka Consumer - Overview, Consumer Groups and Partitioners 12:27
Lesson 02 - Poll Loop 02:42
Lesson 03 - Configuring Consumer 12:26
Lesson 04 - Commit and Offset 13:59
Lesson 05 - Rebalance Listeners 01:45
Lesson 06 - Consuming Records with Specific Offset 04:13
Lesson 07 - Deserializers 05:32
Lesson 08 - Key Takeaways
Section 04 - Kafka Operations and Performance Tuning
Lesson 01 - Learning Objectives 04:46
Lesson 02 - Replications 14:53
Lesson 04 - Storage 09:59
Lesson 05 - Configuration in Reliable System 18:18
Lesson 05 - Key Takeaways
Section 05 - Kafka Cluster Architecture and Administering Kafka
Lesson 01 - Learning Objectives 05:22
Lesson 02 - Multi-Cluster Architecture 08:45
Lesson 03 - MirrorMaker 17:41
Lesson 04 - Administering Kafka 09:50
Lesson 05 - Dynamic Configuration Changes 09:20
Lesson 06 - Console Producer Tool 01:27
Lesson 07 - Console Consumer Tool 02:36
Lesson 08 - Key Takeaways
Section 06 - Kafka Monitoring and Schema Registry
Lesson 01 - Monitoring 47:23
Lesson 02 - Kafka Schema Registry and Avro 06:27
Lesson 03 - Kafka Schema Registry Components 08:14
Lesson 04 - Kafka Schema Registry Working 08:25
Lesson 05 - Key Takeaways
Section 07 - Kafka Streams and Kafka Connectors
Lesson 01 - Kafka Streams Overview 09:49
Lesson 02 - Kafka Streams Architecture, Working and Components 50:42
Lesson 03 - Stream Concepts and Working 15:30
Lesson 04 - Kafka Connectors 06:08
Lesson 05 - Kafka Connector Configuration 25:08
Lesson 06 - Key Takeaways
Section 08 - Integration of Kafka with Storm
Lesson 01 - Apache Storm 09:10
Lesson 02 - Apache Storm Architecture and Components 08:34
Lesson 03 - Apache Storm Topology 10:44
Lesson 04 - Kafka Spout 03:54
Lesson 05 - Integration of Apache Storm and Kafka 10:19
Lesson 06 - Key Takeaways
Section 09 - Kafka Integration with Spark and Flume
Lesson 01 - Introduction to Spark and Its Components 10:59
Lesson 02 - Basics of Spark - RDDs, Datasets, Transformations and Actions 24:46
Lesson 03 - Spark Streaming 03:09
Lesson 04 - Spark Integration with Kafka 06:26
Lesson 05 - Flume 08:03
Lesson 06 - Flume Kafka to HDFS Configuration 13:28
Lesson 07 - Key Takeaways
Section 10 - Admin Client and Securing Kafka
Lesson 01 - Admin Client 11:59
Lesson 02 - Kafka Security 01:36
Lesson 03 - Kafka Security Components 08:58
Lesson 04 - Configure SSL in Kafka 01:50
Lesson 05 - Secure using ACLs 05:12
Lesson 06 - Key Takeaways
/******************************
AWS Big Data
******************************/
Section 2 - Live Virtual Class Curriculum
Lesson 01 - Course Introduction
Overview of AWS Certified Data Analytics - Specialty Course
Overview of the Certification
Overview of the Course
Project highlights
Course Completion Criteria
Lesson 02 AWS in Big Data Introduction
Introduction to Cloud Computing
Cloud Computing Deployment Models
Types of Cloud Computing Services
AWS Fundamentals
AWS Cloud Economics
AWS Virtuous Cycle
AWS Cloud Architecture Design Principles
Why AWS for Big Data - Challenges
Databases in AWS
Relational vs. Non-Relational Databases
Data Warehousing in AWS
AWS Services for collecting, processing, storing, and analyzing big data
Key Takeaways
Deploy a Data Warehouse Using Amazon Redshift
Lesson 03 Collection
AWS Big Data Collection Services
Fundamentals of Amazon Kinesis
Loading Data into Kinesis Stream
Assisted Practice: Loading Data into Amazon Storage
Kinesis Data Stream High-Level Architecture
Kinesis Stream Core Concepts
AWS Services and Amazon Kinesis Data Stream
How to Put Data into Kinesis Stream?
Kinesis Connector Library
Amazon Kinesis Data Firehose
Assisted Practice: Transfer Data into Delivery Stream using Firehose
Assisted Practice: Transfer VPC Flow log to Splunk using Firehose
Data Transfer using AWS Lambda
Assisted Practice: Backing up data in Amazon S3 using AWS Lambda
Amazon SQS
IoT and Big Data
AWS IoT Greengrass
AWS Data Pipeline
Components of Data Pipeline
Assisted Practice: Export MySQL Data to Amazon S3 Using AWS Data Pipeline
Key Takeaways
Streaming Data with Kinesis Data Analytics
Lesson 04 Storage
AWS Big Data Storage Services
Data Lakes and Analytics
Data Management
Data Life Cycle
Fundamentals of Amazon Glacier
Glacier and Big Data
DynamoDB Introduction
DynamoDB: Core Components
Assisted Practice: Perform Operations on a DynamoDB Table
DynamoDB in AWS Eco-System
DynamoDB Partitions
Data Distribution
DynamoDB GSI and LSI
DynamoDB Streams
Use cases: Capturing Table Activity with DynamoDB Streams
Cross-Region Replication
Assisted Practice: Create a Global Table using DynamoDB
DynamoDB Performance: Deep Dive
Partition Key Selection
Snowball and AWS Big Data
Assisted Practice: Data Migration using AWS Snowball
AWS DMS
Amazon Aurora in Big Data
Assisted Practice: Create and Modify Aurora DB Cluster
Storing and Retrieving the Data from DynamoDB
Lesson 05 Processing I
AWS Big Data Processing Services
Overview of Amazon Elastic MapReduce (EMR)
EMR Cluster Architecture
Apache Hadoop
Apache Hadoop Architecture
Storage Options
EMR Operations
AWS Cluster
Assisted Practice: Create a cluster in S3
Assisted Practice: Monitor a Cluster in S3
Using Hue with EMR
Assisted Practice: Launch Hue Web Interface on Amazon EMR
Set Up Hue for LDAP
Assisted Practice: Configure Hue for LDAP Users
Hive on EMR
Assisted Practice: Set Up a Hive Table to Run Hive Commands
Key Takeaways
Lesson 06: Processing II
Using HBase with EMR
HBase Architecture
Assisted Practice: Create a cluster with HBase
HBase and EMRFS
Presto with EMR
Presto Architecture
Fundamentals of Apache Spark
Apache Spark Architecture
Assisted Practice: Create a cluster with Spark
Apache Spark Integration with EMR
Fundamentals of EMR File System
Amazon Simple Workflow
AWS Lambda in Big Data Ecosystem
AWS Lambda and Kinesis Stream
AWS Lambda and Redshift
HCatalog
Key Takeaways
Real-Time Application with Apache Spark and AWS EMR
Lesson 07 ETL with Redshift
Introduction to AWS Big Data Analysis Services
Fundamentals of Amazon Redshift
Amazon Redshift Architecture
Assisted Practice: Launch a Cluster, Load Dataset, and Execute Queries
Redshift in the AWS Ecosystem
Columnar Databases
Assisted Practice: Monitor Redshift Maintenance and Operations
Redshift Table Design
Choosing the Distribution Style
Redshift Data Types
Redshift Data Loading
COPY Command for Data Loading
Redshift Loading Data
Key Takeaways
Lesson 08: Analysis with Machine Learning
Fundamentals of Machine Learning
Workflow of Amazon Machine Learning
Use cases
Machine Learning Algorithms
Amazon SageMaker
Machine Learning with Amazon SageMaker
Assisted Practice: Build, Train, and Deploy a Machine Learning Model
Elasticsearch
Amazon Elasticsearch Service
Zone Awareness
Logstash
RStudio
Assisted Practice: Fetch the File and Run Analysis using RStudio
Amazon Athena
Assisted Practice: Execute Interactive SQL Queries in Athena
AWS Glue
Key Takeaways
Fraud Detection Using Classification Algorithms on Amazon SageMaker
Lesson 09 Analysis and Visualization
Introduction to AWS Big Data Visualization Services
Amazon QuickSight
Amazon QuickSight - Workflow and Use Cases
Assisted Practice: Analyze the marketing campaign
Working with Data
Assisted Practice: Analyze the marketing campaign using data from Amazon S3
Assisted Practice: Analyze the marketing campaign using data from Presto
Amazon QuickSight: Visualization
Assisted Practice: Create Visuals
Amazon QuickSight: Stories
Assisted Practice: Create a Storyboard
Amazon QuickSight: Dashboard
Assisted Practice: Create a Dashboard
Data Visualization: Other Tools
Kibana
Assisted Practice: Create a Dashboard on Kibana
Key Takeaways
Exploratory Data Analysis Using AWS QuickSight
Lesson 10: Security
Introduction to AWS Big Data Security
EMR Security
EMR Security: Best Practices
Roles
Fundamentals of Redshift Security
Data Protection and Encryption
Master Key, Encryption, and Decryption Process
Amazon Redshift Database Encryption
AWS Key Management Service (KMS) Overview
Encryption using Hardware Security Modules
STS and Cross-Account Access
CloudTrail
Key Takeaways
Practice Projects
Practice Projects
Real-time Analytics on Streaming Data
Truegate S3 Replication Big Data Assignment
/******************************
Azure Big Data
******************************/