- PostgreSQL
- DBT
- Prefect
- Snowflake
- Automation and Orchestration
- MLOps
- Other topics
This curriculum assumes you're reasonably proficient in a popular language such as Python, Go, or TypeScript, and have at least a working knowledge of *nix, shell, Git, and Docker.
- Basic SQL Syntax: `SELECT`, `FROM`, `WHERE`
- Data Types (Integer, Text, Boolean, Date/Time)
- Creating Tables (`CREATE TABLE`)
- Inserting Data (`INSERT INTO`)
- Updating Data (`UPDATE`)
- Deleting Data (`DELETE FROM`)
- Simple Filtering (`=`, `>`, `<`, `LIKE`, `IN`)
- Ordering Results (`ORDER BY`)
- Limiting Results (`LIMIT`)
- Basic Joins (`INNER JOIN`)
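
The basic statements listed above can be tried without a database server. The following is a minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in for PostgreSQL; the `authors` and `books` tables and their contents are hypothetical and exist only for illustration.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL; table names and data are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE
cur.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE books ("
    "id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT, price INTEGER)"
)

# INSERT INTO
cur.executemany(
    "INSERT INTO authors (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
cur.executemany(
    "INSERT INTO books (id, author_id, title, price) VALUES (?, ?, ?, ?)",
    [(1, 1, "SQL Basics", 30), (2, 1, "Joins in Depth", 45), (3, 2, "Compilers", 60)],
)

# UPDATE and DELETE FROM
cur.execute("UPDATE books SET price = 50 WHERE title = 'Joins in Depth'")
cur.execute("DELETE FROM books WHERE price > 55")

# SELECT with INNER JOIN, WHERE (LIKE, IN), ORDER BY, and LIMIT
cur.execute(
    """
    SELECT b.title, a.name, b.price
    FROM books AS b
    INNER JOIN authors AS a ON a.id = b.author_id
    WHERE b.title LIKE '%i%' AND b.price IN (30, 50)
    ORDER BY b.price DESC
    LIMIT 2
    """
)
rows = cur.fetchall()
print(rows)
conn.close()
```

The SQL itself should carry over to PostgreSQL largely unchanged, though a PostgreSQL driver will typically use a different placeholder style than the `?` shown here.
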
- More Join Types (`LEFT JOIN`, `RIGHT JOIN`, `FULL OUTER JOIN`)
- Aggregate Functions (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`)
- Grouping Data (`GROUP BY`)
- Filtering Groups (`HAVING`)
- Subqueries (in `SELECT`, `FROM`, `WHERE`)
- Basic Indexing (`CREATE INDEX`)
- Understanding Primary Keys and Foreign Keys
- Constraints (`NOT NULL`, `UNIQUE`, `CHECK`)
- Views (`CREATE VIEW`)
- Basic Data Import/Export (`COPY`)
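
Several of the topics above (constraints, indexes, views, `LEFT JOIN`, aggregates, `GROUP BY`, `HAVING`, and a subquery) fit into one small example. This is a sketch only, again assuming SQLite via Python's `sqlite3` module in place of PostgreSQL; the `customers` and `orders` tables are hypothetical.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript(
    """
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (id),
        amount INTEGER NOT NULL CHECK (amount > 0)
    );
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    INSERT INTO customers (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders (id, customer_id, amount) VALUES
        (1, 1, 100), (2, 1, 50), (3, 2, 20);

    -- LEFT JOIN keeps customers that have no orders at all.
    CREATE VIEW customer_totals AS
    SELECT c.name, COUNT(o.id) AS order_count, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name;
    """
)

# Query the view like a table.
totals = cur.execute(
    "SELECT name, order_count, total FROM customer_totals ORDER BY name"
).fetchall()

# HAVING filters groups; the subquery supplies the average order amount.
big_spenders = cur.execute(
    """
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > (SELECT AVG(amount) FROM orders)
    ORDER BY customer_id
    """
).fetchall()

print(totals)
print(big_spenders)
conn.close()
```
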
- Window Functions (`ROW_NUMBER`, `RANK`, `LAG`, `LEAD`, aggregate window functions)
- Common Table Expressions (CTEs) (`WITH`)
- Stored Procedures and Functions (`CREATE FUNCTION`, `CREATE PROCEDURE`, basic PL/pgSQL)
- Triggers (`CREATE TRIGGER`)
- Transactions (ACID properties, `BEGIN`, `COMMIT`, `ROLLBACK`)
- Understanding `EXPLAIN` and basic query optimization
- More complex data types (JSON/JSONB, Arrays)
- User Defined Types (`CREATE TYPE`)
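
As a rough illustration of window functions, a CTE, and transaction rollback from the list above, here is a small sketch. It assumes SQLite 3.25 or newer (for window function support) through Python's `sqlite3` module rather than PostgreSQL, and the `sales` table is hypothetical.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL; the sales table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 1, 10), ('north', 2, 30), ('north', 3, 20),
        ('south', 1, 5), ('south', 2, 15);
    """
)

# A CTE (WITH) wrapping three window functions computed per region.
ranked = conn.execute(
    """
    WITH daily AS (
        SELECT region, day, amount,
               LAG(amount) OVER (PARTITION BY region ORDER BY day) AS prev_amount,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region,
               SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
        FROM sales
    )
    SELECT region, day, amount, prev_amount, rank_in_region, running_total
    FROM daily
    ORDER BY region, day
    """
).fetchall()

# Transaction: the connection context manager commits on success
# and rolls back if the block raises.
try:
    with conn:
        conn.execute("UPDATE sales SET amount = amount + 1000 WHERE region = 'north'")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

north_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'north'"
).fetchone()[0]

for row in ranked:
    print(row)
print(north_total)
conn.close()
```

Because the simulated failure triggers a rollback, the update leaves no trace and the north total is unchanged. Stored procedures, triggers written in PL/pgSQL, and JSONB are PostgreSQL-specific and are not covered by this stand-in.
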
- Advanced Indexing (GIN, GiST, BRIN, Covering Indexes)
- Table Partitioning (Declarative Partitioning)
- Replication (Streaming Replication, Logical Replication)
- Connection Pooling (e.g., PgBouncer)
- Security Management (Roles, Permissions, Row-Level Security)
- Backup and Recovery Strategies (pg_dump, Point-in-Time Recovery)
- Advanced Performance Tuning and Monitoring
- Concurrency Control (MVCC, Locking)
- Foreign Data Wrappers (`CREATE FOREIGN TABLE`)
- Advanced PL/pgSQL programming
- What is DBT? (Overview and Use Cases)
- Installing DBT and Setting Up a Project
- DBT Project Structure and Files
- Writing Basic Models (`.sql` files)
- Running and Building Models (`dbt run`)
- Using Seeds (`dbt seed`)
- Simple Jinja Usage in SQL
- Sources and Refactoring with `ref()` and `source()`
- Using Variables and Macros
- Testing Data with Built-in Tests (`unique`, `not_null`, etc.)
- Documentation (`dbt docs generate`, `dbt docs serve`)
- Using Snapshots
- Incremental Models
- Configuring Model Materializations
- Writing Custom Tests and Macros
- Advanced Jinja and Control Structures
- Using Hooks and Operations
- Advanced Model Configurations (tags, ephemeral, etc.)
- Source Freshness and Auditing
- Deployment Best Practices
- Debugging and Logging
- DBT in Production (CI/CD, Scheduling)
- DBT Cloud vs DBT Core
- Managing Large Projects (Packages, Modularization)
- Advanced Performance Optimization
- Integrating DBT with Data Orchestration Tools (Airflow, Prefect)
- Writing and Publishing DBT Packages
- Security and Access Control in DBT
- What is Prefect? (Overview and Use Cases)
- Installing Prefect and Basic Setup
- Writing Your First Flow
- Tasks and Flows Basics
- Running Flows Locally
- Using Prefect CLI
- Parameters and Context in Flows
- Scheduling Flows
- Using State Handlers
- Mapping and Dynamic Task Generation
- Logging and Monitoring Basics
- Using Blocks and Storage Options
- Working with Prefect Cloud UI
- Deployments and Infrastructure Blocks
- Advanced Error Handling and Retries
- Using Collections and Integrations (e.g., S3, GCS, Databases)
- Orchestrating Flows with Subflows
- Using Secrets and Environment Variables
- Custom Task and Flow Classes
- Prefect Agents and Work Queues
- Scaling and High Availability
- Custom Infrastructure and Execution Environments
- Advanced Monitoring and Alerting
- CI/CD Integration for Prefect Deployments
- Security Best Practices and Access Control
- Extending Prefect with Plugins and Custom Collections
- What is Snowflake? (Overview and Architecture)
- Setting Up a Snowflake Account and UI Tour
- Understanding Warehouses, Databases, and Schemas
- Creating and Querying Tables
- Basic SQL in Snowflake (`SELECT`, `INSERT`, `UPDATE`, `DELETE`)
- Loading Data with Web UI and Worksheets
- Using Snowflake Stages (Internal/External)
- Bulk Loading Data (`COPY INTO`)
- Working with File Formats
- Time Travel and Data Retention
- Cloning Databases, Schemas, and Tables
- Working with Views and Secure Views
- Using Snowflake Functions and Sequences
- Query Performance Basics
- Streams and Tasks (Change Data Capture, Automation)
- Materialized Views
- Semi-structured Data (VARIANT, JSON, XML, PARSE/FLATTEN)
- User-defined Functions (UDFs) and Procedures
- Data Sharing and Data Marketplace
- Resource Monitors and Usage Tracking
- Query Profiling and Optimization
- Snowflake Security (Roles, Policies, Masking, Row Access)
- Data Governance and Compliance Features
- Advanced Performance Tuning (Clustering, Result Caching)
- Snowpipe (Continuous Data Ingestion)
- External Tables and Data Lake Integration
- Working with Snowpark (Python, Java, Scala)
- Automation and Orchestration with Third-party Tools
- Multi-cloud and Cross-region Features
- What is Automation and Orchestration? (Overview and Use Cases)
- Introduction to Scheduling (Cron, Task Schedulers)
- Introduction to Docker (Containers vs VMs, Use Cases)
- Installing Docker and Running Your First Container
- Writing Simple Dockerfiles
- Basic Docker Commands (`build`, `run`, `ps`, `stop`, `rm`)
- Introduction to Prefect and Airflow (Concepts Only)
- Simple Shell Scripting for Automation
- Docker Compose for Multi-Container Applications
- Building and Managing Custom Docker Images
- Environment Variables and Volumes in Docker
- Scheduling Workflows with Prefect or Airflow
- Parameterizing and Triggering Workflows
- Monitoring and Logging Automated Tasks
- Using Makefiles for Automation
- Automating Data Pipelines with Python Scripts
- Advanced Docker Networking and Security
- Orchestrating Containers with Kubernetes (Concepts and Basics)
- Building Modular and Reusable Workflow DAGs (Airflow/Prefect)
- Error Handling and Retry Strategies in Orchestration Tools
- Integrating CI/CD Pipelines (GitHub Actions, GitLab CI)
- Dynamic Workflow Generation
- Managing Secrets and Credentials Securely
- Automated Testing of Data Pipelines
- Scaling Workflows and Infrastructure (Kubernetes, Cloud Runners)
- Custom Operators/Sensors in Airflow or Custom Blocks in Prefect
- Distributed Task Execution and Parallelism
- Monitoring, Alerting, and Observability for Automated Workflows
- Advanced Docker Topics (Swarm, Multi-stage Builds, Image Optimization)
- Infrastructure as Code (Terraform, CloudFormation) for Automation
- End-to-End Data Pipeline Automation (from Ingestion to Reporting)
- Security, Compliance, and Auditing in Automated Workflows
- What is MLOps? (Overview and Use Cases)
- Introduction to Machine Learning Lifecycle
- Version Control for Code and Data (Git, DVC)
- Basics of Model Training and Evaluation
- Introduction to Model Serialization (Pickle, Joblib, ONNX)
- Manual Model Deployment (Flask/FastAPI)
- Tracking Experiments with Spreadsheets or Simple Tools
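
To make the training, evaluation, and serialization steps above concrete, here is a minimal sketch using only the standard library. The "model" is a hypothetical least-squares line fit stored as a plain dictionary; `fit_line` and `predict` are illustrative names, not part of any library, and a real project would more likely serialize a fitted estimator with Joblib or export to ONNX.

```python
import pickle


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Apply the fitted parameters to a new input."""
    return model["slope"] * x + model["intercept"]


# "Train" on toy data that follows y = 2x + 1 exactly.
model = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])

# Serialize to bytes (this could be written to a file) and load it back.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

print(restored)
print(predict(restored, 4.0))
```

Pickle can execute arbitrary code while loading, so only unpickle data from sources you trust.
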
- Automated Model Training Pipelines (with Prefect, Airflow, or similar)
- Model Tracking with MLflow or Weights & Biases
- Data Validation and Data Drift Detection
- Model Registry Concepts
- Containerizing ML Models with Docker
- Batch and Real-time Inference Basics
- Monitoring Model Performance (Basic Metrics)
- Feature Store Concepts
- CI/CD for ML (Automated Testing, Linting, and Deployment)
- Advanced Model Monitoring (Drift, Outliers, Data Quality)
- Automated Retraining and Model Versioning
- Model Serving at Scale (Kubernetes, Seldon, KServe, formerly KFServing)
- Advanced Feature Store Usage
- Secure Model Deployment (API Keys, Auth, RBAC)
- A/B Testing and Canary Releases for Models
- End-to-End ML Pipeline Automation (from Data Ingestion to Monitoring)
- Multi-cloud and Hybrid MLOps Architectures
- Infrastructure as Code for MLOps (Terraform, CloudFormation)
- Advanced Model Governance and Compliance
- Custom ML Platform Development
- Cost Optimization and Resource Management for ML Workloads
- Integrating MLOps with DataOps and DevOps
- Advanced Security and Auditability in ML Systems
- Introduction to Data Warehousing Concepts
- Basics of Data Modeling (Star, Snowflake Schemas)
- Introduction to ETL/ELT Concepts
- Data Quality Fundamentals
- Introduction to Cloud Platforms (AWS, GCP, Azure) for Data
- Basic Data Visualization (Tableau, Power BI, Looker)
- Introduction to APIs and REST
- Data Lake Concepts and Architecture
- Data Catalogs and Metadata Management
- Data Lineage and Provenance
- Data Privacy Basics (GDPR, HIPAA Overview)
- Working with NoSQL Databases (MongoDB, Cassandra)
- Streaming Data Basics (Kafka, Kinesis)
- Data Serialization Formats (Parquet, Avro, ORC)
- Scheduling and Automation with Cloud Services (Cloud Composer, AWS Step Functions)
- Data Governance Frameworks
- Master Data Management (MDM)
- Advanced Data Modeling (Slowly Changing Dimensions, Factless Fact Tables)
- Real-time Data Processing Architectures
- Data Mesh and Data Fabric Concepts
- Advanced Data Privacy and Anonymization Techniques
- Data API Design and Management (GraphQL, gRPC)
- Data Migration Strategies (On-prem to Cloud, Cloud to Cloud)
- Data Architecture for Large-scale Systems
- Multi-cloud and Hybrid Data Architectures
- Data Monetization and Data-as-a-Service
- Advanced Data Security (Encryption at Rest/In Transit, Key Management)
- Data Ethics and Responsible AI
- Building Custom Data Platforms
- DataOps Best Practices and Tooling
- Advanced Data Sharing and Collaboration (Data Clean Rooms, Secure Data Exchange)