Topics

PostgreSQL
DBT
Prefect
Snowflake
Automation and Orchestration
MLOps
Other topics

Overview

This curriculum assumes you are reasonably proficient in a popular language such as Python, Go, or TypeScript, and that you have at least a working knowledge of *nix, the shell, Git, and Docker.

1. PostgreSQL

1.1 Easy

1.2 Medium

1.3 Hard

Window Functions (ROW_NUMBER, RANK, LAG, LEAD, aggregate window functions)
Common Table Expressions (CTEs) (WITH)
Stored Procedures and Functions (CREATE FUNCTION, CREATE PROCEDURE, basic PL/pgSQL)
Triggers (CREATE TRIGGER)
Transactions (ACID properties, BEGIN, COMMIT, ROLLBACK)
Understanding EXPLAIN and basic query optimization
More complex data types (JSON/JSONB, Arrays)
User Defined Types (CREATE TYPE)

1.4 Advanced

2. DBT

2.1 Easy

What is DBT? (Overview and Use Cases)
Installing DBT and Setting Up a Project
DBT Project Structure and Files
Writing Basic Models (.sql files)
Running and Building Models (dbt run)
Using Seeds (dbt seed)
Simple Jinja Usage in SQL

2.2 Medium

Sources and Refactoring with ref() and source()
Using Variables and Macros
Testing Data with Built-in Tests (unique, not_null, etc.)
Documentation (dbt docs generate, dbt docs serve)
Using Snapshots
Incremental Models
Configuring Model Materializations

2.3 Hard

Writing Custom Tests and Macros
Advanced Jinja and Control Structures
Using Hooks and Operations
Advanced Model Configurations (tags, ephemeral, etc.)
Source Freshness and Auditing
Deployment Best Practices
Debugging and Logging

2.4 Advanced

DBT in Production (CI/CD, Scheduling)
DBT Cloud vs DBT Core
Managing Large Projects (Packages, Modularization)
Advanced Performance Optimization
Integrating DBT with Data Orchestration Tools (Airflow, Prefect)
Writing and Publishing DBT Packages
Security and Access Control in DBT

3. Prefect

3.1 Easy

3.2 Medium

3.3 Hard

Deployments and Infrastructure Blocks
Advanced Error Handling and Retries
Using Collections and Integrations (e.g., S3, GCS, Databases)
Orchestrating Flows with Subflows
Using Secrets and Environment Variables
Custom Task and Flow Classes

3.4 Advanced

Prefect Agents and Work Queues
Scaling and High Availability
Custom Infrastructure and Execution Environments
Advanced Monitoring and Alerting
CI/CD Integration for Prefect Deployments
Security Best Practices and Access Control
Extending Prefect with Plugins and Custom Collections

4. Snowflake

4.1 Easy

What is Snowflake? (Overview and Architecture)
Setting Up a Snowflake Account and UI Tour
Understanding Warehouses, Databases, and Schemas
Creating and Querying Tables
Basic SQL in Snowflake (SELECT, INSERT, UPDATE, DELETE)
Loading Data with Web UI and Worksheets

4.2 Medium

Using Snowflake Stages (Internal/External)
Bulk Loading Data (COPY INTO)
Working with File Formats
Time Travel and Data Retention
Cloning Databases, Schemas, and Tables
Working with Views and Secure Views
Using Snowflake Functions and Sequences
Query Performance Basics

4.3 Hard

Streams and Tasks (Change Data Capture, Automation)
Materialized Views
Semi-structured Data (VARIANT, JSON, XML, PARSE_JSON/PARSE_XML, FLATTEN)
User-defined Functions (UDFs) and Procedures
Data Sharing and Data Marketplace
Resource Monitors and Usage Tracking
Query Profiling and Optimization

4.4 Advanced

Snowflake Security (Roles, Policies, Masking, Row Access)
Data Governance and Compliance Features
Advanced Performance Tuning (Clustering, Result Caching)
Snowpipe (Continuous Data Ingestion)
External Tables and Data Lake Integration
Working with Snowpark (Python, Java, Scala)
Automation and Orchestration with Third-party Tools
Multi-cloud and Cross-region Features

5. Automation and Orchestration

5.1 Easy

What is Automation and Orchestration? (Overview and Use Cases)
Introduction to Scheduling (Cron, Task Schedulers)
Introduction to Docker (Containers vs VMs, Use Cases)
Installing Docker and Running Your First Container
Writing Simple Dockerfiles
Basic Docker Commands (build, run, ps, stop, rm)
Introduction to Prefect and Airflow (Concepts Only)
Simple Shell Scripting for Automation

5.2 Medium

Docker Compose for Multi-Container Applications
Building and Managing Custom Docker Images
Environment Variables and Volumes in Docker
Scheduling Workflows with Prefect or Airflow
Parameterizing and Triggering Workflows
Monitoring and Logging Automated Tasks
Using Makefiles for Automation
Automating Data Pipelines with Python Scripts

5.3 Hard

Advanced Docker Networking and Security
Orchestrating Containers with Kubernetes (Concepts and Basics)
Building Modular and Reusable Workflow DAGs (Airflow/Prefect)
Error Handling and Retry Strategies in Orchestration Tools
Integrating CI/CD Pipelines (GitHub Actions, GitLab CI)
Dynamic Workflow Generation
Managing Secrets and Credentials Securely
Automated Testing of Data Pipelines

5.4 Advanced

Scaling Workflows and Infrastructure (Kubernetes, Cloud Runners)
Custom Operators/Sensors in Airflow or Custom Blocks in Prefect
Distributed Task Execution and Parallelism
Monitoring, Alerting, and Observability for Automated Workflows
Advanced Docker Topics (Swarm, Multi-stage Builds, Image Optimization)
Infrastructure as Code (Terraform, CloudFormation) for Automation
End-to-End Data Pipeline Automation (from Ingestion to Reporting)
Security, Compliance, and Auditing in Automated Workflows

6. MLOps

6.1 Easy

What is MLOps? (Overview and Use Cases)
Introduction to Machine Learning Lifecycle
Version Control for Code and Data (Git, DVC)
Basics of Model Training and Evaluation
Introduction to Model Serialization (Pickle, Joblib, ONNX)
Manual Model Deployment (Flask/FastAPI)
Tracking Experiments with Spreadsheets or Simple Tools

6.2 Medium

Automated Model Training Pipelines (with Prefect, Airflow, or similar)
Model Tracking with MLflow or Weights & Biases
Data Validation and Data Drift Detection
Model Registry Concepts
Containerizing ML Models with Docker
Batch and Real-time Inference Basics
Monitoring Model Performance (Basic Metrics)
Feature Store Concepts

6.3 Hard

CI/CD for ML (Automated Testing, Linting, and Deployment)
Advanced Model Monitoring (Drift, Outliers, Data Quality)
Automated Retraining and Model Versioning
Model Serving at Scale (Kubernetes, Seldon, KServe (formerly KFServing))
Advanced Feature Store Usage
Secure Model Deployment (API Keys, Auth, RBAC)
A/B Testing and Canary Releases for Models

6.4 Advanced

End-to-End ML Pipeline Automation (from Data Ingestion to Monitoring)
Multi-cloud and Hybrid MLOps Architectures
Infrastructure as Code for MLOps (Terraform, CloudFormation)
Advanced Model Governance and Compliance
Custom ML Platform Development
Cost Optimization and Resource Management for ML Workloads
Integrating MLOps with DataOps and DevOps
Advanced Security and Auditability in ML Systems

7. Other topics

7.1 Easy

Introduction to Data Warehousing Concepts
Basics of Data Modeling (Star, Snowflake Schemas)
Introduction to ETL/ELT Concepts
Data Quality Fundamentals
Introduction to Cloud Platforms (AWS, GCP, Azure) for Data
Basic Data Visualization (Tableau, Power BI, Looker)
Introduction to APIs and REST

7.2 Medium

Data Lake Concepts and Architecture
Data Catalogs and Metadata Management
Data Lineage and Provenance
Data Privacy Basics (GDPR, HIPAA Overview)
Working with NoSQL Databases (MongoDB, Cassandra)
Streaming Data Basics (Kafka, Kinesis)
Data Serialization Formats (Parquet, Avro, ORC)
Scheduling and Automation with Cloud Services (Cloud Composer, AWS Step Functions)

7.3 Hard

Data Governance Frameworks
Master Data Management (MDM)
Advanced Data Modeling (Slowly Changing Dimensions, Factless Fact Tables)
Real-time Data Processing Architectures
Data Mesh and Data Fabric Concepts
Advanced Data Privacy and Anonymization Techniques
Data API Design and Management (GraphQL, gRPC)
Data Migration Strategies (On-prem to Cloud, Cloud to Cloud)

7.4 Advanced

Data Architecture for Large-scale Systems
Multi-cloud and Hybrid Data Architectures
Data Monetization and Data-as-a-Service
Advanced Data Security (Encryption at Rest/In Transit, Key Management)
Data Ethics and Responsible AI
Building Custom Data Platforms
DataOps Best Practices and Tooling
Advanced Data Sharing and Collaboration (Data Clean Rooms, Secure Data Exchange)

fxadecimal/2025-dev-to-data-engineer-ml-ops.md