Jeremy Yeo jeremyyeo

@jeremyyeo
jeremyyeo / bq_fake_sleep.sql
Created April 16, 2025 23:46
Pseudo-sleep function for BQ
create or replace function dbt_jyeo.fake_sleep(seconds int64)
returns string
language python
options(runtime_version="python-3.11", entry_point="main")
as r'''
# The Python runtime takes time to spin up, so this is a lower bound (waits at minimum `seconds`).
import time

def main(seconds):
    time.sleep(seconds)
    return f'waited at least {seconds} seconds'
''';
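
For a quick check from Python, the UDF can be exercised with the BigQuery client library - a minimal sketch, assuming google-cloud-bigquery is installed and the dbt_jyeo dataset lives in your default project:

# pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()
job = client.query("select dbt_jyeo.fake_sleep(5) as msg")
for row in job.result():
    print(row.msg)  # e.g. "waited at least 5 seconds"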

Test to see if there's a difference in the recorded Query ID for dbt_artifacts model executions when post-hooks use run_query vs call statement

Project setup:

# dbt_project.yml
name: my_dbt_project
@jeremyyeo
jeremyyeo / test_fabric.py
Last active July 12, 2024 06:21
Test connecting to MS Fabric
# pip install pyodbc
import pyodbc
# From Entra.
# https://prodata.ie/2023/11/15/service-principal-fabric/
service_principal_client_id = "<Client ID>"
service_principal_client_secret = "<Client Secret>"
# From Fabric UI - there is a 'SQL connection string' that we will use as the 'server'.
server = "<guid>-<guid>.datawarehouse.fabric.microsoft.com"
@jeremyyeo
jeremyyeo / README.md
Last active August 8, 2024 03:00
Converting a dbt jinja macro to a python function for use in a python model #dbt

Converting a dbt jinja macro to a python function for use in a python model

Update: See this article on sharing Python functions across models https://github.com/jeremyyeo/the-hitchhikers-guide-to-dbt/tree/main/snowflake-python-models-shared-functions

Here's a quick example of converting a dbt jinja macro used in a sql model into a python function that is used in a python model instead. It is currently not possible to use a jinja macro as is in a python model.

# dbt_project.yml
name: my_dbt_project
profile: all
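
As an illustration of the pattern, a macro like {% macro cents_to_dollars(column) %}{{ column }} / 100{% endmacro %} could become a plain python function inside the python model - a rough sketch assuming a Snowflake/Snowpark python model, with made-up macro, model, and column names:

# models/my_python_model.py
import snowflake.snowpark.functions as F

def cents_to_dollars(df, column):
    # Same logic the jinja macro produced in SQL: divide cents by 100, round to 2 dp.
    return df.with_column(column, F.round(F.col(column) / 100, 2))

def model(dbt, session):
    dbt.config(materialized="table")
    df = dbt.ref("stg_payments")           # made-up upstream model
    return cents_to_dollars(df, "amount")  # made-up column name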
@jeremyyeo
jeremyyeo / README.md
Last active July 11, 2024 22:52
Debugging dbt snapshots #dbt

Debugging dbt snapshots

Loom: https://www.loom.com/share/84f2ae5463fa48048b9c578244ceb440

Note: dbt's responsibility is to generate the same DDL/DML every time for the same dbt sql/jinja. dbt is not responsible for making sure your data is unique, nor for the shape of your data, etc. - you yourself are responsible for that.

At a high level, what we're trying to do here is to:

  1. At the start of the run, make backups of the snapshot snappy and the data it is snapshotting - the model raw.
  2. Test whether, after the snapshot has completed, the dbt_scd_id is still unique (a sketch of this check follows below).
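
The uniqueness check itself can be expressed in a few lines - a sketch only, assuming the snapshot table has already been pulled into a pandas DataFrame (the function name is made up):

import pandas as pd

def assert_dbt_scd_id_unique(snapshot_df: pd.DataFrame) -> None:
    # dbt_scd_id should uniquely identify each snapshot row; duplicates
    # usually mean the snapshot's unique_key is not actually unique in the source.
    dupes = snapshot_df[snapshot_df.duplicated(subset=["dbt_scd_id"], keep=False)]
    if not dupes.empty:
        raise AssertionError(f"{len(dupes)} rows share a dbt_scd_id:\n{dupes.head()}")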
import csv
import datetime
import string
from faker import Faker # make sure to `pip install Faker`
NUMS = 10000
fake = Faker()
current_time = datetime.datetime.utcnow()
# header = [
@jeremyyeo
jeremyyeo / README.md
Last active December 12, 2024 03:41
Which dbt nodes respect the generate_schema_name macro? #dbt

Which dbt nodes respect the generate_schema_name macro?

dbt has many types of "nodes"/"resources" - e.g. models, sources, seeds - so which of them actually respect the generate_schema_name() macro? Let's have a look.

The following applies to the generate_database_name() macro as well.

The following is tested using:

Core:
@jeremyyeo
jeremyyeo / README.md
Last active November 7, 2024 10:13
Stopping dbt from auto-expanding column types #dbt

Stopping dbt from auto-expanding column types

By default dbt has functionality that auto-expands compatible column types (e.g. varchar(3) to varchar(4)) if the incoming data is too wide for the existing column (https://docs.getdbt.com/faqs/Snapshots/snapshot-schema-changes). We can see this happening like so:

-- models/foo.sql
{{ config(materialized='incremental') }}
select 'foo' as c
@jeremyyeo
jeremyyeo / README.md
Last active December 12, 2023 01:09
A quick explainer of the "dbt was unable to infer all dependencies for the model" error #dbt
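
In rough terms, this error shows up when dbt's static parser cannot see the dbt.ref() calls in a python model - for instance when dbt.ref() is buried inside a helper function or conditional instead of being called at the top level of model(). A minimal sketch with made-up names:

# models/my_python_model.py

def get_upstream(dbt, name):
    # dbt parses the model statically, so a dbt.ref() hidden behind this
    # indirection cannot be resolved to a dependency at parse time.
    return dbt.ref(name)

def model(dbt, session):
    df = get_upstream(dbt, "some_model")  # likely to trigger the error
    # df = dbt.ref("some_model")          # calling dbt.ref() directly avoids it
    return df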
@jeremyyeo
jeremyyeo / README.md
Last active December 9, 2023 22:44
How to get models to run only on local timezones if your scheduler only supports UTC #dbt

How to get models to run only on local timezones if your orchestration platform only supports UTC

If you're using an orchestration platform that only supports UTC timezones - you may find it cumbersome to schedule jobs to run on local timezones while accounting for daylight savings time switchovers, etc. Let's see how we can write a dbt macro using some builtin python modules to help us out.

-- macros/is_nz_business_hours.sql
{% macro is_nz_business_hours() %}
    {% set flag = 0 %}
    {% set dt = modules.datetime %}
    {% set pz = modules.pytz %}
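
For reference, the same check written as plain Python reads roughly as follows - a sketch only, with an assumed Monday-to-Friday, 9am-5pm business-hours window (the macro above does the equivalent via modules.datetime and modules.pytz in jinja):

import datetime
import pytz  # pip install pytz

def is_nz_business_hours():
    # Current time in New Zealand; pytz handles the daylight savings switchover.
    now = datetime.datetime.now(pytz.timezone("Pacific/Auckland"))
    is_weekday = now.weekday() < 5       # Monday=0 ... Friday=4
    in_work_hours = 9 <= now.hour < 17   # assumed 9am-5pm window
    return 1 if (is_weekday and in_work_hours) else 0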