Test to see if there's a difference in the recorded Query ID for dbt_artifacts model executions when post-hooks use run_query vs call statement
Project setup:
# dbt_project.yml
name: my_dbt_project
create or replace function dbt_jyeo.fake_sleep(seconds int64)
returns string
language python
options(runtime_version="python-3.11", entry_point="main")
as r'''
# python function takes time to spin up so this is a lower bound (wait at minimum).
import time
def main(seconds):
    time.sleep(seconds)
    return f'waited at least {seconds} seconds'
''';
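To exercise both paths, one option (the macro names and wiring here are assumptions, not from the original note) is a pair of macros that run the sleep function - one via run_query() and one via a {% call statement %} block - which can then be attached to a model as post-hooks:
-- macros/sleepers.sql
{% macro sleep_via_run_query() %}
    {% if execute %}
        {# executes immediately and returns the result as a table #}
        {% do run_query("select dbt_jyeo.fake_sleep(5)") %}
    {% endif %}
{% endmacro %}

{% macro sleep_via_call_statement() %}
    {% if execute %}
        {# executes inside a named statement block; result retrievable via load_result('sleeper') #}
        {% call statement('sleeper', fetch_result=True) %}
            select dbt_jyeo.fake_sleep(5)
        {% endcall %}
    {% endif %}
{% endmacro %}
A model could then set something like post_hook=["{{ sleep_via_run_query() }}", "{{ sleep_via_call_statement() }}"] in its config, so the query id recorded by dbt_artifacts for the model execution can be compared between the two approaches.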
# pip install pyodbc
import pyodbc

# From Entra.
# https://prodata.ie/2023/11/15/service-principal-fabric/
service_principal_client_id = "<Client ID>"
service_principal_client_secret = "<Client Secret>"

# From Fabric UI - there is a 'SQL connection string' that we will use as the 'server'.
server = "<guid>-<guid>.datawarehouse.fabric.microsoft.com"
Update: See this article on sharing Python functions across models https://github.com/jeremyyeo/the-hitchhikers-guide-to-dbt/tree/main/snowflake-python-models-shared-functions
Here's a quick example of converting a dbt jinja macro used in a sql model into a Python function used in a python model instead. It is currently not possible to use a jinja macro as-is in a python model, so the macro's logic has to be ported by hand (a sketch follows the project setup below).
# dbt_project.yml
name: my_dbt_project
profile: all
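For instance, a hypothetical macro like the one below (not from the original note) can be called from a sql model but not from a python model, so its logic has to be re-implemented as a plain Python function inside the python model's def model(dbt, session) body (or in a shared module, per the update above):
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, scale=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ scale }})
{% endmacro %}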
Loom: https://www.loom.com/share/84f2ae5463fa48048b9c578244ceb440
Note: dbt's responsibility is to generate the same DDL/DML every time for the same dbt sql/jinja. dbt is not responsible for making sure your data is unique, nor is it responsible for the shape of your data - you yourself are responsible for that.
At a high level, what we're trying to do here is to create a snapshot snappy and the data it is snapshotting - the model raw - and then check that the snapshot's dbt_scd_id column is still unique.
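For illustration, the snapshot could look something like the following (the strategy, unique key, and target schema are assumptions; only the names snappy and raw come from the note):
-- snapshots/snappy.sql
{% snapshot snappy %}
{{
    config(
        target_schema='snapshots',
        unique_key='id',
        strategy='check',
        check_cols='all'
    )
}}
select * from {{ ref('raw') }}
{% endsnapshot %}
The Python script below then uses Faker to generate some fake records for the raw data: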
import csv
import datetime
import string
from faker import Faker  # make sure to `pip install Faker`

NUMS = 10000
fake = Faker()
current_time = datetime.datetime.utcnow()
# header = [
dbt has many types of "nodes"/"resources" - e.g. models, sources, seeds - so which of them actually respect the generate_schema_name() macro? Let's have a look. The following applies to the generate_database_name() macro as well.
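For reference, the built-in generate_schema_name() macro that a custom override replaces looks roughly like this:
-- dbt's default implementation (for reference)
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}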
The following is tested using:
Core:
By default, dbt has functionality that auto-expands similar column types (e.g. varchar(3) to varchar(4)) if the incoming data is wider than the existing column (https://docs.getdbt.com/faqs/Snapshots/snapshot-schema-changes). We can see this happening like so:
-- models/foo.sql
{{ config(materialized='incremental') }}
select 'foo' as c
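If we then change the model to select a longer string (a hypothetical follow-up step, not shown in the note) and run it incrementally, dbt compares the incoming and existing column types and, on adapters that support it, widens the target column before inserting the new rows:
-- models/foo.sql (changed before the next incremental run)
{{ config(materialized='incremental') }}
select 'foofoo' as c
The exact ALTER statement emitted varies by adapter.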
Official docs: https://docs.getdbt.com/reference/dbt-jinja-functions/ref#forcing-dependencies
If we have a model that uses a macro and within that macro is a ref() to another model like so:
-- models/foo.sql
select 1 id
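As a sketch of how the rest of the setup might look (the macro name and the upstream model bar are hypothetical), the macro wraps the ref, and the -- depends_on: comment is the escape hatch described in the docs linked above for when dbt cannot infer the dependency on its own:
-- macros/get_upstream_relation.sql
{% macro get_upstream_relation() %}
    {{ ref('bar') }}
{% endmacro %}

-- models/foo.sql
-- depends_on: {{ ref('bar') }}
select 1 id
from {{ get_upstream_relation() }}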
If you're using an orchestration platform that only supports UTC timezones, you may find it cumbersome to schedule jobs to run on local timezones, accounting for daylight savings time switchovers, etc. Let's see how we can write a dbt macro using some builtin python modules to help us out.
-- macros/is_nz_business_hours.sql
{% macro is_nz_business_hours() %}
{% set flag = 0 %}
{% set dt = modules.datetime %}
{% set pz = modules.pytz %}