Test to see if there's a difference in the recorded Query ID for dbt_artifacts model executions when post-hooks use run_query vs call statement
Project setup:
# dbt_project.yml
name: my_dbt_project
create or replace function dbt_jyeo.fake_sleep(seconds int64)
returns string
language python
options(runtime_version="python-3.11", entry_point="main")
as r'''
# python function takes time to spin up so this is a lower bound (wait at minimum).
import time
def main(seconds):
    time.sleep(seconds)
    return f'waited at least {seconds} seconds'
''';
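To exercise both paths, one option (the macro names and wiring here are assumptions, not from the original note) is a pair of macros that run the sleep function - one via run_query() and one via a {% call statement %} block - which can then be attached to a model as post-hooks:
-- macros/sleepers.sql
{% macro sleep_via_run_query() %}
    {% if execute %}
        {# executes immediately and returns the result as a table #}
        {% do run_query("select dbt_jyeo.fake_sleep(5)") %}
    {% endif %}
{% endmacro %}

{% macro sleep_via_call_statement() %}
    {% if execute %}
        {# executes inside a named statement block; result retrievable via load_result('sleeper') #}
        {% call statement('sleeper', fetch_result=True) %}
            select dbt_jyeo.fake_sleep(5)
        {% endcall %}
    {% endif %}
{% endmacro %}
A model could then set something like post_hook=["{{ sleep_via_run_query() }}", "{{ sleep_via_call_statement() }}"] in its config, so the query id recorded by dbt_artifacts for the model execution can be compared between the two approaches.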
# pip install pyodbc
import pyodbc

# From Entra.
# https://prodata.ie/2023/11/15/service-principal-fabric/
service_principal_client_id = "<Client ID>"
service_principal_client_secret = "<Client Secret>"

# From Fabric UI - there is a 'SQL connection string' that we will use as the 'server'.
server = "<guid>-<guid>.datawarehouse.fabric.microsoft.com"
Update: See this article on sharing Python functions across models https://github.com/jeremyyeo/the-hitchhikers-guide-to-dbt/tree/main/snowflake-python-models-shared-functions
Here's a quick example of converting a dbt jinja macro used in a sql model into a Python function used in a python model instead. It is currently not possible to use a jinja macro as-is in a python model, so the macro's logic has to be ported by hand (a sketch follows the project setup below).
# dbt_project.yml
name: my_dbt_project
profile: all
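For instance, a hypothetical macro like the one below (not from the original note) can be called from a sql model but not from a python model, so its logic has to be re-implemented as a plain Python function inside the python model's def model(dbt, session) body (or in a shared module, per the update above):
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, scale=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ scale }})
{% endmacro %}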
Loom: https://www.loom.com/share/84f2ae5463fa48048b9c578244ceb440
Note: dbt's responsibility is to generate the same DDL/DML every time for the same dbt sql/jinja. dbt is not responsible for making sure your data is unique, nor is it responsible for the shape of your data - you yourself are responsible for that.
At a high level, what we're trying to do here is to create a snapshot snappy and the data it is snapshotting - the model raw - and then check that the snapshot's dbt_scd_id column is still unique.
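For illustration, the snapshot could look something like the following (the strategy, unique key, and target schema are assumptions; only the names snappy and raw come from the note):
-- snapshots/snappy.sql
{% snapshot snappy %}
{{
    config(
        target_schema='snapshots',
        unique_key='id',
        strategy='check',
        check_cols='all'
    )
}}
select * from {{ ref('raw') }}
{% endsnapshot %}
The Python script below then uses Faker to generate some fake records for the raw data: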
import csv
import datetime
import string
from faker import Faker  # make sure to `pip install Faker`

NUMS = 10000
fake = Faker()
current_time = datetime.datetime.utcnow()
# header = [
dbt has many types of "nodes"/"resources" - e.g. models, sources, seeds - so which of them actually respect the generate_schema_name() macro? Let's have a look. The following applies to the generate_database_name() macro as well.
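For reference, the built-in generate_schema_name() macro that a custom override replaces looks roughly like this:
-- dbt's default implementation (for reference)
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}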
The following is tested using:
Core:
By default, dbt has functionality that auto-expands similar column types (e.g. varchar(3) to varchar(4)) if the incoming data is wider than the existing column (https://docs.getdbt.com/faqs/Snapshots/snapshot-schema-changes). We can see this happening like so:
-- models/foo.sql
{{ config(materialized='incremental') }}
select 'foo' as c
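If we then change the model to select a longer string (a hypothetical follow-up step, not shown in the note) and run it incrementally, dbt compares the incoming and existing column types and, on adapters that support it, widens the target column before inserting the new rows:
-- models/foo.sql (changed before the next incremental run)
{{ config(materialized='incremental') }}
select 'foofoo' as c
The exact ALTER statement emitted varies by adapter.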
Official docs: https://docs.getdbt.com/reference/dbt-jinja-functions/ref#forcing-dependencies
If we have a model that uses a macro and within that macro is a ref() to another model like so:
-- models/foo.sql
select 1 id
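As a sketch of how the rest of the setup might look (the macro name and the upstream model bar are hypothetical), the macro wraps the ref, and the -- depends_on: comment is the escape hatch described in the docs linked above for when dbt cannot infer the dependency on its own:
-- macros/get_upstream_relation.sql
{% macro get_upstream_relation() %}
    {{ ref('bar') }}
{% endmacro %}

-- models/foo.sql
-- depends_on: {{ ref('bar') }}
select 1 id
from {{ get_upstream_relation() }}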
If you're using an orchestration platform that only supports UTC timezones, you may find it cumbersome to schedule jobs to run on local timezones, accounting for daylight savings time switchovers, etc. Let's see how we can write a dbt macro using some builtin python modules to help us out.
-- macros/is_nz_business_hours.sql
{% macro is_nz_business_hours() %}
{% set flag = 0 %}
{% set dt = modules.datetime %}
{% set pz = modules.pytz %}