SELECT
  /* Meta columns that follow the same pattern across sources:
   * ___hash for data deduplication and for creating a stable unique id
   * ___source_pk to PARTITION BY so that the correct data is selected per object
   * ___as_of for the business validity. If something is already a range in the source system, split it into two rows
   * ___loaded_at to indicate when this particular table was loaded
   * ___original_loaded_at (optional) in case we have an indicator of when the data originally arrived in our system
   */
  TO_HEX(SHA256(TO_JSON_STRING(my_table))) AS ___hash,
  my_table.id AS ___source_pk,
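The `___hash` idea above (hash a serialized row, then deduplicate on the hash) can be sketched outside the warehouse as well. The following is a minimal Python illustration, not the gist's method: the helper names `row_hash` and `deduplicate` are hypothetical, and because Python's `json.dumps` serializes differently from BigQuery's `TO_JSON_STRING`, the digests will not match the ones produced by the SQL above.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Return a hex SHA-256 digest of a canonical JSON form of the row.

    Assumption: keys are sorted so that the hash does not depend on
    dictionary ordering. BigQuery's TO_JSON_STRING keeps column order
    instead, so these digests differ from the SQL ___hash values.
    """
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(rows):
    """Keep the first occurrence of every distinct row, compared by hash."""
    seen = set()
    unique = []
    for row in rows:
        digest = row_hash(row)
        if digest not in seen:
            seen.add(digest)
            unique.append(row)
    return unique


if __name__ == "__main__":
    rows = [{"id": 1, "name": "a"}, {"name": "a", "id": 1}, {"id": 2, "name": "b"}]
    for row in deduplicate(rows):
        print(row_hash(row)[:12], row)
```

Sorting the keys is a design choice for this sketch only. It makes two rows with the same content hash identically regardless of field order, which is what a stable id needs.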
WITH
  base AS (
    SELECT
      SAFE_CAST(
        JSON_EXTRACT_SCALAR(payload, '$.timestamp') AS TIMESTAMP
      ) AS event_time,
      JSON_EXTRACT_SCALAR(payload, '$.action') AS action_type,
      JSON_EXTRACT_SCALAR(payload, '$.actor_id') AS actor_id,
      JSON_EXTRACT_SCALAR(payload, '$.actor_username') AS actor_username,
    FROM
CREATE OR REPLACE FUNCTION _wstd_state(state numeric[3], val numeric, weight numeric)
RETURNS numeric[3] AS $$
DECLARE
    s_n_1 CONSTANT numeric NOT NULL := state[1];
    mu_n_1 CONSTANT numeric NOT NULL := state[2];
    w_n_1 CONSTANT numeric NOT NULL := state[3];
    s_n numeric;
    mu_n numeric;
    w_n numeric;
BEGIN
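The body of `_wstd_state` is not shown here, so the update it performs cannot be confirmed. One common way to maintain a three-element state `(s, mu, w)` for a weighted standard deviation is West's incremental update. The Python sketch below illustrates that technique under the assumption that the state holds the weighted sum of squared deviations, the running weighted mean, and the total weight, in that order. The function names and the choice of a population (not sample) estimate are assumptions.

```python
import math


def wstd_state(state, val, weight):
    """One step of West's weighted incremental variance update.

    state is (s, mu, w): weighted sum of squared deviations, running
    weighted mean, and total weight so far (assumed layout).
    """
    s_prev, mu_prev, w_prev = state
    if weight <= 0:
        # Assumption: non-positive weights contribute nothing.
        return state
    w = w_prev + weight
    mu = mu_prev + (weight / w) * (val - mu_prev)
    s = s_prev + weight * (val - mu_prev) * (val - mu)
    return (s, mu, w)


def wstd_final(state):
    """Weighted population standard deviation, or None for an empty state."""
    s, _, w = state
    return math.sqrt(s / w) if w > 0 else None


if __name__ == "__main__":
    state = (0.0, 0.0, 0.0)
    for val, weight in [(1.0, 1.0), (3.0, 3.0)]:
        state = wstd_state(state, val, weight)
    print(state, wstd_final(state))
```

The single-pass form keeps only three numbers per group, which is what makes it suitable as an aggregate state-transition function.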
import contextlib
import tempfile
import random


def two_pass_shuffle(input_files, output_files, temp_file_count, header_lines=0):
    """
    two_pass_shuffle
    Shuffle data larger than can be shuffled in memory.
    Implementation based on:
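The listing is cut off at this point, so neither the reference nor the function body is available. For orientation, here is a minimal Python sketch of the general two-pass idea, not the gist's implementation. Pass 1 scatters every line into one of several temporary files chosen at random, and pass 2 shuffles each temporary file in memory and appends it to the output. The name `two_pass_shuffle_sketch`, the single output path, the `seed` parameter, and the omission of header handling are all assumptions made for this illustration.

```python
import random
import tempfile


def two_pass_shuffle_sketch(input_paths, output_path, temp_file_count, seed=None):
    """Shuffle the lines of input_paths into output_path in two passes.

    Only one temporary file's worth of lines is held in memory at a time,
    so temp_file_count should be large enough for each bucket to fit.
    """
    rng = random.Random(seed)
    temps = [
        tempfile.TemporaryFile(mode="w+", encoding="utf-8")
        for _ in range(temp_file_count)
    ]
    try:
        # Pass 1: send each line to a randomly chosen temporary file.
        for path in input_paths:
            with open(path, encoding="utf-8") as src:
                for line in src:
                    if not line.endswith("\n"):
                        line += "\n"  # keep lines separated after shuffling
                    rng.choice(temps).write(line)
        # Pass 2: shuffle each temporary file in memory and append it.
        with open(output_path, "w", encoding="utf-8") as dst:
            for tmp in temps:
                tmp.seek(0)
                lines = tmp.readlines()
                rng.shuffle(lines)
                dst.writelines(lines)
    finally:
        for tmp in temps:
            tmp.close()
```

A call such as `two_pass_shuffle_sketch(["part1.txt", "part2.txt"], "shuffled.txt", temp_file_count=16)` would write every input line exactly once, in a random order, to `shuffled.txt` (the file names here are placeholders).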