@samklr
samklr / tf.md
Last active June 2, 2025 14:17
TF Practices

GCP deployment with Terraform

Basics

Run terraform fmt to format your code consistently

Use terraform validate to check for syntax and semantic issues before applying

Adopt tflint or a similar linter to catch anti-patterns and unused code
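
A typical local check sequence tying these together (a sketch; it assumes tflint is already installed and no remote backend needs configuring) might be:

    terraform fmt -recursive          # rewrite files into the canonical style
    terraform init -backend=false     # initialise providers so validate can run
    terraform validate
    tflint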

@samklr
samklr / INSTALL.md
Created October 10, 2021 12:15 — forked from jpillora/INSTALL.md
Headless Transmission on Mac OS X
  1. Go to https://developer.apple.com/downloads/index.action, search for "Command Line Tools", and choose the version matching your OS X release

  2. Go to http://brew.sh/ and enter the install one-liner into Terminal; you now have Homebrew installed (a better MacPorts)

  3. Install transmission-daemon with

    brew install transmission
    
  4. Copy the startup config for launchctl with

    ln -sfv /usr/local/opt/transmission/*.plist ~/Library/LaunchAgents
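
  To start the daemon, the symlinked agent can then be loaded (a suggested follow-up; the plist name below assumes a standard Homebrew install):

    launchctl load ~/Library/LaunchAgents/homebrew.mxcl.transmission.plist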
    
version: '2'
services:
  minio:
    restart: always
    image: docker.io/bitnami/minio:2021
    ports:
      - '9000:9000'
    environment:
      - MINIO_ROOT_USER=miniokey
# Suppose the data file name has the format "datatfile_YYYY-MM-DD.csv"; this file arrives in S3 every day.
file_suffix = "{{ execution_date.strftime('%Y-%m-%d') }}"
bucket_key_template = 's3://[bucket_name]/datatfile_{}.csv'.format(file_suffix)
file_sensor = S3KeySensor(
    task_id='s3_key_sensor_task',
    poke_interval=60 * 30,  # seconds; check for the file every half hour
    timeout=60 * 60 * 12,   # give up after 12 hours
    bucket_key=bucket_key_template,
    bucket_name=None,
    wildcard_match=False,
@samklr
samklr / s3_sensor.py
Created February 12, 2021 20:03 — forked from msumit/s3_sensor.py
Airflow file sensor example
from airflow import DAG
from airflow.operators.sensors import S3KeySensor
from airflow.operators import BashOperator
from datetime import datetime, timedelta

yday = datetime.combine(datetime.today() - timedelta(1),
                        datetime.min.time())

default_args = {
    'owner': 'msumit',
with DAG(**dag_config) as dag:
    # Declare pipeline start and end tasks
    start_task = DummyOperator(task_id='pipeline_start')
    end_task = DummyOperator(task_id='pipeline_end')

    for account_details in pipeline_config['task_details']['accounts']:
        # Declare account start and end tasks
        if account_details['runable']:
            acct_start_task = DummyOperator(task_id=account_details['account_id'] + '_start')
            acct_start_task.set_upstream(start_task)
@samklr
samklr / parquet_tools.md
Last active January 22, 2021 19:50
Parquet Tools

Setup parquet-tools: brew install parquet-tools

Help: parquet-tools -h

Count rows: parquet-tools rowcount part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

Show the first record: parquet-tools head -n 1 part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

Show file metadata and schema: parquet-tools meta part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

@samklr
samklr / gist:743d927dd0a5f5671c64b1d346e7b318
Created November 26, 2020 21:00
Data Engineering assignment

Context: The Integration team has deployed a cron job that dumps a CSV file containing all the new Shopify configurations daily at 2 AM UTC. The task is to build a daily pipeline that will:

  - download the CSV file from https://alg-data-public.s3.amazonaws.com/[YYYY-MM-DD].csv
  - filter out each row with an empty application_id
  - add a has_specific_prefix column set to true if the value of index_prefix differs from shopify_, else false
  - load the valid rows into a PostgreSQL instance

The pipeline should process files from 2019-04-01 to 2019-04-07.
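
A minimal sketch of these steps in Python, assuming pandas and SQLAlchemy are available; the connection string and table name are placeholders, not part of the assignment:

    from datetime import date, timedelta

    import pandas as pd
    from sqlalchemy import create_engine

    BASE_URL = "https://alg-data-public.s3.amazonaws.com/{}.csv"
    # Placeholder connection string; point it at the real PostgreSQL instance.
    engine = create_engine("postgresql://user:password@localhost:5432/shopify")

    def process_day(day: date) -> None:
        # Download the day's CSV directly from S3 over HTTPS.
        df = pd.read_csv(BASE_URL.format(day.isoformat()))

        # Filter out rows with an empty application_id.
        df = df[df["application_id"].notna() & (df["application_id"] != "")]

        # has_specific_prefix is true when index_prefix differs from "shopify_".
        df["has_specific_prefix"] = df["index_prefix"] != "shopify_"

        # Load the valid rows into PostgreSQL (table name is an assumption).
        df.to_sql("shopify_configurations", engine, if_exists="append", index=False)

    # Process the required range, 2019-04-01 through 2019-04-07.
    start = date(2019, 4, 1)
    for offset in range(7):
        process_day(start + timedelta(days=offset))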

play.modules.enabled += "com.samklr.KamonModule"

kamon {
  environment {
    service = "my-svc"
  }
  jaeger {
@samklr
samklr / golang_job_queue.md
Created November 9, 2019 10:02 — forked from harlow/golang_job_queue.md
Job queues in Golang