Run terraform fmt to format your code consistently.
Use terraform validate to check for syntax or semantic issues before apply.
Adopt tflint or similar linters to catch anti-patterns or unused code.
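A simple way to chain these three checks is a small script run in CI or as a pre-commit hook. The sketch below assumes terraform and tflint are on the PATH and that terraform init has already been run; the script itself is illustrative, not part of these notes.

import subprocess
import sys

# (name, command) pairs; `fmt -check` fails on unformatted files
# instead of rewriting them, which is what you want in CI.
CHECKS = [
    ("format", ["terraform", "fmt", "-check", "-recursive"]),
    ("validate", ["terraform", "validate"]),  # requires `terraform init` to have run
    ("lint", ["tflint"]),
]

def main() -> int:
    for name, cmd in CHECKS:
        print(f"running {name}: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"{name} check failed", file=sys.stderr)
            return result.returncode
    return 0

if __name__ == "__main__":
    sys.exit(main())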
Go to https://developer.apple.com/downloads/index.action, search for "Command line tools", and choose the one for your version of OS X.
Go to http://brew.sh/ and enter the one-liner into the Terminal; you now have brew installed (a better MacPorts).
Install transmission-daemon with
brew install transmission
Symlink the startup config for launchctl into place with
ln -sfv /usr/local/opt/transmission/*.plist ~/Library/LaunchAgents
version: '2'
services:
  minio:
    restart: always
    image: docker.io/bitnami/minio:2021
    ports:
      - '9000:9000'
    environment:
      - MINIO_ROOT_USER=miniokey
      - MINIO_ROOT_PASSWORD=miniosecret  # the original snippet is truncated here; password value assumed
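To confirm the MinIO container is reachable once it is up, a minimal sketch using boto3 against the local endpoint (boto3 and the bucket name are assumptions, not part of the compose file):

import boto3

# Point an S3 client at the local MinIO endpoint from the compose file.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="miniokey",
    aws_secret_access_key="miniosecret",  # must match MINIO_ROOT_PASSWORD
)

s3.create_bucket(Bucket="test-bucket")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])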
# Suppose the data file name has the format "datatfile_YYYY-MM-DD.csv" (matching the strftime pattern below); this file arrives in S3 every day.
file_suffix = "{{ execution_date.strftime('%Y-%m-%d') }}"
bucket_key_template = 's3://[bucket_name]/datatfile_{}.csv'.format(file_suffix)

file_sensor = S3KeySensor(
    task_id='s3_key_sensor_task',
    poke_interval=60 * 30,  # in seconds; check for the file every half hour
    timeout=60 * 60 * 12,   # give up after 12 hours
    bucket_key=bucket_key_template,
    bucket_name=None,       # bucket is taken from the full s3:// key above
    wildcard_match=False,
    dag=dag,                # assumes a DAG object named `dag` is defined alongside this sensor
)
from airflow import DAG
from airflow.operators.sensors import S3KeySensor  # Airflow 1.x import paths
from airflow.operators import BashOperator
from datetime import datetime, timedelta

# Midnight of yesterday, used as the DAG start date.
yday = datetime.combine(datetime.today() - timedelta(1),
                        datetime.min.time())

default_args = {
    'owner': 'msumit',
    'start_date': yday,  # the original snippet is truncated; start_date assumed from `yday` above
}
# DummyOperator import path for Airflow 1.x; `dag_config` and `pipeline_config`
# are assumed to be loaded elsewhere (e.g. from a config file).
from airflow.operators.dummy_operator import DummyOperator

with DAG(**dag_config) as dag:
    # Declare pipeline start and end tasks
    start_task = DummyOperator(task_id='pipeline_start')
    end_task = DummyOperator(task_id='pipeline_end')

    for account_details in pipeline_config['task_details']['accounts']:
        # Declare account start and end tasks
        if account_details['runable']:
            acct_start_task = DummyOperator(task_id=account_details['account_id'] + '_start')
            acct_start_task.set_upstream(start_task)
Set up parquet-tools with
brew install parquet-tools
Get help with
parquet-tools -h
Count rows:
parquet-tools rowcount part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet
Show the first record:
parquet-tools head -n 1 part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet
Show schema and file metadata:
parquet-tools meta part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet
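If parquet-tools is not installed, the same information can be read in Python with pyarrow; this alternative is not part of the original notes, and the file name is simply the one used above.

import pyarrow.parquet as pq

path = "part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet"
pf = pq.ParquetFile(path)

print(pf.metadata.num_rows)                          # like `parquet-tools rowcount`
print(pf.schema_arrow)                               # schema, part of `parquet-tools meta`
print(pf.read_row_group(0).slice(0, 1).to_pydict())  # first record, like `head -n 1`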
Context: The Integration team has deployed a cron job that dumps a CSV file containing all the new Shopify configurations daily at 2 AM UTC. The task is to build a daily pipeline that will:
download the CSV file from https://alg-data-public.s3.amazonaws.com/[YYYY-MM-DD].csv,
filter out each row with an empty application_id,
add a has_specific_prefix column set to true if the value of index_prefix differs from "shopify_", else to false,
and load the valid rows into a PostgreSQL instance.
The pipeline should process files from 2019-04-01 to 2019-04-07. A sketch of these steps in Python follows.
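A minimal sketch of that daily task, assuming pandas, SQLAlchemy, and a PostgreSQL connection string; the connection string and table name are placeholders, not details given above.

from datetime import date

import pandas as pd
from sqlalchemy import create_engine

BASE_URL = "https://alg-data-public.s3.amazonaws.com/{day}.csv"
PG_URI = "postgresql://user:password@localhost:5432/shopify"  # placeholder connection string

def process_day(day: date) -> None:
    # Download the day's CSV straight from S3.
    df = pd.read_csv(BASE_URL.format(day=day.isoformat()))

    # Filter out rows with an empty application_id.
    df = df[df["application_id"].notna() & (df["application_id"] != "")]

    # Flag rows whose index_prefix differs from "shopify_".
    df["has_specific_prefix"] = df["index_prefix"] != "shopify_"

    # Load the valid rows into PostgreSQL (table name is illustrative).
    engine = create_engine(PG_URI)
    df.to_sql("shopify_configurations", engine, if_exists="append", index=False)

if __name__ == "__main__":
    # Process the requested range, 2019-04-01 through 2019-04-07.
    for d in pd.date_range("2019-04-01", "2019-04-07"):
        process_day(d.date())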
play.modules.enabled += "com.samklr.KamonModule"

kamon {
  environment {
    service = "my-svc"
  }
  jaeger {
A running example of the code from:
This gist creates a working example from the blog post, and an alternate example using a simple worker pool.
TL;DR: if you want simple and controlled concurrency, use a worker pool.
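The gist itself is Go, but the worker-pool idea it argues for looks roughly like this in Python, sketched with the standard library rather than the gist's code:

from concurrent.futures import ThreadPoolExecutor

def do_work(item: int) -> int:
    # Stand-in for the real per-job work.
    return item * item

def run_pool(jobs, num_workers: int = 4):
    # A fixed-size pool gives simple, controlled concurrency:
    # at most num_workers jobs run at any one time.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(do_work, jobs))

if __name__ == "__main__":
    print(run_pool(range(10)))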