Detailed work experience @ theScore Inc.
  • Helped migrate the company's cloud infrastructure from RightScale to AWS, and automated code and service deployments to AWS using a combination of Ansible and Bash. The migration modernized the company's cloud infrastructure and made it more manageable and future-proof.
  • Implemented a custom Ansible inventory script in Python, using Boto, that queries AWS/EC2 for host metadata (e.g., tags, host names, IPs). Note that this was done before Ansible had inventory plugins and before the contrib ec2.py inventory script was available. The custom inventory script let us target our EC2 servers by project, environment, and role via their EC2 tags, making it easy to query our server inventory and to apply playbooks to selected environments (see the inventory sketch after this list).
  • Wrote playbooks and Ansible roles for deploying application services (usually a combination of Nginx + Unicorn + Rails). This automated the deployment of the applications' backend services so that developers could easily deploy changes to different environments (e.g., production, staging) with a single command from the CLI.
  • Implemented a simple web app in Python for tracking, in real time, deploys launched from developers' CLIs. Deploy logs and deploy history were tracked and searchable within the application, and one could also start a deploy via the application's web UI. The application gave us visibility into who deployed what and when, and into the status of each deploy. This was a self-initiated project, done mostly in my spare time.
  • Initiated, planned, and implemented the migration from AWS EC2-Classic to VPC. I researched and established the company's VPC infrastructure: multiple VPCs, with private and public subnets and managed NAT gateways per availability zone, plus route tables, VPC peerings, security groups, and S3 endpoints all properly set up and working. Furthermore, all of the above, including the auto-scaling group configurations, was codified in Terraform.
  • Implemented a custom wrapper script for Terraform that uses a separate state file per region per Terraform configuration, to provide better isolation of infrastructure changes. The script also renders Terraform config files through the Jinja2 template engine, and queries Terraform state files with jq to make various AWS resource IDs available for reference in other configurations. Moreover, the script uses Consul, which we were already running for service discovery, for distributed locking, ensuring that no two users could change the same state file at the same time (see the locking sketch after this list). (NOTE: This was all before Terraform's remote state and locking were available.)
  • Created Taskit, a task-runner framework in Bash, to help organize the shell-scripting tasks used for provisioning our entire AWS infrastructure with Terraform and Ansible. We already had many shell scripts performing different tasks and gluing different tools together; having such a tool gave us a framework for authoring and running those scripts, and better operational consistency.
  • Streamlined the automatic creation of our entire infrastructure and services in another AWS region, both for disaster recovery and to exercise our Terraform configurations and service deploys. This was implemented with the help of the Taskit tool mentioned above, and it ran on a nightly schedule to catch configuration errors or any problems that might surface when doing a disaster recovery in another region.
  • Implemented custom metric collection with Diamond to gather EC2 instance metrics and send machine load metrics to AWS CloudWatch, so that we could scale based on server load rather than CPU utilization (see the metric sketch after this list). This was an experiment I proposed and carried out in response to the question: how can we scale up faster? It let us scale up more quickly in response to traffic spikes, because the server load metric takes network I/O into account in addition to CPU utilization.
  • Once helped optimize an application SQL query for a 5x speed-up. The query had become a bottleneck and was bringing the service down during peak hours.
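
To give a feel for the inventory script above, here is a minimal sketch of the same idea in Bash with awscli and jq (the real script was Python/Boto and also grouped hosts by project and environment; the `role` tag here is illustrative). Ansible invokes an inventory script with `--list` and expects JSON mapping group names to their hosts:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a dynamic inventory: group running EC2
# instances by their "role" tag.
set -euo pipefail
[[ ${1:-} == --list ]] || { echo '{}'; exit 0; }
aws ec2 describe-instances \
    --filters Name=instance-state-name,Values=running \
    --query 'Reservations[].Instances[].{ip: PrivateIpAddress,
             role: Tags[?Key==`role`].Value | [0]}' \
    --output json |
jq 'group_by(.role)
    | map({key: (.[0].role // "untagged"), value: {hosts: map(.ip)}})
    | from_entries'
```

A playbook can then be pointed at it with, e.g., `ansible-playbook -i ec2-inventory.sh site.yml --limit web`.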
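
And here is the locking half of the Terraform wrapper, sketched (paths and state layout are made up, and the `terraform apply DIR` / `-state` usage reflects Terraform of that era). `consul lock` holds a KV-based lock for the duration of the child command, which is the property the wrapper relied on:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: serialize Terraform runs against the same state
# file via Consul (this pre-dated Terraform's remote state and locking).
set -euo pipefail
dir=${1:?terraform config dir}; region=${2:?aws region}
name=$(basename "$dir")
state="states/$region/$name.tfstate"
consul lock "locks/terraform/$region/$name" \
    terraform apply -state="$state" -var "region=$region" "$dir"
```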
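
Finally, the core of the load-based scaling experiment reduces to one awscli call (the real collector was Diamond; the namespace and metric name are made up). A CloudWatch alarm on the custom metric then drives the scale-up policy:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: push the 1-minute load average as a custom
# CloudWatch metric, dimensioned by instance ID.
set -euo pipefail
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws cloudwatch put-metric-data \
    --namespace Custom/EC2 \
    --metric-name LoadAverage \
    --dimensions "InstanceId=$instance_id" \
    --value "$(cut -d' ' -f1 /proc/loadavg)"
```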

I have also written numerous scripts to automate the boring stuff. Some notable examples are:

  • A custom AWS Ansible module, built on awscli, that lets you write awscli invocations as a series of Ansible tasks in a playbook, with the JSON response of each awscli command made available to later tasks. This quick hack let us write config files for setting up AWS auto-scaling groups easily, in the early days before Terraform came along.
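
A minimal sketch of that module idea, assuming Ansible's old-style key=value args file and a hypothetical `cmd` parameter (the real module's interface may have differed):

```bash
#!/usr/bin/env bash
# Hypothetical Ansible module: run an awscli command and return its JSON
# response. Ansible passes the path of a key=value args file as $1.
source "$1"    # expects shell-quotable args, e.g. cmd="ec2 describe-vpcs"
if output=$(aws $cmd 2>&1); then    # word-splitting of $cmd is intentional
    printf '{"changed": true, "response": %s}\n' "${output:-null}"
else
    printf '{"failed": true, "msg": %s}\n' "$(jq -Rs . <<<"$output")"
fi
```

A task can then `register:` the result, and later tasks can reference, say, `result.response.Vpcs`.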

  • A script that shows the server inventory and allows interactive filtering (using percol or fzf) to quickly select a server to SSH into. Later, I wrote another script with the same interactive filtering that instead connects you to the database used by the application running on the selected server.
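
The SSH-picker pattern boils down to something like this sketch (the Name-tag and private-IP choices are illustrative):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: list running instances as "name<TAB>ip", pick one
# interactively with fzf, and ssh to the chosen IP.
set -euo pipefail
host=$(
  aws ec2 describe-instances \
      --filters Name=instance-state-name,Values=running \
      --query 'Reservations[].Instances[].[Tags[?Key==`Name`].Value | [0],
               PrivateIpAddress]' \
      --output text |
  fzf --prompt='ssh> ' | awk '{print $2}'
)
[[ $host ]] && exec ssh "$host"
```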

  • Database backup and refresh scripts using Bash, mysqldump, mysql, pg_dump, and psql. The scripts can be customized with config files, and can easily be extended to support other relational databases.
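
For instance, the Postgres flavor of a refresh might look like this sketch, with the config file simply defining `SRC_URL` and `DEST_URL` (hypothetical names):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: dump the source DB and refresh the target from it.
set -euo pipefail
source "${1:?usage: db-refresh <config-file>}"  # defines SRC_URL, DEST_URL
dump=$(mktemp /tmp/refresh.XXXXXX)
trap 'rm -f "$dump"' EXIT
pg_dump --format=custom --no-owner "$SRC_URL" > "$dump"
pg_restore --clean --if-exists --no-owner --dbname="$DEST_URL" "$dump"
```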

  • A collection of scripts for creating VirtualBox VM images using Vagrant. Each VM boots into a development environment that contains a restore of the project's production DB, plus any services and tools needed for development or debugging on the project.

  • A script that uses awscli to take and rotate snapshots of the EBS volumes of EC2 instances, identified by their environment + group + role tags. (This was before AWS Data Lifecycle Manager was available.)
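
A sketch of the rotation logic, with made-up tag names and a seven-snapshot retention:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: snapshot tagged volumes, then prune all but the
# seven most recent snapshots of each volume.
set -euo pipefail
vols=$(aws ec2 describe-volumes \
    --filters Name=tag:environment,Values=production Name=tag:role,Values=db \
    --query 'Volumes[].VolumeId' --output text)
for vol in $vols; do
    aws ec2 create-snapshot --volume-id "$vol" --description "nightly $vol"
    aws ec2 describe-snapshots --filters Name=volume-id,Values="$vol" \
        --query 'sort_by(Snapshots, &StartTime)[:-7].SnapshotId' --output text |
    xargs -r -n1 aws ec2 delete-snapshot --snapshot-id
done
```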

  • A cron-job wrapper script that logs the cron job's output and, if the job fails, emails you that output along with other meta information (e.g., cron job name, start time, hostname, IP, instance ID, etc.).
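
The wrapper's core logic, sketched (the log directory and recipient address are placeholders):

```bash
#!/usr/bin/env bash
# Hypothetical cron wrapper sketch: capture the job's output to a log
# file and mail it, with some context, only when the job fails.
# Crontab usage: cronwrap.sh nightly-backup /usr/local/bin/backup.sh
name=$1; shift
log=/var/log/cron/$name.$(date +%F-%H%M%S).log
mkdir -p "$(dirname "$log")"
if ! "$@" &> "$log"; then
    { echo "Cron job '$name' ($*) failed on $(hostname) at $(date)"
      echo; cat "$log"
    } | mail -s "[cron] $name FAILED on $(hostname)" ops@example.com
fi
```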

  • User-account removal/disabling scripts using Bash, curl, ssh, Python, and Selenium. Where available, the service's API is used; otherwise, Python + Selenium with WebDriver scripts the browser to disable or remove the user. The tool is implemented as a collection of small plugin scripts, with a main script that triggers the plugins to do their jobs in parallel.
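
The dispatch pattern, sketched (the plugin layout is hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run each per-service plugin in parallel and
# report which ones failed to disable the user.
user=${1:?usage: offboard <username>}
declare -A plugin_of; pids=()
for plugin in plugins/*.sh; do
    "$plugin" "$user" & pids+=($!); plugin_of[$!]=$plugin
done
status=0
for pid in "${pids[@]}"; do
    wait "$pid" || { echo "FAILED: ${plugin_of[$pid]}" >&2; status=1; }
done
exit $status
```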

  • A script that automatically merges changes committed to a release branch during code-freeze back into the development branch. This was requested by the Android team lead to help automate part of their release process.
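
A sketch of the merge-back logic (branch names are illustrative):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: merge code-freeze commits on the release branch
# back into the development branch, bailing out on conflicts.
set -euo pipefail
release=${1:?release branch}; dev=${2:-develop}
git fetch origin
git checkout "$dev"
git pull --ff-only origin "$dev"
git merge --no-ff "origin/$release" -m "Merge $release into $dev" || {
    echo "Merge conflict: resolve manually." >&2; exit 1; }
git push origin "$dev"
```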

  • A script, using Boto, that picks an EC2 instance from an auto-scaling group (ASG), bundles it to create an AMI, then clones the ASG's existing launch configuration, updates the clone to use the new AMI, and associates it with the ASG. It also cleans up old, disassociated AMIs, keeping the three most recent. (NOTE: This was done in the early days, before it was possible to update auto-scaling groups or launch configurations in the AWS Console. Nowadays, I'd look into using Launch Templates for ASGs.)
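
A sketch of the launch-template variant mentioned in the note, using awscli rather than the original Boto code (names are placeholders, and the old-AMI cleanup is omitted):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: bake an AMI from a running ASG instance and make
# it the launch template's new default, so the ASG uses it on scale-out.
set -euo pipefail
asg=${1:?auto-scaling group}; template=${2:?launch template name}
instance=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg" \
    --query 'AutoScalingGroups[0].Instances[0].InstanceId' --output text)
ami=$(aws ec2 create-image --instance-id "$instance" \
    --name "$asg-$(date +%F-%H%M)" --query ImageId --output text)
aws ec2 wait image-available --image-ids "$ami"
aws ec2 create-launch-template-version \
    --launch-template-name "$template" --source-version '$Latest' \
    --launch-template-data "{\"ImageId\":\"$ami\"}"
aws ec2 modify-launch-template \
    --launch-template-name "$template" --default-version '$Latest'
```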

  • A script that rotates an application's log/metric file and uploads the metrics to S3. The script required some carefully planned logic to ensure that no logs are lost and that any failed upload is retried on the next run.
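
The rotate-then-ship pattern, sketched (the SIGHUP-to-reopen step and all paths are assumptions):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: rename first (mv within a filesystem is atomic),
# tell the app to reopen its log, then upload every rotated file still
# on disk so that previously failed uploads are retried on the next run.
set -uo pipefail
shopt -s nullglob
log=/var/log/app/metrics.log
if [[ -e $log ]]; then
    mv "$log" "$log.$(date +%s)"
    kill -HUP "$(cat /var/run/app.pid)"   # assumes the app reopens on HUP
fi
for f in "$log".*; do
    aws s3 cp "$f" "s3://my-metrics-bucket/$(hostname)/${f##*/}" && rm -f "$f"
done
```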

  • A test script that finds and splits test cases across multiple CircleCI nodes so they can run in parallel.
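
The splitting idea, sketched with CircleCI's built-in node variables (the rspec runner is just an example):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: deal test files out round-robin across nodes.
# CIRCLE_NODE_TOTAL and CIRCLE_NODE_INDEX are set by CircleCI.
i=0; files=()
while IFS= read -r f; do
    (( i++ % CIRCLE_NODE_TOTAL == CIRCLE_NODE_INDEX )) && files+=("$f")
done < <(find spec -name '*_spec.rb' | sort)
exec bundle exec rspec "${files[@]}"
```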

  • A TestRail userscript (JavaScript) for triggering a Jenkins build from the TestRail UI.
