When you describe your technology stack from the bottom layer to the top - hardware, OS, middleware and application, each with its respective configuration - it is easy to notice that the higher you go in the stack, the more frequent the changes become. Your hardware will hardly change, your operating system has a long life cycle, and your middleware will keep up with the application's needs; but applications are the most volatile element: even if your release cycle is long (weeks or months), they will change far more often than anything beneath them.
In "The Practice of System and Network Administration", the authors categorize the biggest "time sinkholes" in IT as manual/non-standard provisioning of operating systems and manual application deployments. These time sinkholes consume your time with either repetitive tasks or unplanned work.
How so? Let's say you provision a new server without NTP properly configured, and a small percentage of your requests start to behave strangely because the application uses some sort of scheduler that relies on correct time, in a cluster of dozens of servers. Described this way, it is easy to spot, but how long would it take your team to figure it out? Incidents, or unplanned work, consume a lot of your time, and even worse, the time of your greatest talents. Should we really be wasting time investigating production systems like this? Wouldn't it be better to put this server aside and automatically provision a new one from scratch?
What about manual deployment? Imagine 20 binaries being deployed across a farm of nodes, each with its respective configuration files. How error-prone is this? It will eventually end up in unplanned work.
In the State of DevOps Report 2018, stages of DevOps adoption were introduced, and it's no surprise that Stage 0 includes deployment automation and reuse of deployment patterns, while Stages 1 and 2 focus on standardization of your infrastructure stack in order to reduce inconsistencies across your environment.
NOTE: Stop me if you haven't seen it before, but I have seen it more than once: Ops teams using this "standardization" as an excuse to limit the development team's ability to deliver, forcing them to use a hammer on something that is definitely not a nail. Don't do it; the price is extremely high.
The lesson to be learned here is that the lack of automation not only increases your lead time, but also considerably increases the rate of problems in your process and the amount of unplanned work. If you have ever read "The Phoenix Project", you know that this is the root of all evil in any value stream, and if you don't get rid of it, it will eventually kill your business.
Back to the biggest time sinkholes: why not start with automated operating system installation? We definitely could, but the results would take longer to appear, since new virtual machines are not created as frequently as applications are deployed. In other words, we might not gain enough free time to power our initiative, and it could die prematurely.
Still not convinced? Smaller and more frequent releases are also extremely positive from the development side. Let me explain a little further...
The first thing to understand is that, although often used interchangeably, deployment and release do NOT mean the same thing. Release means making a new version available to the user, while deployment is the technical process of putting that new version in place. Let's focus on the technical process of deployment, as the article title suggests.
We need to understand our deployment process from beginning to end, with everything involved in the middle: which tasks are performed, which servers are involved, and which steps are executed, so that we don't fall into the pitfalls described in "Automating the Unknown".
Which steps are commonly executed in a regular deployment process?
- Deploy application(s)/database(s) or database change(s).
- Stop/start services and monitoring.
- Add/remove server from our load balancers.
- Verify application state (is it ready to serve requests?).
- Manual approval?
Automating the deployment process while leaving a manual approval step is like riding a bike with training wheels for some people, but as someone once told me: "Better to ride with training wheels than not to ride at all...".
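If you do want to keep a manual gate in an otherwise automated pipeline, Ansible's pause module can do that. A minimal sketch (the prompt text is illustrative):

```yaml
- name: Manual approval gate
  pause:
    prompt: "Continue deploying the new version? Press Enter to proceed, Ctrl+C then 'a' to abort"
```

The play stops at this task until an operator answers, so everything before and after the gate stays fully automated.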
What if a few of the tools involved do not offer an API or CLI so I can perform these tasks in an automated way? Well, maybe it's time to think about changing those tools. For each of these groups - application servers, databases, monitoring systems and load balancers - I could point to many open source tools that are easily automated, in great part thanks to the Unix way. Change your mindset when adopting a new technology in this field, eliminating options that cannot be automated, and use your creativity for your legacy: I've seen people versioning network appliance configuration files and updating them over FTP.
And guess what? It's a wonderful time to adopt open source tools. In "Accelerate: State of DevOps", produced by DORA (DevOps Research and Assessment), the use of open source technologies is predominant in high-performing organizations. The logic is pretty simple: open source projects function in a Darwinian model, where those that do not adapt and evolve die from lack of a user base or contributions. Feedback is paramount to software evolution.
Looking at the tasks, it is easy to identify the groups of servers that will be targeted by our automation:

- Deploy application(s)/database(s) or database change(s): application and database servers.
- Stop/start services and monitoring: monitoring servers.
- Add/remove server(s) from our load balancer(s): load balancer(s).
- Verify application state (is it ready to serve requests?): application servers.
A high-level deployment process could be written as:

1. Stop monitoring.
   - To avoid false positives.
2. Remove the server from the load balancer.
   - To prevent the user from receiving an error code.
3. Stop the service.
   - Graceful shutdown (:
4. Deploy the new version of the application.
5. Wait for the application to be ready to receive new requests.
6. Execute steps 3, 2 and 1 in reverse order (start the service, add the server back to the load balancer, re-enable monitoring).
7. Do the same for the next N servers.
Having documentation of your process is nice, but what about an executable documentation of your deployment? Even better! Here is what steps 1-5 would look like in Ansible for a pretty popular, fully open source stack:
```yaml
- name: Disable alerts
  nagios:
    action: disable_alerts
    host: "{{ inventory_hostname }}"
    services: webserver
  delegate_to: "{{ item }}"
  loop: "{{ groups.monitoring }}"

- name: Disable servers in the LB
  haproxy:
    host: "{{ inventory_hostname }}"
    state: disabled
    backend: app
  delegate_to: "{{ item }}"
  loop: "{{ groups.lbserver }}"

- name: Stop the service
  service:
    name: httpd
    state: stopped

- name: Deploy a new version
  unarchive:
    src: app.tar.gz
    dest: /var/www/app

- name: Verify application state
  uri:
    url: "http://{{ inventory_hostname }}/app/healthz"
    status_code: 200
  register: health
  until: health.status == 200
  retries: 5
```
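Step 6 (executing steps 3, 2 and 1 in reverse) is symmetric to the snippet above. A sketch of the re-enable tasks, assuming the same inventory groups and service names:

```yaml
- name: Start the service
  service:
    name: httpd
    state: started

- name: Enable servers in the LB
  haproxy:
    host: "{{ inventory_hostname }}"
    state: enabled
    backend: app
  delegate_to: "{{ item }}"
  loop: "{{ groups.lbserver }}"

- name: Enable alerts
  nagios:
    action: enable_alerts
    host: "{{ inventory_hostname }}"
    services: webserver
  delegate_to: "{{ item }}"
  loop: "{{ groups.monitoring }}"
```

Note the order is the exact reverse of the shutdown: service first, then load balancer, then monitoring, so alerts are only re-enabled once the node is actually serving traffic again.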
There are other alternatives for application deployment, but what makes Ansible an excellent choice?
- Multi-tier orchestration (i.e. delegate_to), allowing you to orderly target different groups of servers: monitoring, load balancers, application servers, databases, etc.
- Rolling upgrades (i.e. serial): the ability to control how changes are made: one by one, N by N, X% at a time.
- Error control (max_fail_percentage and any_errors_fatal): is my process all-or-nothing, or does it tolerate failures?
- A vast library of modules for:
  - Monitoring (nagios, zabbix, etc.)
  - Load balancers (haproxy, F5, Netscaler, Cisco, etc.)
  - Services (service, command, file)
  - Deployment (copy, unarchive)
  - Programmatic verifications (command, uri)
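Putting rolling upgrades and error control together, the play header could look like the sketch below (the appservers group name and the percentages are illustrative, not from the example above):

```yaml
- hosts: appservers
  serial: "25%"            # rolling upgrade: act on 25% of the nodes at a time
  max_fail_percentage: 10  # abort the play if more than 10% of a batch fails
  tasks:
    # the deployment tasks shown earlier in this article go here
    - name: Stop the service
      service:
        name: httpd
        state: stopped
```

With serial, Ansible completes the whole task list for one batch before moving to the next, which is exactly the "do the same for the next N servers" step of the high-level process.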