Last active
October 26, 2018 18:11
-
-
Save jhyland87/9b8c8ca3f807ddfe572d8a5e4ed13290 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Question: | |
Tell me about a time that you strongly disagreed with your | |
manager on something you deemed to be very important to the | |
business. What was it about and how did you handle it? | |
Answer: | |
I think the most prevalent example would be the mentality in which my | |
previous manager and I had towards potential problems related to | |
monitoring, logging and security weaknesses. It seemed to me that he | |
had more of a reactive oriented mentality, where as I thought it was | |
in our best into be as proactive as possible. An example scenario | |
would be the lack of monitoring of our servers and applications. We | |
had a Nagios server setup already and actively monitoring some | |
services, but historically, services were only added to Nagios *after* | |
outages. | |
For example, we had MySQL replication setup in multiple environments, | |
and while the servers themselves were monitored for CPU and memory | |
utilization, ping and uptime, other vital services weren't monitored, | |
such as disk utiliation, the mysql daemon status or even the replication | |
status. | |
I feel I was vindicated when eventually the mount point that the | |
MySQL data was stored on reached full capacity, resulting in the | |
mysql service hanging and replication failing. While this may not | |
have taken down the production website directly, anything that relied | |
on the slave server was impacted, such as reports, backups, and some | |
automated scripts. After the disk usage and replication was resolved, | |
all mysql servers in every environment was added to Nagios, as well as | |
the replication status, and anything else that could hinder replication. | |
Question: | |
Which LP was the above question likely applicable to? | |
(https://www.amazon.jobs/principles) | |
Answers: | |
Principal: Insist on high standards - I've always believed that one of | |
the most important tools in maintaining a robust infrastructure and | |
preventing or remediating outages is to monitor as many services as | |
possible. Monitoring not only helps prevent outages, but the statistics | |
and historical data can also be invaluable. | |
Principal: Have Backbone; Disagree and Commit - I made it known (on more | |
than one occasion) that I believed monitoring core services (such as | |
MySQL and replication) was a critical aspect in keeping the infrastructure | |
and production services running smoothly and keeping downtime to a minimum, | |
and as such, it should be a priority. But I was ignored, probably because | |
I got busted downloading shit via Tor (CentOS OS, NOT porn), followed by | |
getting busted sharing that email thread with my BFF. FML. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment