@damaya
Created March 6, 2023 09:00
datadog notes
Datadog
Industry standards: practical monitoring SRE notes
Datadog Backend: SaaS in the cloud
Datadog agent, where is it installed?
- For k8s, it runs on all nodes, deployed with Helm chart overrides taken from https://github.com/DataDog/helm-charts/blob/main/charts/datadog/README.md
- For VMs, it is installed with Ansible on each VM
Please note: the agent is one thing, and the related APM instrumentation in the app is another.
Why monitoring?
Proactive monitoring
Monitoring use cases
Monitoring terminology and processes
Types of monitoring
Overview of monitoring tools
Metric, measurement, and goal. A monitoring activity is all about measuring something and comparing that result against a goal or a target.
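The metric / measurement / goal relationship can be sketched in a few lines of Python. This is only an illustration: the `Goal` class, the `meets_goal` helper, and the metric name are made up for these notes and are not part of any Datadog API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """Target for one metric (hypothetical structure, for illustration only)."""
    metric: str
    max_value: float  # a measurement should stay at or below this value


def meets_goal(measurement: float, goal: Goal) -> bool:
    """Compare a single measurement of the metric against its goal."""
    return measurement <= goal.max_value


# Hypothetical metric: checkout latency in milliseconds, goal of 300 ms.
latency_goal = Goal(metric="checkout.latency_ms", max_value=300.0)
print(meets_goal(245.0, latency_goal))  # True
print(meets_goal(420.0, latency_goal))  # False
```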
Core Monitoring
Application software
Third-party software
Infrastructure
The monitoring of these three components constitutes core monitoring. There are many other aspects – both internal to the system, such as its health, and external, such as security – that would make the monitoring of a software system complete.
Proactive Monitoring
Setting up alerts to warn of impending issues
The monitoring solution must be designed to warn of impending issues with the software system. This is easy with infrastructure components as it is easy to track metrics such as memory usage, CPU utilization, and disk space, and alert on any usage over the limits.
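As a rough sketch of the infrastructure case, the check below reads disk usage with Python's standard `shutil.disk_usage` and classifies it against two limits. The 80% and 90% thresholds and the function names are assumptions chosen for illustration; the lower "warning" limit is what gives advance notice before the hard limit is hit.

```python
import shutil

# Hypothetical limits; tune them per host and per metric.
WARN_PCT = 80.0
CRIT_PCT = 90.0


def classify(used_pct: float, warn: float = WARN_PCT, crit: float = CRIT_PCT) -> str:
    """Map a usage percentage to an alert level."""
    if used_pct >= crit:
        return "critical"
    if used_pct >= warn:
        return "warning"
    return "ok"


def disk_used_pct(path: str = "/") -> float:
    """Percentage of disk space used on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    pct = disk_used_pct("/")
    print(f"disk {pct:.1f}% used -> {classify(pct)}")
```

In a real setup the agent collects these metrics and the thresholds live in a monitor definition; the sketch only shows the comparison itself.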
However, such a requirement is trickier at the application level. Sometimes applications fail on perfectly configured infrastructure. To mitigate that, software applications should provide insights into what is going on under the hood. In monitoring jargon, this is now called observability, and we will see later in the book how it can be implemented in Datadog.
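One simple way for an application to expose what it is doing is to emit custom metrics to the local agent. The sketch below builds a DogStatsD-style datagram (`<name>:<value>|<type>|#<tags>`) and sends it over UDP to port 8125, the agent's default DogStatsD port. The metric name, tags, and helper functions are hypothetical, and in practice an official client library would normally be used instead of a raw socket.

```python
import socket


def format_metric(name: str, value: float, metric_type: str = "g", tags=None) -> str:
    """Build a DogStatsD datagram: <name>:<value>|<type>|#<tag1>,<tag2>."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send to a local agent (assumes DogStatsD is enabled)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical gauge describing internal application state.
payload = format_metric("shop.orders.queue_depth", 12, "g", ["env:dev", "service:shop"])
print(payload)  # shop.orders.queue_depth:12|g|#env:dev,service:shop
```

Calling `send_metric(payload)` would ship the gauge to an agent on the same host. This covers custom metrics only; APM tracing needs its own instrumentation in the app.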
Having a feedback loop
In a mature monitoring system, warning of impending issues so that someone can take mitigation steps is not enough. Such warnings must also be used to resolve issues automatically (for example, spinning up a new virtual machine with enough disk space when an existing virtual host runs out of disk space), or be fed into the redesign of the application or infrastructure to prevent the issue from happening in the future.
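A minimal sketch of such a feedback loop, assuming alerts arrive as an alert type plus a host name: known alert types are routed to an automatic remediation, and everything else is recorded as input for redesign. All names here (`handle_alert`, `provision_larger_vm`, the alert types) are hypothetical, and the remediation is a placeholder rather than a real cloud API call.

```python
from typing import Callable, Dict, List

# A remediation takes the affected host and returns a description of what it did.
Remediation = Callable[[str], str]


def provision_larger_vm(host: str) -> str:
    # Placeholder: a real handler would call the cloud provider's API here.
    return f"requested replacement VM with more disk for {host}"


# Alert types that can be resolved automatically (hypothetical mapping).
REMEDIATIONS: Dict[str, Remediation] = {
    "disk_almost_full": provision_larger_vm,
}

# Alerts with no automatic fix are kept as input for redesign work.
design_backlog: List[str] = []


def handle_alert(alert_type: str, host: str) -> str:
    """Resolve the alert automatically if possible, otherwise record it."""
    action = REMEDIATIONS.get(alert_type)
    if action is not None:
        return action(host)
    design_backlog.append(f"{alert_type} on {host}")
    return "no automatic fix; recorded for redesign"


print(handle_alert("disk_almost_full", "vm-01"))
print(handle_alert("slow_checkout", "vm-02"))
print(design_backlog)  # ['slow_checkout on vm-02']
```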
Monitoring Backend and Monitoring Agent
Which App to monitor?
dev shop
Monitoring as Code