DataDog CheatSheet

The DataDog UI is complicated and hard to find your way around, but it basically boils down to these areas:

Teams
Notebooks
Dashboards
Monitors
Logs
Metrics
Spans
Traces
Filters
Integrations

This doc covers using the DataDog UI to debug the logs you already have.

Teams

Find your team, or make a team page.
Add links, dashboards, notebooks, and monitors to your team page so they're easy to find later.

Notebooks

Basically a rich document you can embed widgets (graphs, etc.) into, for easier debugging later.
Tip: like Notion, DataDog notebooks support slash commands; type /widgetname to insert a widget.
Make a few key notebooks (it takes 2 minutes!) to centralize the DataDog pages you use most.

Dashboards

Basically notebooks without the long-form text, laid out as a grid of widgets.
A place to collect charts you want to see at a glance.

Monitors

Something you create to notify you when things go wrong.
- Notify by: Slack, OpsGenie, email, etc.
Configure threshold values for a metric; when the metric crosses them, the monitor warns or alerts you.
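A minimal sketch of the warn/alert logic, assuming a monitor that averages the metric over its evaluation window and triggers when that average is at or above a threshold. DataDog's real evaluation has more options (other aggregations, "below" thresholds, recovery thresholds); `evaluate` and `Status` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from enum import Enum
from statistics import mean


class Status(Enum):
    OK = "OK"
    WARN = "WARN"
    ALERT = "ALERT"


def evaluate(window: list[float], warning: float, critical: float) -> Status:
    """Classify one evaluation window against warning/critical thresholds.

    Hypothetical helper: assumes higher values are worse (error counts,
    latency) and that warning < critical.
    """
    if warning >= critical:
        raise ValueError("warning threshold must be below critical threshold")
    value = mean(window)  # aggregate the window into a single number
    if value >= critical:
        return Status.ALERT
    if value >= warning:
        return Status.WARN
    return Status.OK


# e.g. error counts sampled over the last few minutes
print(evaluate([4, 6, 8], warning=5, critical=10))
```

The warning threshold sits below the critical one so a metric drifting upward produces a warning before it pages anyone.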

Notification messages use a custom template syntax to control what your monitor says when it notifies you.

Example:

{{#is_alert}}
🚨 Failed consistently on **{{shard.name}}** 🚨
📊 [Datadog](https://app.datadoghq.com/monitors/123)
@slack-my-teams-notifications
@opsgenie-my-team P2
{{/is_alert}}
{{#is_warning}}
⚠️ Failed a few times on **{{shard.name}}** ⚠️
📊 [Datadog](https://app.datadoghq.com/monitors/123)
@slack-my-teams-notifications
{{/is_warning}}
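A toy model of how the conditional blocks behave: only the {{#is_<state>}}…{{/is_<state>}} block matching the monitor's current state survives, and tag variables are then filled in from the group that triggered. This is not DataDog's renderer; `render` and its arguments are hypothetical, and it only handles the two constructs used in the example above.

```python
from __future__ import annotations

import re

# {{#is_alert}} ... {{/is_alert}}: group(1) = state name, group(2) = block body
_BLOCK = re.compile(r"\{\{#is_(\w+)\}\}(.*?)\{\{/is_\1\}\}", re.DOTALL)
# {{shard}} or {{shard.name}}: group(1) = tag key
_VAR = re.compile(r"\{\{(\w+)(?:\.name)?\}\}")


def render(template: str, state: str, tags: dict[str, str]) -> str:
    """Keep only the block for `state` ("alert", "warning", ...), then fill tag variables."""
    body = _BLOCK.sub(lambda m: m.group(2) if m.group(1) == state else "", template)
    # Unknown variables are left untouched
    return _VAR.sub(lambda m: tags.get(m.group(1), m.group(0)), body).strip()


template = (
    "{{#is_alert}}Failed consistently on {{shard.name}}{{/is_alert}}"
    "{{#is_warning}}Failed a few times on {{shard.name}}{{/is_warning}}"
)
print(render(template, "warning", {"shard": "shard-3"}))
```

When the monitor is in neither state (e.g. OK), both blocks are dropped and the message body is empty.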

Integration handles in the template message decide where notifications go:
- @slack-<channel-name>
- @opsgenie-<team>

Logs

Raw JSON events, each tagged with a level/status (debug/info/warn/error).
The lowest level; monitors and metrics can be built on top of them.
Change the time range in the upper right (learn the syntax for typing time ranges, it’s neat).
Try grouping by patterns to see similar logs aggregated together.
Open individual logs to read specific error messages and dig into their JSON details.
The left sidebar lists the facets you can filter by.
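A simplified illustration of pattern grouping: mask the parts of each message that vary (numbers and long hex IDs here) and count how many messages collapse to the same shape. DataDog's real clustering is more sophisticated; `pattern_of` and `group_by_pattern` are hypothetical helpers that only show the idea.

```python
from __future__ import annotations

import re
from collections import Counter

# Tokens treated as "variable": long hex ids and plain numbers (an assumption)
_VARIABLE = re.compile(r"\b(?:[0-9a-f]{8,}|\d+(?:\.\d+)?)\b", re.IGNORECASE)


def pattern_of(message: str) -> str:
    """Mask variable tokens so messages that differ only in ids/numbers collapse."""
    return _VARIABLE.sub("*", message)


def group_by_pattern(messages: list[str]) -> list[tuple[str, int]]:
    """Return (pattern, count) pairs, most frequent first."""
    return Counter(pattern_of(m) for m in messages).most_common()


logs = [
    "timeout after 30 ms on shard 4",
    "timeout after 45 ms on shard 7",
    "user 12 logged in",
]
for pattern, count in group_by_pattern(logs):
    print(count, pattern)
```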

Metrics

Numeric time series (counts, gauges, rates, distributions) that DataDog stores with tags, which you can filter, group, and aggregate.
Visualized as charts in monitors and dashboards (or notebooks).
Not the same as logs: logs are individual JSON events from your app; metrics are numbers aggregated over time.
Learn the DataDog metric query syntax if you want to write custom metric queries.
Make metrics to track lag, response time, spikes in traffic, and other “stats”.
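A metric query such as avg:system.cpu.user{env:prod} by {host} reads as: take the system.cpu.user series, keep only those tagged env:prod, and average them per host. A simplified sketch of that scope-then-group ("space aggregation") step for a single timestamp, ignoring time rollups; `space_aggregate` is a hypothetical helper, not a DataDog API.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Tuple

# One point per series at a single timestamp: (tags, value)
Point = Tuple[Dict[str, str], float]


def space_aggregate(
    points: List[Point],
    scope: Dict[str, str],
    group_by: str,
    agg: Callable[[List[float]], float] = mean,
) -> Dict[str, float]:
    """Filter points by `scope` (the {...} part), then aggregate per `group_by` tag."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for tags, value in points:
        if all(tags.get(k) == v for k, v in scope.items()):
            groups[tags.get(group_by, "N/A")].append(value)
    return {group: agg(values) for group, values in groups.items()}


points = [
    ({"env": "prod", "host": "a", "core": "0"}, 10.0),
    ({"env": "prod", "host": "a", "core": "1"}, 20.0),
    ({"env": "prod", "host": "b", "core": "0"}, 5.0),
    ({"env": "staging", "host": "c", "core": "0"}, 99.0),
]
# Roughly avg:system.cpu.user{env:prod} by {host}
print(space_aggregate(points, scope={"env": "prod"}, group_by="host"))
```

Swapping `agg` for `max` or `sum` mirrors changing the `avg:` prefix in the query.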

Spans

Timed operations (a request handler, a database call, an HTTP request) recorded by tracing, showing how long each one took.
Spans nest under parent spans and are aggregated into traces.
To learn/explain better…
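A minimal sketch of what a span records, using only the Python standard library rather than DataDog's tracer: each `span(...)` block times itself and remembers which span was open when it started, which is how spans nest. The names `span`, `Span`, and `FINISHED` are hypothetical.

```python
from __future__ import annotations

import contextvars
import itertools
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional

_ids = itertools.count(1)
# The span currently open in this context (None outside any span)
_current = contextvars.ContextVar("current_span", default=None)
FINISHED = []  # spans in the order they ended


@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]  # None means this is the root span
    name: str
    start: float
    duration: float = 0.0


@contextmanager
def span(name: str) -> Iterator[Span]:
    parent = _current.get()
    parent_id = parent.span_id if parent is not None else None
    s = Span(next(_ids), parent_id, name, time.perf_counter())
    token = _current.set(s)  # spans opened inside this block see `s` as their parent
    try:
        yield s
    finally:
        s.duration = time.perf_counter() - s.start
        _current.reset(token)
        FINISHED.append(s)


with span("GET /checkout"):
    with span("db.query"):
        time.sleep(0.01)

for s in FINISHED:
    print(s.name, s.parent_id, f"{s.duration * 1000:.1f} ms")
```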

Traces

The full tree of spans for a single request as it flows through your services.
To learn/explain better…
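A sketch of how a flat list of spans becomes a trace: every span carries the same trace ID plus a pointer to its parent, so the tree can be rebuilt and the slow part found. Field names are illustrative, not DataDog's exact schema, and `self_time` assumes child spans run one after another rather than in parallel.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    trace_id: int
    span_id: int
    parent_id: Optional[int]  # None for the root span
    name: str
    duration_ms: float
    children: List[Span] = field(default_factory=list)


def build_trace(spans: List[Span]) -> Span:
    """Link spans to their parents and return the root of the trace."""
    if len({s.trace_id for s in spans}) != 1:
        raise ValueError("all spans must share one trace_id")
    by_id = {s.span_id: s for s in spans}
    roots = []
    for s in spans:
        if s.parent_id is None:
            roots.append(s)
        else:
            by_id[s.parent_id].children.append(s)
    if len(roots) != 1:
        raise ValueError("expected exactly one root span")
    return roots[0]


def self_time(s: Span) -> float:
    """Time spent in this span itself, excluding direct children (assumes no overlap)."""
    return s.duration_ms - sum(c.duration_ms for c in s.children)


def print_tree(s: Span, depth: int = 0) -> None:
    print(f"{'  ' * depth}{s.name}: {s.duration_ms} ms (self {self_time(s)} ms)")
    for child in s.children:
        print_tree(child, depth + 1)


spans = [
    Span(7, 1, None, "GET /checkout", 120.0),
    Span(7, 2, 1, "db.query", 30.0),
    Span(7, 3, 1, "http.payment", 70.0),
]
root = build_trace(spans)
print_tree(root)
```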

Filters

The search query syntax for narrowing down your logs while debugging.
Powerful but complicated; takes some getting used to.
To explain later all that you can do…
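The log search bar takes space-separated terms that are ANDed together: key:value for tags and reserved attributes like service or status, @key:value for attributes from the log's JSON, and a leading - to exclude a tag. A hypothetical helper that assembles such a query, assuming simple values with no spaces or special characters (those would need quoting or escaping, which it does not handle):

```python
from __future__ import annotations

from typing import Dict, Optional


def build_log_query(
    tags: Optional[Dict[str, str]] = None,
    attributes: Optional[Dict[str, str]] = None,
    exclude: Optional[Dict[str, str]] = None,
    text: Optional[str] = None,
) -> str:
    terms = []
    for key, value in (tags or {}).items():
        terms.append(f"{key}:{value}")  # tags / reserved attributes: no prefix
    for key, value in (attributes or {}).items():
        terms.append(f"@{key}:{value}")  # JSON attributes are prefixed with @
    for key, value in (exclude or {}).items():
        terms.append(f"-{key}:{value}")  # leading - excludes matching logs
    if text:
        terms.append(text)  # free text matched against the log message
    return " ".join(terms)  # space-separated terms are ANDed


print(build_log_query(
    tags={"service": "checkout", "status": "error"},
    attributes={"http.status_code": ">=500"},
    exclude={"env": "staging"},
))
```

Pasting the printed string into the Log Explorer search bar should show checkout errors with 5xx status codes outside staging, assuming http.status_code is a numeric attribute in your logs.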

Integrations

Connect third-party services like Slack and OpsGenie, usually set up once for the whole company.
You won't have to deal with these very often.

lancejpollard/cheatsheet.md