This is the official engineering blog of Datadog, a leading cloud monitoring and security platform. The blog covers how Datadog builds and operates its own massive infrastructure, from Kubernetes and databases to AI and security. It's a great read for anyone curious about real-world observability at scale, especially if you use or evaluate Datadog's products.
Datadog | The Monitor blog
“Datadog's engineering blog: monitoring everything, everywhere, all at once.”
Read this when you want to understand how a top-tier observability platform builds, secures, and scales its own infrastructure and products.
Skip it if you need vendor-neutral advice or deep dives into non-Datadog ecosystems.
Compared to other vendor engineering blogs, this one offers unusually deep, platform-spanning technical content with a strong emphasis on AI and security.
The Archive
Every post, searchable and filtered
Diagnose and resolve database performance issues faster with Database Investigator
6m · Datadog Database Monitoring introduces Database Investigator, an agentic feature that surfaces root causes and remediation steps for database performance issues.
Analyze cloud costs with flexible spreadsheets in Datadog Sheets
5m · Datadog Sheets enables flexible spreadsheet-style analysis of live cloud cost data within Cloud Cost Management.
Datadog for Government achieves FedRAMP® High certification
4m · Datadog for Government achieves FedRAMP High certification to support sensitive agency workloads with unified observability, security, and NIST compliance.
Turn security signals into structured investigations with Case Management in Datadog Cloud SIEM
5m · Datadog Cloud SIEM's Case Management provides end-to-end workflows to transition from security signals to structured investigations.
Inside Datadog’s AI Research Lab: Meet two PhD candidates behind Toto
7m · Two PhD candidates at Datadog's AI Research Lab discuss their contributions to Toto, a timeseries foundation model.
Monitor and optimize Supabase query performance with Datadog Database Monitoring
5m · Datadog Database Monitoring provides Supabase developers with query-level visibility, explain plans, and one-click setup for diagnosing performance issues.
This Month in Datadog - April 2026
3m · April 2026's This Month in Datadog covers the MCP Server, Datadog Experiments, Bits AI Security Analyst, and more.
Add dynamically updating context to logs with Reference Tables and Observability Pipelines
6m · Datadog Reference Tables and Observability Pipelines enable central enrichment of logs before routing to SIEM or data lake destinations.
Introducing ARFBench: A time series question-answering benchmark based on real incidents
7m · ARFBench is a time series question-answering benchmark built from real Datadog incidents to evaluate AI anomaly reasoning.
Test network paths with TCP, UDP, and ICMP in Datadog
5m · Datadog supports TCP, UDP, and ICMP protocols in network path testing to diagnose application performance issues.
The product signal latency gap slowing your growth
6m · The post discusses latency between product signals in experiments and how prioritizing fixes can drive growth.
How to investigate cloud credential compromise with Bits AI Security Analyst
6m · Bits AI Security Analyst handles time-intensive steps in cloud credential compromise investigations, letting engineers focus on human judgment.
Turn developer feedback into operational insight with Datadog Forms and Sheets
4m · Datadog Forms and Sheets collect structured developer feedback and analyze it alongside operational data.
Evaluate, optimize, and secure your Google Cloud AI stack with Datadog
6m · Datadog helps Google Cloud teams evaluate AI agents, optimize GPU/TPU infrastructure, and strengthen security.
Bringing observability data hosting to the UK on AWS
4m · Datadog's UK availability zone on AWS enables organizations to host observability data in the UK with end-to-end visibility.
Steganography at scale: Embedding share URLs in Datadog widget screenshots
8m · Datadog embeds widget metadata into screenshots using invisible watermarks for self-describing visualizations at scale.
Identify and fix code issues faster with Datadog’s Azure DevOps Source Code integration
5m · Datadog's Azure DevOps Source Code integration enables code health analysis, accelerated troubleshooting, and quality enforcement.
Centralize observability management with Datadog Governance Console
5m · Datadog Governance Console centralizes usage insights and automates policy enforcement to reduce risk and control costs.
Every team should be A/B testing
5m · The post argues that A/B testing is valuable for a wide variety of engineering purposes beyond growth and product.
Spotting CI/CD misconfigurations before the bots do: Securing GitHub Actions with Datadog IaC Security
5m · Datadog IaC Security catches GitHub Actions misconfigurations in the diff before they reach production.
Route OTel data from AI apps to ClickHouse and Datadog using Observability Pipelines
8m · Datadog Observability Pipelines helps teams transform and normalize logs and metrics from OpenTelemetry for routing to ClickHouse and Datadog.
Manage service tracing across hosts with Single Step Instrumentation rules
6m · Single Step Instrumentation rules allow control over which services are traced by Datadog APM to reduce unnecessary trace data.
Detect runtime threats in Python Lambda functions with Datadog AAP
7m · Datadog App and API Protection provides in-process security monitoring for Python AWS Lambda functions to detect application-level attacks.
Offline evaluation for AI agents: Best practices
9m · Best practices for running offline evaluations to optimize AI agents in pre-production.
Introducing our open source AI-native SAST
7m · Datadog's open source SAST solution uses AI to surface code vulnerabilities more accurately and efficiently.
Integrate Recorded Future threat intelligence with Datadog Cloud SIEM
6m · The Recorded Future integration enriches logs, ingests alerts, and prioritizes threats in Datadog Cloud SIEM with real-time intelligence.
Instrument and monitor Boomi integration flows with OpenTelemetry and Datadog
8m · Instrument Boomi integration flows with OpenTelemetry and Datadog to collect and correlate process, JVM, and database telemetry.
Platform engineering metrics: What to measure and what to ignore
10m · Guidance on which platform engineering metrics to collect and how to interpret them to quantify the platform's impact on software delivery.
Not all index scans are equal: How we cut query latency by over 99%
12m · How misaligned predicates and column order hurt index scan performance and how to detect this pattern using DBM to cut query latency by over 99%.
CI/CD security: How to secure your GitHub ecosystem
8m · Applying a detection-based threat model to secure the GitHub ecosystem by identifying key inputs, identities, and associated risks.