Overview

Datadog is an observability platform that integrates and automates infrastructure monitoring, application performance monitoring (APM), log management, and security monitoring into a unified service. Founded in 2010, Datadog provides capabilities for collecting, processing, and visualizing data from various sources across cloud and on-premises environments (Datadog homepage). The platform is designed to offer real-time insights into system performance and health, supporting teams in identifying bottlenecks, debugging issues, and understanding user experience.

The platform's architecture is built to ingest diverse data types, including metrics, traces, and logs, from a wide array of technologies and services. This data is then correlated and presented through customizable dashboards, alerts, and machine learning-driven anomaly detection. Datadog's approach to observability aims to reduce tool sprawl by consolidating monitoring efforts into a single pane of glass, which can streamline operational workflows for developers, operations teams, and security analysts.

Datadog is commonly used by organizations with complex, distributed systems, particularly those leveraging cloud-native architectures, microservices, and containers. Its comprehensive feature set and extensive integration ecosystem make it suitable for end-to-end cloud monitoring, from infrastructure health to individual application requests and user interactions. The platform also extends to security monitoring, providing capabilities to detect threats, analyze security events, and ensure compliance across the IT estate. For instance, its security monitoring features can help identify suspicious activity across logs and metrics, correlating data points that might otherwise go unnoticed in disparate systems.

The platform's developer experience is supported by extensive documentation and a wide range of SDKs for programming languages such as Python, Go, and Node.js (Datadog documentation). This allows developers to integrate Datadog agents and libraries into their applications for custom metric submission, event posting, and trace collection. The API reference provides detailed information for programmatic interaction, enabling automation of dashboard management and alert configuration. Datadog's unified console is designed to offer a consolidated view of metrics, logs, and traces, which assists in debugging and performance analysis across the entire application stack.
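As a rough illustration of the alert-configuration automation mentioned above, the sketch below assembles a monitor definition as JSON using only the Python standard library. The field names (name, type, query, message, tags, options) are assumed to follow the general shape of Datadog's monitor API; the helper build_monitor_payload, the metric, and the threshold are hypothetical, and the payload is only printed, not sent. Check the API reference before using a definition like this.

```python
import json


def build_monitor_payload(metric, threshold, env):
    """Return a dict describing a metric monitor (hypothetical helper)."""
    # Assumed query shape: average of the metric over the last 5 minutes,
    # scoped to one environment tag, compared against a threshold.
    query = f"avg(last_5m):avg:{metric}{{env:{env}}} > {threshold}"
    return {
        "name": f"High {metric} in {env}",
        "type": "metric alert",
        "query": query,
        "message": f"{metric} exceeded {threshold} in {env}.",
        "tags": [f"env:{env}"],
        "options": {"thresholds": {"critical": threshold}},
    }


payload = build_monitor_payload("system.cpu.user", 90, "production")
print(json.dumps(payload, indent=2))
```

Keeping the definition as plain data makes it easy to store in version control and review before it is submitted through the API or an SDK.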

Key features

  • Infrastructure Monitoring: Collects and visualizes metrics from servers, containers, databases, and cloud services, offering real-time visibility into resource utilization and performance.
  • Log Management: Centralizes logs from all sources, allowing for aggregation, parsing, searching, and analysis to troubleshoot issues and gain operational insights.
  • Application Performance Monitoring (APM): Traces requests across distributed systems, providing detailed insights into latency, errors, and throughput for individual services and transactions.
  • Synthetic Monitoring: Simulates user interactions and API calls from global locations to proactively detect performance issues and outages before they impact real users.
  • Real User Monitoring (RUM): Captures and analyzes front-end performance metrics and user behavior directly from web browsers and mobile applications, providing insights into actual user experience.
  • Security Monitoring: Detects threats and suspicious activity across infrastructure, applications, and logs, correlating security signals for faster incident response and compliance adherence.
  • Cloud Cost Management: Provides visibility and analysis of cloud spending across different providers, helping organizations optimize cloud resource utilization and control costs.
  • Network Performance Monitoring: Monitors network traffic and connectivity between services and hosts, identifying network-related performance bottlenecks.
  • Developer Tools: Offers SDKs for multiple languages, a robust API, and CLI tools for integration and automation, enabling developers to incorporate observability into their development workflows.

Pricing

Datadog employs a usage-based pricing model, with costs varying based on the specific product used and the volume of data or number of units monitored. The pricing structure typically involves per-host, per-GB, or per-unit charges, with different tiers available for various products.

Product                    | Starting Price | Unit                   | Notes
Infrastructure Monitoring  | $15            | per host/month         | Includes metrics, events, and dashboards.
Log Management             | $0.10          | per GB ingested/month  | Tiered pricing based on volume.
APM                        | $31            | per host/month         | Includes distributed tracing and profiling.
Synthetic Monitoring       | $5             | per 10k tests/month    | Browser tests, API tests.
Real User Monitoring (RUM) | $4.50          | per 10k sessions/month | Web and mobile RUM.

As of 2026-05-07, Datadog offers a free tier that includes monitoring for up to 5 hosts or 150GB of logs per month. Detailed pricing information and calculators are available on the official Datadog pricing page.
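To make the usage-based model concrete, here is a minimal sketch that multiplies usage by the starting prices listed in the table above. It is an illustration only: the usage figures are invented, the function estimate_monthly_cost is hypothetical, and the calculation ignores volume tiers, the free tier, and any billing details not stated here.

```python
from decimal import Decimal

# Starting list prices copied from the pricing table (USD per month).
STARTING_PRICES = {
    "infrastructure_hosts": Decimal("15"),
    "apm_hosts": Decimal("31"),
    "log_gb_ingested": Decimal("0.10"),
    "synthetic_tests_10k": Decimal("5"),
    "rum_sessions_10k": Decimal("4.50"),
}


def estimate_monthly_cost(usage):
    """Sum units * starting price for each product in `usage`."""
    return sum(
        (STARTING_PRICES[product] * Decimal(str(units))
         for product, units in usage.items()),
        Decimal("0"),
    )


# Invented example usage: 10 hosts with APM, 200 GB of logs,
# 50k synthetic tests (5 units of 10k), 20k RUM sessions (2 units of 10k).
usage = {
    "infrastructure_hosts": 10,
    "apm_hosts": 10,
    "log_gb_ingested": 200,
    "synthetic_tests_10k": 5,
    "rum_sessions_10k": 2,
}
total = estimate_monthly_cost(usage)
print(f"Estimated monthly cost: ${total:.2f}")
```

Decimal is used instead of float so that prices such as 0.10 add up exactly. For real budgeting, the official pricing page and calculators remain the authoritative source.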

Common integrations

  • AWS: Integrates with Amazon Web Services to collect metrics, logs, and events from EC2, Lambda, S3, RDS, and other AWS services (Datadog AWS integration).
  • Azure: Connects with Microsoft Azure to monitor virtual machines, Azure Functions, Azure SQL Database, and other Azure resources (Datadog Azure integration).
  • Google Cloud Platform (GCP): Gathers data from Google Compute Engine, Google Kubernetes Engine, Cloud Functions, and other GCP services (Datadog GCP integration).
  • Kubernetes: Monitors Kubernetes clusters, nodes, pods, and deployments, providing visibility into containerized applications and orchestration (Datadog Kubernetes integration).
  • Docker: Collects metrics and logs from Docker containers and hosts, aiding in the monitoring of containerized environments (Datadog Docker integration).
  • Prometheus: Can ingest metrics from Prometheus exporters, allowing for centralized visualization alongside other Datadog data (Datadog Prometheus integration).
  • Slack: Integrates with Slack for alert notifications and incident management workflows (Datadog Slack integration).
  • Jira: Creates and updates Jira issues directly from Datadog alerts, streamlining incident resolution (Datadog Jira integration).

Alternatives

  • New Relic: Offers a full-stack observability platform with APM, infrastructure monitoring, and log management, often cited for its user experience and comprehensive features.
  • Splunk: Specializes in log management and security information and event management (SIEM), with capabilities extending to operational intelligence and APM through acquisitions.
  • Grafana Labs: Provides open-source and commercial solutions for observability, including Grafana for visualization, Loki for logs, and Mimir for Prometheus-compatible metrics storage, often favored for its flexibility and community support. Gartner has recognized Grafana in its Magic Quadrant for APM and Observability (Gartner Magic Quadrant).

Getting started

To begin sending custom metrics to Datadog using the Python SDK, you typically install the datadog library and configure a DogStatsD client. DogStatsD is a Datadog-specific implementation of the StatsD protocol, used for sending custom metrics, events, and service checks.

First, install the Datadog Python library:

pip install datadog

Next, initialize the library and send a custom metric. Metrics sent through DogStatsD travel to the local Datadog Agent, so the Agent must be running on your host to receive them; the API and application keys are only needed for calls to the Datadog HTTP API.

from datadog import initialize, statsd

# The API and application keys are optional here: they are only used for
# HTTP API interactions, not for metrics sent through DogStatsD.
# Replace the placeholders with your actual keys, or omit them.
# statsd_host and statsd_port point the DogStatsD client at the Datadog Agent
# (localhost:8125 is the default). Ensure the Agent is running and listening
# on this port; if it is on a different host/port, specify it here.
options = {
    'api_key': 'YOUR_DATADOG_API_KEY',
    'app_key': 'YOUR_DATADOG_APP_KEY',
    'statsd_host': 'localhost',
    'statsd_port': 8125,
}
initialize(**options)

# Send a custom gauge metric
# A gauge represents a single data point that can go up or down.
statsd.gauge('my_app.users.online', 123, tags=['env:production', 'region:us-east-1'])
print("Sent gauge metric 'my_app.users.online' with value 123")

# Send a custom counter metric
# A counter increments a value over time.
statsd.increment('my_app.page_views', tags=['page:homepage'])
print("Sent increment metric 'my_app.page_views'")

# Send a custom histogram metric
# Histograms track the distribution of values.
statsd.histogram('my_app.request_duration', 0.52, tags=['endpoint:/api/data'])
print("Sent histogram metric 'my_app.request_duration' with value 0.52")

# Send a custom event through DogStatsD (delivered via the Agent, like the metrics above)
statsd.event(
    'Application Deployment', # Title of the event
    'A new version of my_app was deployed to production.', # Text of the event
    alert_type='info', # 'info', 'warning', 'error', 'success'
    tags=['deployment', 'version:1.2.0']
)
print("Sent event 'Application Deployment'")

This example demonstrates sending various metric types and an event to Datadog. The Datadog Agent, running on your host, collects these metrics from the DogStatsD port and forwards them to the Datadog platform, where they can be visualized and used for alerting. For more complex setups, refer to the Datadog DogStatsD documentation.
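To show roughly what travels to the DogStatsD port, the sketch below builds datagrams by hand in the basic format name:value|type|#tag1,tag2 and sends one over UDP with the standard library. The helpers format_datagram and send_datagram are hypothetical names for illustration, not part of the datadog library, and the sketch omits optional fields such as sample rates.

```python
import socket


def format_datagram(name, value, metric_type, tags=None):
    """Build a DogStatsD datagram: <name>:<value>|<type>|#<tag1>,<tag2>."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_datagram(datagram, host="localhost", port=8125):
    """Send one datagram over UDP; no reply is expected from the Agent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# "g" marks a gauge and "c" a counter in the StatsD type field.
gauge = format_datagram(
    "my_app.users.online", 123, "g", ["env:production", "region:us-east-1"]
)
counter = format_datagram("my_app.page_views", 1, "c", ["page:homepage"])
print(gauge)
print(counter)

# Uncomment when a Datadog Agent is listening on localhost:8125.
# send_datagram(gauge)
```

Because UDP is fire-and-forget, a successful send does not confirm that the Agent received the metric; in practice the datadog library's statsd client handles this formatting and transport for you.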