What is monitoring as code?

On this page

Monitoring as code is the practice of managing monitoring configurations and alerts through code. This approach offers several benefits for engineering teams at scale, including codified, version-controlled, and reusable monitoring configurations.

Daniel Giordano

Synthetic monitoring is a proven practice for ensuring reliability of applications and services in production. It excels at proactively discovering issues that affect real users, which is why companies—from startups to Fortune 500—rely on it to enhance their user experience. But teams that are moving fast or growing at scale can face real challenges with traditional synthetic monitoring workflows and tools:

Noisy alerts
Limited visibility into root causes
Siloed tools
Expensive and time-consuming scaling of monitoring

Monitoring as Code (MaC) addresses these pain points by automating and integrating monitoring directly into CI/CD pipelines, delivering efficiency, scalability, and actionable insights. It redefines monitoring as a dynamic, code-driven process that evolves seamlessly with your applications.

Let’s explore MaC, its benefits, and how you can use it to level up your monitoring.

What Is Monitoring as Code?

Monitoring as Code involves defining monitoring configurations and operations as code, enabling automated setups, real-time observability across DevOps lifecycles, and seamless integration with development workflows.

With Monitoring as Code, your monitoring is implemented in your CI/CD lifecycle and maintained in version control, without having to rummage around a UI. This approach ensures that monitoring evolves alongside your application, providing continuous feedback and maintaining stability.

Challenges of traditional synthetic monitoring

While synthetic monitoring is a tried and true software practice, there are a few challenges that come with the way tools have traditionally worked.

Noisy, fatiguing alerts

Traditional monitoring tools are difficult to configure alerts at a granular level, often over alerting a chain of teams, wasting time and precious resources. This leads to “alert fatigue,” where important alerts can be missed, and teams become desensitized to notifications.

Hard to scale in large systems

Synthetic monitors can be difficult to implement and maintain in large, complex systems. Organizations with many services and screens can spend hundreds of hours creating and reconfiguring monitors and alerts as their underlying software changes.

Siloed tools, metrics, and alerts

Operations teams have traditionally owned monitoring tools within organizations and often times are siloed from the engineering teams who ultimately need to fix the error.

Limited visibility into distributed systems

Traditional synthetic monitoring tools work at the surface level of an application, unable to give detailed explanations of the root cause of the alert.

Growing Costs

Whenever we discuss the ideal setup for synthetic monitoring, experienced DevOps and SRE engineers will respond: “Sure, that would be great, if we had the budget for it.” While it would be nice to simulate a complex user behavior every 30 seconds from 5 different geographic regions, the cost of most synthetic tools makes that cost-prohibitive.

Benefits of Monitoring as Code

Here are the most important benefits of adopting Monitoring as Code:

Codified & Programmable

Monitoring as Code transforms your monitoring setup into code that is stored, managed, and versioned alongside your application code. This approach brings significant advantages over traditional monitoring tools:

Advanced Programmability: The programmable nature of MaC allows you to achieve levels of automation and customization that are impossible in traditional UI-based tools. You can write scripts, create reusable templates, and use wrappers to dynamically generate and maintain monitors based on your application code. This approach simplifies the management of complex systems and ensures monitoring scales effortlessly with your application.
Integrated with CI/CD Pipelines: Monitoring configurations are seamlessly integrated into your CI/CD processes, deploying checks automatically as part of your application updates. This eliminates manual setup, ensures comprehensive coverage, and reduces the risk of human error.
Version Controlled: Every change to your monitoring configuration is tracked, enabling clear audits and easy rollbacks when needed. This ensures accountability and simplifies compliance, especially in highly regulated industries.

With MaC, your monitoring evolves alongside your codebase, enabling precision, flexibility, and scalability that traditional methods simply can’t match.

A Monitoring as Code example that tests a login flow of an ecommerce store — Monitoring a login flow with MaC

Reusability Enforces Consistency

Reusability in Monitoring as Code (MaC) goes beyond efficiency—it ensures consistent, reliable, and scalable monitoring across environments and teams. By reusing existing assets like Playwright tests and API definitions, MaC streamlines setup and eliminates redundancy.

Leverage Existing Code: Use Playwright tests and API definitions you’ve already written to create monitors. This eliminates the need to duplicate effort and ensures your monitoring is always in sync with your application’s logic and behavior.
Simplified Scaling: Reusable configurations allow you to replicate monitoring setups across multiple environments or projects without starting from scratch. This is especially valuable when scaling to new services or regions.
Faster Onboarding and Setup: Pre-tested modules and definitions reduce setup time for new services and simplify onboarding for engineers, allowing teams to deploy consistent, best-practice monitoring quickly.
Standardization Across Teams: Shared monitoring assets enforce best practices, ensuring all teams operate with the same standards and reducing discrepancies between environments.

By reusing existing assets and creating modular configurations, MaC ensures monitoring is efficient, consistent, and adaptable to changing needs.

Increased Collaboration

MaC breaks down silos by embedding monitoring into the development lifecycle, fostering collaboration across teams:

Unified Workflows: Developers and Ops teams collaborate on monitoring in the same repositories they use for application code. Everyone speaks the same language—code.
Cross-Team Visibility: Pull requests and reviews for monitoring setups promote transparency, enabling teams to align monitoring with application behavior and expectations.
Shared Ownership: Monitoring shifts from being a secondary concern to a shared responsibility. Developers, SREs, and DevOps engineers work together to ensure robust monitoring for applications.

By embedding collaboration into the monitoring process, MaC drives alignment, reduces friction, and creates a culture of shared accountability.

Monitoring as Code Use Cases

Continuous Monitoring Throughout the SDLC

To maintain high-quality deployments, monitoring should span every stage of the Software Development Life Cycle (SDLC)—from local development to staging and production. Monitoring as Code (MaC) integrates directly into CI/CD pipelines, enabling automatic and consistent observability across environments. This ensures that issues are detected and addressed before they reach production, significantly reducing downtime and errors.

For instance, commercetools, a leader in headless commerce, adopted MaC to implement continuous monitoring throughout their SDLC. By codifying their monitoring setup, they ensured every stage of their pipeline was equipped with accurate, automated checks. This approach caught issues early in development, reduced production incidents, and provided end-to-end visibility into their systems without adding extra operational overhead.

Automating Monitoring for Rapidly Changing Systems

Monitoring APIs and UIs in fast-paced environments is a complex challenge, especially when new endpoints are constantly being added or updated. Traditional approaches often involve manually setting up and maintaining checks, which is time-consuming and prone to errors. As systems scale, gaps in monitoring coverage or misconfigured alerts can lead to missed issues and delayed responses, impacting the reliability of your services. Monitoring as Code solves this by automating the creation and deployment of monitoring configurations, ensuring every API is consistently and comprehensively monitored from staging to production. By integrating monitoring into your development pipelines, you can keep up with rapid changes without sacrificing reliability.

For example, TrueLayer, a leading open banking platform, adopted Monitoring as Code to address these challenges. By codifying their API monitoring configurations and automating their setup with Checkly, they reduced manual effort by 60%. This allowed their team to scale monitoring as they introduced new endpoints, ensuring complete coverage and faster issue detection, all while focusing more on innovation rather than operational tasks.

Enhancing Monitoring Ownership Across Teams

In large organizations, monitoring is often centralized under SRE teams, creating bottlenecks and slowing down response times. This approach limits the ability of engineering teams to adapt and manage monitoring for their own services. Monitoring as Code (MaC) decentralizes this process, empowering teams to define and control their monitoring setups directly within their workflows. This fosters greater accountability and enables faster detection and resolution of issues.

LinkedIn leveraged MaC to transition from an SRE-managed model to one where engineering teams took full ownership of their monitoring. Using Checkly, they streamlined the process, allowing teams to tailor monitoring configurations to their specific needs. This shift reduced time to detection, improved system reliability, and gave teams the autonomy to innovate without relying on a central authority.

What Can You Monitor Using Monitoring as Code?

Monitoring as Code (MaC) enables you to test key aspects of your application and infrastructure with precision and automation. Here are some of the areas where it excels:

Application Uptime

Application uptime reflects how reliably your application is available to users, and even a few minutes of downtime can lead to user frustration, loss of revenue, and damage to your reputation.

MaC allows you to define uptime checks programmatically, ensuring these checks are consistent across all environments. For example, you can configure MaC to ping specific endpoints or services at regular intervals to verify they are reachable and responding within acceptable timeframes. If a failure is detected, MaC immediately triggers alerts, allowing teams to investigate and resolve issues before they escalate.

MaC also simplifies scaling uptime monitoring across multiple regions. By deploying checks from different geographic locations, you can ensure your application is accessible worldwide and identify regional connectivity issues. Additionally, codified monitoring enables you to keep pace with application changes—new services and endpoints are automatically incorporated into your monitoring setup, reducing the risk of missed coverage.

UI Functionality

MaC allows you to simulate and test user interactions with your web application. For example, you can create browser checks to validate critical workflows like logging in, completing a checkout, or performing a search.

These automated tests ensure that every step—loading pages, submitting forms, and rendering elements—works as expected. They catch issues early, safeguarding functionality and user satisfaction.

With MaC, browser tests are version-controlled and evolve with your application. You can easily scale monitoring across regions, browsers, and devices, ensuring a seamless experience for all users.

Here’s an example of a browser monitor that logs into Checkly, and waits for the dashboard to fully load.

import { expect, test } from '@playwright/test' // 1

test('Login to Checkly', async ({ page }) => { // 2
  await page.goto('https://app.checklyhq.com/login') // 3

  await page.locator('input[type="email"]').type('john@example.com') // 4
  await page.locator('input[type="password"]').type('mypassword') // 4
  await page.getByRole('button', { name: 'Log In' }).click() // 5

  const homeDashboardTable = page.getByTestId('home-dashboard-table')
  await expect(homeDashboardTable).toBeVisible() // 6
})

API Health

APIs power modern applications, and MaC ensures their availability, performance, and correctness. You can automate checks to validate response status codes, measure latency, and verify payloads.

These checks provide confidence that your APIs function reliably under different conditions. By deploying API checks globally, you can ensure consistent performance for users across all regions.

Since monitoring setups are code-based, they are easy to update as your APIs evolve. This keeps monitoring aligned with API changes, eliminating gaps in coverage.

Here’s an example of an API check that tests an e-commerce flow of updating an inventory item:


import { ApiCheck, Frequency, AssertionBuilder, RetryStrategyBuilder } from 'checkly/constructs'

new ApiCheck('update-inventory-item', {
  name: 'update inventory item',
  activated: true,
  muted: false,
  shouldFail: false,
  runParallel: false,
  locations: ['eu-central-1', 'us-west-1'],
  tags: [],
  frequency: Frequency.EVERY_10M,
  environmentVariables: [],
  // group: your check belongs to group 'Inventory API',
  maxResponseTime: 10000,
  degradedResponseTime: 5000,
  request: {
    url: 'https://danube-webshop.herokuapp.com/api/books',
    method: 'GET',
    followRedirects: true,
    skipSSL: false,
    assertions: [
      AssertionBuilder.jsonBody('$.length').equals(30),
      AssertionBuilder.statusCode().equals(200),
    ],
    body: ``,
    bodyType: 'NONE',
    headers: [],
    queryParameters: [],
  },
  retryStrategy: RetryStrategyBuilder.fixedStrategy({
    baseBackoffSeconds: 0,
    maxRetries: 1,
    maxDurationSeconds: 600,
    sameRegion: false,
  }),
})

Application Performance

Slow app load times or high latency can drive users away and impact revenue. Monitoring as Code provides a streamlined way to track performance metrics like page load times, API response times, and latency, ensuring your application consistently meets expectations.

With MaC, performance checks can be integrated into your CI/CD pipelines to catch slowdowns early in development or staging. Automated alerts notify your team of performance issues in real time, enabling proactive fixes before users are impacted.

MaC also excels at monitoring deployments by running performance checks automatically after each release. This ensures that new changes don’t introduce regressions, and if issues arise, teams can act quickly to resolve or roll back, maintaining stability and trust.

MaC makes it easy to monitor performance across regions, simulating user experiences in different locations. This helps identify latency caused by network issues, server configurations, or CDNs, ensuring fast, consistent performance worldwide. By codifying performance monitoring, MaC standardizes and automates the process, allowing your team to maintain a responsive, reliable web app at scale.

Scheduled Jobs or Cron Jobs

Scheduled jobs or cron jobs are crucial for automating tasks, but failures can lead to major problems if undetected. MaC enables heartbeat monitoring to ensure these jobs run on schedule.

A heartbeat check verifies that a job reports back within the expected timeframe. If it doesn’t, an alert is triggered, giving you a chance to fix the issue before it impacts your system.

This proactive approach ensures your critical automation workflows are reliable and resilient.

Infrastructure Reliability

MaC also tests the health and reliability of your infrastructure. This includes uptime monitoring for servers, latency checks for databases, and validating metrics like CPU or memory usage.

By codifying these checks, you create an automated system that detects potential issues early. This helps you resolve problems before they escalate into critical incidents.

Monitoring as Code with Checkly

Monitoring as Code (MaC) with Checkly empowers you to streamline, automate, and scale your monitoring strategy by leveraging advanced tools and integrations. With native Playwright support, Terraform and Pulumi integrations, and seamless CI/CD pipeline compatibility, Checkly simplifies complex monitoring setups and enables dynamic, code-driven observability.

CI/CD Pipeline Integration

Checkly’s compatibility with CI/CD pipelines ensures monitoring is embedded into your development workflow. You can configure performance, uptime, and functional checks to run automatically during deployments, catching issues early and maintaining system reliability. This tight integration helps teams:

Validate changes before they impact production.
Monitor deployments in real-time to detect regressions or failures.
Streamline troubleshooting by aligning monitoring data with code changes.

Native Playwright Support

With native Playwright support, Checkly enables you to directly use existing Playwright scripts as browser checks. This integration allows you to monitor complex user workflows—such as logins, form submissions, or multi-step transactions—without needing to rewrite tests.

Your monitoring aligns with your application logic, offering precise, actionable insights into the user experience.

Alert Management via Code

A strong alerting strategy ensures critical issues are addressed promptly while minimizing noise for your team. With Checkly, you can configure alerting integrations like PagerDuty or OpsGenie directly in your codebase. This approach allows you to:

Define alert thresholds, escalation paths, and severity levels as part of your monitoring setup.
Automatically deploy and update alerting configurations alongside your application changes.
Maintain consistency across environments, ensuring no alerts are missed or misconfigured during rollouts.

Dashboards and Visualization

Dashboards offer a real-time view of your application’s health, and MaC enables you to configure these visualizations programmatically. Tools like Grafana integrate easily with MaC, allowing you to:

Automate the creation and updates of dashboards based on predefined templates.
Highlight key metrics like latency, error rates, or resource utilization, making it easier to identify trends and anomalies.
Share consistent, reliable dashboards across teams or environments.

Performance Benchmarking

Performance benchmarking helps you understand your application’s limits and track improvements over time. MaC allows you to automate performance tests and benchmark configurations, ensuring consistent measurement across deployments. You can:

Define baseline performance metrics in your codebase.
Run automated checks during staging or production deployments to compare against benchmarks.
Continuously refine performance goals based on historical data.

Codified benchmarks keep your team aligned and make it easier to identify potential bottlenecks before they impact users.

Incident Postmortems and Continuous Improvement

Postmortems are crucial for learning from incidents, and MaC ensures these lessons are integrated directly into your monitoring setup. After identifying root causes, you can:

Update monitoring checks and alert thresholds via code to prevent similar issues.
Document findings in version-controlled repositories, keeping configurations transparent and actionable.
Automate the deployment of improvements across all environments.

This approach turns every incident into an opportunity for growth, building resilience into your application with each iteration.

Regional or Multi-Environment Testing

Modern applications often run across multiple regions or environments, and ensuring consistent performance everywhere is a major challenge. MaC allows you to:

Configure checks that simulate user interactions or API calls from different regions to identify latency or availability issues.
Automate regional monitoring setups, ensuring every environment, from staging to production, is monitored uniformly.
Adjust alert thresholds and benchmarks for specific regions or conditions, such as high traffic during local peak hours.

This granular, automated approach ensures your application delivers a seamless experience to users regardless of their location or environment.

Infrastructure as Code (IaC) Support

Checkly integrates seamlessly with popular IaC tools like Terraform and Pulumi. This means you can define monitoring configurations alongside your infrastructure code, ensuring monitoring setups are automatically deployed with your infrastructure updates. These integrations enable:

Consistent monitoring deployment across all environments.
Simplified updates to monitors as your infrastructure evolves.
Centralized version control, ensuring your monitoring is always aligned with your application and infrastructure.

Best Practices for Monitoring as Code

Implementing monitoring as code effectively requires following best practices to ensure reliability, scalability, and maintainability.

Modular Configurations: Break down monitoring configurations into reusable modules. This makes it easier to manage and update configurations.
Environment-Specific Settings: Use environment-specific settings to tailor monitoring configurations based on the environment. This ensures that monitoring is relevant and accurate.
Automation and Testing: Automate the deployment of monitoring configurations and regularly test them to catch issues early.
Documentation: Maintain comprehensive documentation for all monitoring configurations and processes. This aids in onboarding new team members and maintaining consistency.
Continuous Improvement: Regularly review and update monitoring configurations to adapt to changes in the application and infrastructure.

Why Checkly for Monitoring as Code?

By combining native Playwright support, IaC integrations like Terraform and Pulumi, and seamless CI/CD pipeline compatibility, Checkly offers a robust, flexible, and scalable solution for Monitoring as Code. Whether it’s managing alerts, benchmarking performance, or monitoring deployments, Checkly ensures your monitoring evolves dynamically with your application, delivering reliability and efficiency at every step.

Start streamlining your monitoring strategy today with Checkly and experience the power of Monitoring as Code firsthand.