Today, we’re announcing the public preview of AWS DevOps Agent, a frontier agent that helps you respond to incidents, identify root causes, and prevent future issues through systematic analysis of past incidents and operational patterns.
Frontier agents represent a new class of AI agents that are autonomous, massively scalable, and able to work for hours or days without constant intervention.
When production incidents occur, on-call engineers face significant pressure to quickly identify root causes while managing stakeholder communications. They must analyze data across multiple monitoring tools, review recent deployments, and coordinate response teams. After service is restored, teams often lack the bandwidth to turn incident learnings into systematic improvements.
AWS DevOps Agent is your always-on, autonomous on-call engineer. When issues arise, it automatically correlates data across your operational toolchain, from metrics and logs to recent code deployments in GitHub or GitLab. It identifies probable root causes and recommends targeted mitigations, helping reduce mean time to resolution. The agent also manages incident coordination, using Slack channels for stakeholder updates and maintaining detailed investigation timelines.
To get started, you connect AWS DevOps Agent to your existing tools through the AWS Management Console. The agent works with popular services such as Amazon CloudWatch, Datadog, Dynatrace, New Relic, and Splunk for observability data, and integrates with GitHub Actions and GitLab CI/CD to track deployments and their impact on your cloud resources. Through the bring your own (BYO) Model Context Protocol (MCP) server capability, you can also bring additional tools into your investigations, such as your organization’s custom tools, specialized platforms, or open source observability solutions like Grafana and Prometheus.
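MCP is built on JSON-RPC 2.0. A server advertises its tools in response to `tools/list` and runs them in response to `tools/call`. The sketch below shows only those two message shapes in plain Python, to illustrate what a BYO MCP server exposes. It assumes a single hypothetical tool, `query_error_rate`, which returns canned demo data. It also leaves out the `initialize` handshake and the stdio or HTTP transport that a real server needs. In practice you would probably build on an MCP SDK rather than handle raw messages.

```python
import json

# Hypothetical tool catalog: the tool name, schema, and demo data are placeholders,
# not part of AWS DevOps Agent or any real integration.
TOOLS = {
    "query_error_rate": {
        "description": "Return the recent error rate for a service (canned demo data).",
        "inputSchema": {
            "type": "object",
            "properties": {"service": {"type": "string"}},
            "required": ["service"],
        },
    },
}


def query_error_rate(service: str) -> str:
    # A real server would query Prometheus, Grafana, or an internal system here.
    return f"{service}: 2.5% errors over the last 15 minutes"


def _response(req_id, *, result=None, error=None):
    # Every JSON-RPC 2.0 response echoes the request id and carries a result or an error.
    msg = {"jsonrpc": "2.0", "id": req_id}
    if error is not None:
        msg["error"] = error
    else:
        msg["result"] = result
    return json.dumps(msg)


def handle(message: str) -> str:
    """Answer one JSON-RPC 2.0 request for the MCP tools/list or tools/call method."""
    req = json.loads(message)
    req_id, method = req.get("id"), req.get("method")
    if method == "tools/list":
        tools = [{"name": name, **spec} for name, spec in TOOLS.items()]
        return _response(req_id, result={"tools": tools})
    if method == "tools/call":
        params = req.get("params", {})
        if params.get("name") not in TOOLS:
            return _response(req_id, error={"code": -32602, "message": "Unknown tool"})
        args = params.get("arguments", {})
        text = query_error_rate(args.get("service", "unknown"))
        # Tool results are returned as a list of typed content items.
        return _response(req_id, result={"content": [{"type": "text", "text": text}], "isError": False})
    return _response(req_id, error={"code": -32601, "message": "Method not found"})
```

An agent acting as the MCP client would first call `tools/list` to discover `query_error_rate` and its input schema. It would then issue `tools/call` with `{"name": "query_error_rate", "arguments": {"service": "checkout"}}` when that data is relevant to an investigation.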
The agent acts as a virtual team member and can be configured to respond automatically to incidents from your ticketing systems. It includes built-in support for ServiceNow and, through configurable webhooks, can respond to events from other incident management tools such as PagerDuty. As investigations progress, the agent updates tickets and relevant Slack channels with its findings. All of this is powered by an intelligent application topology that the agent builds. This is a comprehensive map of your system components and their interactions, and it includes the deployment history that helps identify potential deployment-related causes during investigations.
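The agent builds and uses its topology automatically, and the post does not describe how it does so internally. The following conceptual sketch is not AWS DevOps Agent’s implementation. It only illustrates why a dependency map combined with deployment history narrows the search. The approach is to walk everything the failing component depends on, then keep only the deployments to those components that landed shortly before the incident started. The component names, dates, and two-hour window are all made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical topology: each component maps to the components it depends on.
TOPOLOGY = {
    "api-gateway": ["orders-lambda"],
    "orders-lambda": ["orders-table", "payments-service"],
    "payments-service": [],
    "orders-table": [],
}

# Hypothetical deployment history: (component, deployment time).
DEPLOYMENTS = [
    ("payments-service", datetime(2025, 1, 15, 9, 40)),
    ("orders-table", datetime(2025, 1, 3, 14, 0)),
]


def dependencies(component, topology):
    """Return the component plus everything it transitively depends on."""
    seen, stack = set(), [component]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(topology.get(node, []))
    return seen


def suspect_deployments(failing, incident_start, topology, deployments,
                        window=timedelta(hours=2)):
    """List deployments to the failing component or its dependencies within `window` before the incident."""
    scope = dependencies(failing, topology)
    return sorted(
        (name, when)
        for name, when in deployments
        if name in scope and incident_start - window <= when <= incident_start
    )
```

For an incident on `api-gateway` starting at 10:15 on 2025-01-15, only the `payments-service` deployment at 09:40 falls inside both the dependency scope and the time window. That makes it the first deployment worth examining.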
Let me show you how it works
To show you how it works, I deployed a straightforward AWS Lambda function that intentionally generates errors when invoked. I deployed it in an AWS CloudFormation stack.
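The post does not include the function’s code. Assuming a Python runtime, a minimal handler like the one below would reproduce the behavior. The exception class and message are illustrative, not the author’s actual code. Because the exception is unhandled, Lambda records each invocation as a function error, which increments the function’s `Errors` metric in CloudWatch.

```python
import json
import logging

# In the Lambda Python runtime, the root logger writes to CloudWatch Logs.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


class IntentionalTestError(Exception):
    """Raised on purpose so that every invocation counts as a function error."""


def lambda_handler(event, context):
    # Log the incoming event so there is something to find in CloudWatch Logs.
    logger.info("Received event: %s", json.dumps(event))
    # Fail every invocation; an unhandled exception is reported as a function error.
    raise IntentionalTestError("Intentional test exception for the DevOps Agent demo")
```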
Step 1: Create an Agent Space
An Agent Space defines the scope of what AWS DevOps Agent can access as it performs tasks.
You can set up Agent Spaces based on your operational model. Some teams align an Agent Space with a single application, others create one per on-call team that manages multiple services, and some organizations use a centralized approach. For this demonstration, I’ll show you how to create an Agent Space for a single application. This setup isolates investigations and resources for that specific application, which makes it easier to track and analyze incidents in its context.
In the AWS DevOps Agent section of the AWS Management Console, I select Create Agent Space, enter a name for the space, and create the AWS Identity and Access Management (IAM) roles it uses to introspect AWS resources in my own AWS account or in other AWS accounts.
For this demo, I choose to enable the AWS DevOps Agent web app (more about this later). You can also enable it at a later stage.
When ready, I choose Create.
After the Agent Space has been created, I choose the Topology tab.
This view shows the key resources, entities, and relationships that AWS DevOps Agent has selected as a foundation for performing its tasks efficiently. It doesn’t represent everything AWS DevOps Agent can access or see, only what the agent considers most relevant right now. By default, the topology includes the AWS resources contained in my account. As your agent completes more tasks, it discovers new resources and adds them to this list.
Step 2: Configure the AWS DevOps Agent web app for the operators
The AWS DevOps Agent web app provides a web interface where on-call engineers can manually trigger investigations, view investigation details (including relevant topology elements), steer investigations, and ask questions about an investigation.
I can access the web app directly from my Agent Space in the AWS console by choosing the Operator access link. Alternatively, I can use AWS IAM Identity Center to configure user access for my team. IAM Identity Center lets me manage users and groups directly or connect to an identity provider (IdP), providing a centralized way to control who can access the AWS DevOps Agent web app.
At this stage, I have an Agent Space set up to focus investigations and resources on this particular application, and I’ve enabled the DevOps team to start investigations from the web app.
Now that the one-time setup for this application is done, I start invoking the faulty Lambda function, which generates an error on every invocation. The CloudWatch alarm on the function’s error count goes into the ALARM state. In real life, you might receive an alert from an external service such as ServiceNow, and you can configure AWS DevOps Agent to start investigations automatically when it receives such alerts.
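The post doesn’t say how the alarm was configured. A common setup, assumed here, alarms on the `Sum` of the `AWS/Lambda` `Errors` metric for the function over one-minute periods. The helper below builds the keyword arguments for CloudWatch’s `PutMetricAlarm` API. It also includes a deliberately simplified state check that ignores `DatapointsToAlarm` and missing-data handling, so you can see why each erroring minute flips the alarm to ALARM. The function name `demo-fn` and the alarm naming scheme are placeholders.

```python
def lambda_errors_alarm(function_name, threshold=1, period_seconds=60):
    """Build keyword arguments for CloudWatch PutMetricAlarm on a Lambda function's Errors metric."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No invocations means no errors, so missing data should not trip the alarm.
        "TreatMissingData": "notBreaching",
    }


def simulate_state(error_sums, params):
    """Simplified model of alarm evaluation for GreaterThanOrEqualToThreshold only."""
    periods = params["EvaluationPeriods"]
    recent = error_sums[-periods:]
    if len(recent) < periods:
        return "INSUFFICIENT_DATA"
    # ALARM only if every evaluated period meets or exceeds the threshold.
    return "ALARM" if all(v >= params["Threshold"] for v in recent) else "OK"


params = lambda_errors_alarm("demo-fn")
print(simulate_state([0, 0, 3], params))  # prints "ALARM"
```

With AWS credentials configured, `boto3.client("cloudwatch").put_metric_alarm(**lambda_errors_alarm("demo-fn"))` would create the alarm, after you replace `demo-fn` with your real function name.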
For this demo, I start the investigation manually by selecting Start Investigation.
You can also choose from several preconfigured starting points to begin your investigation quickly. Latest alarm investigates your most recently triggered alarm and analyzes the underlying metrics and logs to determine the root cause. High CPU usage investigates high CPU utilization metrics across your compute resources and identifies which processes or services are consuming excessive resources. Error rate spike investigates a recent increase in application error rates by analyzing metrics and application logs and identifying the source of the failures.
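To make the Error rate spike starting point more concrete, the sketch below shows a simple heuristic you might apply yourself when deciding whether a spike warrants an investigation. It compares the latest error rate (errors divided by invocations) with the mean of the preceding periods. This is an assumption for illustration, not the analysis AWS DevOps Agent performs. The default baseline length, multiplier, and minimum rate are arbitrary.

```python
from statistics import mean


def error_rates(errors, invocations):
    """Per-period error rate; periods with no invocations count as 0."""
    return [e / i if i else 0.0 for e, i in zip(errors, invocations)]


def is_error_rate_spike(errors, invocations, baseline_periods=5, factor=3.0, min_rate=0.01):
    """Flag a spike when the latest rate is at least min_rate and above factor x the baseline mean."""
    rates = error_rates(errors, invocations)
    if len(rates) <= baseline_periods:
        return False  # not enough history to compare against
    # Baseline: the `baseline_periods` periods immediately before the latest one.
    baseline = mean(rates[-baseline_periods - 1:-1])
    latest = rates[-1]
    return latest >= min_rate and latest > factor * baseline
```

For example, with 100 invocations per period and error counts of `[1, 0, 1, 0, 1, 20]`, the latest rate of 20% is far above the roughly 0.6% baseline, so the function reports a spike. A steady 1% error rate does not count as one.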
I enter some information, such as Investigation details, Investigation starting point, the Date and time of the incident, and the AWS Account ID for the incident.
In the AWS DevOps Agent web app, you can watch the investigation unfold in real time. The agent identifies the application stack. It correlates metrics from CloudWatch, examines logs from CloudWatch Logs or external sources such as Splunk, reviews recent code changes from GitHub, and analyzes traces from AWS X-Ray.
It identifies the error patterns and provides a detailed investigation summary. In this demo, the investigation reveals that these are intentional test exceptions, shows the timeline of function invocations leading to the alarm, and even suggests monitoring improvements for error handling.
The agent uses a dedicated incident channel in Slack, notifies on-call teams if needed, and provides real-time status updates to stakeholders. Through the investigation chat interface, you can interact directly with the agent. You can ask clarifying questions such as “which logs did you analyze?” or steer the investigation by providing additional context, such as “focus on these specific log groups and rerun your analysis.” If you need expert assistance, you can create an AWS Support case with a single click, automatically populated with the agent’s findings, and engage with AWS Support experts directly through the investigation chat window.
For this demo, AWS DevOps Agent correctly identified that the errors came from manual actions in the Lambda console invoking a function that intentionally triggers errors 😇.
Beyond incident response, AWS DevOps Agent analyzes my recent incidents to identify high-impact improvements that prevent future issues.
During active incidents, the agent offers immediate mitigation plans through its incident mitigations tab to help restore service quickly. Mitigation plans consist of specs that provide detailed implementation guidance for developers and for agentic development tools such as Kiro.
For longer-term resilience, it identifies potential improvements by analyzing gaps in observability, infrastructure configurations, and deployment pipelines. My simple demo that triggered intentional errors was not enough to generate relevant recommendations, though.
For example, it might detect that a critical service lacks a multi-AZ deployment and comprehensive monitoring. The agent then creates detailed recommendations with implementation guidance, taking into account factors such as operational impact and implementation complexity. In an upcoming follow-up release, the agent will expand its analysis to include code bugs and test coverage improvements.
Availability
You can try AWS DevOps Agent today in the US East (N. Virginia) Region. Although the agent itself runs in US East (N. Virginia) (us-east-1), it can monitor applications deployed in any Region, across multiple AWS accounts.
During the preview period, you can use AWS DevOps Agent at no cost, but there is a limit on the number of agent task hours per month.
As someone who has spent countless nights debugging production issues, I’m particularly excited about how AWS DevOps Agent combines deep operational insights with practical, actionable recommendations. The service helps teams move from reactive firefighting to proactive system improvement.
To learn more and sign up for the preview, visit AWS DevOps Agent. I look forward to hearing how AWS DevOps Agent helps improve your operational efficiency.