Build AI Agents Worth Keeping: The Canvas Framework
Why 95% of enterprise AI agent projects fail
Development teams across enterprises are stuck in the same cycle: They start with “Let’s try LangChain” before figuring out what agent to build. They explore CrewAI without defining the use case. They implement RAG before identifying what knowledge the agent actually needs. Months later, they have an impressive technical demo showcasing multi-agent orchestration and tool calling, but can’t articulate ROI or explain how it addresses actual business needs.
According to McKinsey’s latest research, while nearly eight in 10 companies report using generative AI, fewer than 10% of use cases deployed ever make it past the pilot stage. MIT researchers studying this challenge identified a “gen AI divide”: a gap between organizations successfully deploying AI and those stuck in perpetual pilots. In their sample of 52 organizations, researchers found patterns suggesting failure rates as high as 95% (pg.3). Whether the true failure rate is 50% or 95%, the pattern is clear: Organizations lack clear starting points, initiatives stall after pilot phases, and most custom enterprise tools fail to reach production.
6 critical failures killing your AI agent projects
The gap between agentic AI’s promise and its reality is stark. Understanding these failure patterns is the first step toward building systems that actually work.
1. The technology-first trap
MIT’s research found that while 60% of organizations evaluated enterprise AI tools, only 5% reached production (pg.6), a clear sign that businesses struggle to move from exploration to execution. Teams rush to implement frameworks before defining business problems. While most organizations have moved beyond ad hoc approaches (down from 19% to 6%, according to IBM), they’ve replaced chaos with structured complexity that still misses the mark.
Meanwhile, one in four companies taking a true “AI-first” approach, starting with business problems rather than technical capabilities, report transformative results. The difference has less to do with technical sophistication and more to do with strategic clarity.
2. The capability reality gap
Carnegie Mellon’s TheAgentCompany benchmark exposed the uncomfortable truth: Even our best AI agents would make terrible employees. The best AI model (Claude 3.5 Sonnet) completes only 24% of office tasks, with 34.4% success when given partial credit. Agents struggle with basic obstacles, such as pop-up windows, which humans navigate instinctively.
More concerning, when faced with challenges, some agents resort to deception, like renaming existing users instead of admitting they can’t find the right person. These are not just technical limitations: they reveal fundamental reasoning gaps that make autonomous deployment dangerous in real business environments.
3. Leadership vacuum
The disconnect is clear: Fewer than 30% of companies report CEO sponsorship of the AI agenda despite 70% of executives saying agentic AI is important to their future. This leadership vacuum creates cascading failures: AI initiatives fragment into departmental experiments, lack authority to drive organizational change, and can’t break through silos to access necessary resources.
Contrast this with Moderna, where CEO buy-in drove the deployment of 750+ AI agents and radical restructuring of HR and IT departments. As with the early waves of Big Data, data science, then machine learning adoption, leadership buy-in is the deciding factor for the survival of generative AI initiatives.
4. Security and governance barriers
Organizations are paralyzed by a governance paradox: 92% believe governance is essential, but only 44% have policies (SailPoint, 2025). The result is predictable: 80% experienced AI acting outside intended boundaries, with top concerns including privileged data access (60%), unintended actions (58%), and sharing privileged data (57%). Without clear ethical guidelines, audit trails, and compliance frameworks, even successful pilots can’t move to production.
5. Infrastructure chaos
The infrastructure gap creates a domino effect of failures. While 82% of organizations already use AI agents, 49% cite data concerns as primary adoption barriers (IBM). Data remains fragmented across systems, making it impossible to provide agents with full context.
Teams end up managing multiple databases (one for operational data, another for vector data and workloads, a third for conversation memory), each with different APIs and scaling characteristics. This complexity kills momentum before agents can actually prove value.
6. The ROI mirage
The optimism-reality gap is staggering. Nearly 80% of companies report no material earnings impact from gen AI (McKinsey), while 62% expect 100%+ ROI from deployment (PagerDuty). Companies measure activity (number of agents deployed) rather than outcomes (business value created). Without clear success metrics defined upfront, even successful implementations look like expensive experiments.
The AI development paradigm shift: from data-first to product-first
There’s been a fundamental shift in how successful teams approach agentic AI development, and it mirrors what Shawn Wang (Swyx) observed in his influential “Rise of the AI Engineer” post about the broader generative AI space.
The old way: data → model → product
In the traditional paradigm practiced during the early years of machine learning, teams would spend months architecting datasets, labeling training data, and preparing for model pre-training. Only after training custom models from scratch could they finally incorporate these into product features.
The trade-offs were severe: massive upfront investment, long development cycles, high computational costs, and brittle models with narrow capabilities. This sequential process created high barriers to entry: only organizations with substantial ML expertise and resources could deploy AI features.
Figure 1. The Data → Model → Product Lifecycle. Traditional AI development required months of data preparation and model training before shipping products.
The new way: product → data → model
The emergence of foundation models changed everything.
Figure 2. The Product → Data → Model Lifecycle. Foundation model APIs flipped the traditional cycle, enabling rapid experimentation before data and model optimization.
Powerful LLMs became commoditized through providers like OpenAI and Anthropic. Now, teams could:
Start with the product vision and customer need.
Identify what data would enhance it (examples, knowledge bases, RAG content).
Select the appropriate model that could process that data effectively.
This enabled zero-shot and few-shot capabilities via simple API calls. Teams could build MVPs in days, define their data requirements based on actual use cases, then select and swap models based on performance needs. Developers now ship experiments quickly, gather insights to improve data (for RAG and evaluation), then fine-tune only when necessary. This opened cutting-edge AI to all developers, not just those with specialized ML backgrounds.
The agentic evolution: product → agent → data → model
But for agentic systems, there’s an even more important insight: Agent design sits between product and data.
Figure 3. The Product → Agent → Data → Model Lifecycle. Agent design now sits between product and data, determining downstream requirements for knowledge, tools, and model selection.
Now, teams follow this progression:
Product: Define the user problem and success metrics.
Agent: Design agent capabilities, workflows, and behaviors.
Data: Determine what knowledge, examples, and context the agent needs.
Model: Select external providers and optimize prompts for your data.
With external model providers, the “model” phase is really about selection and integration rather than deployment. Teams choose which provider’s models best handle their data and use case, then build the orchestration layer to manage API calls, handle failures, and optimize costs.
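To make the idea of an orchestration layer concrete, here is a minimal Python sketch of one possible fallback chain. It is an illustration under stated assumptions, not a reference design: the `Provider` wrapper, the flat per-call price, the stub models, and the rule that failed calls are still billed are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Provider:
    """Hypothetical wrapper around one external model API."""
    name: str
    cost_per_call: float        # assumed flat price per call, in dollars
    call: Callable[[str], str]  # takes a prompt, returns text or raises


def complete_with_fallback(
    prompt: str, providers: List[Provider], attempts_per_provider: int = 2
) -> Tuple[str, str, float]:
    """Try each provider in order and return (provider name, text, money spent)."""
    spent = 0.0
    last_error: Optional[Exception] = None
    for provider in providers:
        for _ in range(attempts_per_provider):
            spent += provider.cost_per_call  # assumption: failed calls are billed too
            try:
                return provider.name, provider.call(prompt), spent
            except Exception as exc:  # a real system would catch narrower errors
                last_error = exc
    raise RuntimeError("all providers failed") from last_error


# Stub providers standing in for real API clients.
def flaky_model(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def stable_model(prompt: str) -> str:
    return f"echo: {prompt}"


providers = [
    Provider("primary-model", 0.001, flaky_model),
    Provider("backup-model", 0.010, stable_model),
]
name, text, spent = complete_with_fallback("hello", providers)
print(name, text, round(spent, 3))
```

Keeping the running cost next to the retry logic means one function answers both questions the paragraph raises: did the call succeed, and what did it cost to get there.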
The agent layer shapes everything downstream, determining what data is needed (knowledge bases, examples, feedback loops), what tools are required (search, calculation, code execution), and ultimately, which external models can execute the design effectively.
This evolution means teams can start with a clear user problem, design an agent to solve it, identify necessary data, and then select appropriate models, rather than starting with data and hoping to find a use case. This is why the canvas framework follows this exact flow.
The canvas framework: A systematic approach to building AI agents
Rather than jumping straight into technical implementation, successful teams use structured planning frameworks. Think of them as “business model canvases for AI agents”: tools that help teams think through critical decisions in the right order.
Two complementary frameworks directly address the common failure patterns:
Figure 4. The Agentic AI Canvas Framework. A structured five-phase approach moving from business problem definition through POC, prototype, production canvas, and production agent deployment. Please see the “Resources” section at the end for links to the corresponding templates, hosted in the gen AI Showcase.
Canvas #1 – The POC canvas for validating your agent idea
The POC canvas implements the product → agent → data → model flow through eight focused squares designed for rapid validation:
Figure 5. The Agent POC Canvas V1. Eight focused squares implementing the product → agent → data → model flow for rapid validation of AI agent concepts.
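One way to keep a team honest about working through the squares in order is to treat the canvas as data. The sketch below is only an illustration: the phase and square names follow the canvas, but the `next_square` helper and the idea of storing answers in a dictionary are assumptions, not part of the published templates.

```python
from typing import Dict, List, Optional

# Phases and squares in product → agent → data → model order.
POC_CANVAS: Dict[str, List[str]] = {
    "Product validation": [
        "Product vision & user problem",
        "User validation & interaction",
    ],
    "Agent design": [
        "Agent capabilities & workflow",
        "Agent interaction & memory",
    ],
    "Data requirements": [
        "Knowledge requirements & sources",
        "Data collection & enhancement strategy",
    ],
    "External model integration": [
        "Provider selection & prompt engineering",
        "API integration & validation",
    ],
}


def next_square(answers: Dict[str, str]) -> Optional[str]:
    """Return the first unanswered square, walking the phases in order."""
    for squares in POC_CANVAS.values():
        for square in squares:
            if not answers.get(square, "").strip():
                return square
    return None  # every square has an answer


answers = {"Product vision & user problem": "Support reps lose time searching old tickets."}
print(next_square(answers))
```

Because Python dictionaries preserve insertion order, the helper always points at the earliest gap, so a team cannot jump to provider selection while a product square is still blank.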
Phase 1: Product validation – who needs this and why?
Before building anything, you must validate that a real problem exists and that users actually want an AI agent solution. This phase prevents the common mistake of building impressive technology that nobody needs. If you can’t clearly articulate who will use this and why they’ll prefer it to current methods, stop here.
Square: Product vision & user problem
Purpose: Define the business problem and establish why an agent is the right solution.
Key questions:
Core problem: What specific workflow frustrates users today?
Target users: Who experiences this pain and how often?
Success vision: What would success look like for users?
Value hypothesis: Why would users prefer an agent to current solutions?
Square: User validation & interaction
Purpose: Map how users will engage with the agent and identify adoption barriers.
Key questions:
User journey: What’s the full interaction from start to finish?
Interface preference: How do users want to interact?
Feedback mechanisms: How will you know it’s working?
Adoption barriers: What might prevent users from trying it?
Phase 2: Agent design – what will it do and how?
With a validated problem, design the agent’s capabilities and behavior to solve that specific need. This phase defines the agent’s boundaries, decision-making logic, and interaction style before any technical implementation. The agent design directly determines what data and models you’ll need, making this the critical bridge between problem and solution.
Square: Agent capabilities & workflow
Purpose: Design what the agent must do to solve the identified problem.
Key questions:
Core tasks: What specific actions must the agent perform?
Decision logic: How should complex requests be broken down?
Tool requirements: What capabilities does the agent need?
Autonomy boundaries: What can it decide versus escalate?
Square: Agent interaction & memory
Purpose: Establish communication style and context management.
Key questions:
Conversation flow: How should the agent guide interactions?
Personality and tone: What style fits the use case?
Memory requirements: What context must persist?
Error handling: How should confusion be managed?
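The autonomy-boundaries question is easier to answer once it is written down as an explicit rule. The following Python sketch shows one possible shape for such a rule. Everything in it is hypothetical: the action names, the 100-dollar auto-approve limit, and the three checks are placeholders for whatever your own canvas answers say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool
    amount: float = 0.0  # monetary impact in dollars, if any


# Hypothetical policy values; real limits come from the canvas answers.
AUTO_APPROVE_LIMIT = 100.0
ALWAYS_ESCALATE = {"delete_account", "change_permissions"}


def decide(action: Action) -> str:
    """Return "execute" if the agent may act alone, otherwise "escalate"."""
    if action.name in ALWAYS_ESCALATE:
        return "escalate"  # sensitive actions always go to a human
    if not action.reversible:
        return "escalate"  # anything that cannot be undone needs review
    if action.amount > AUTO_APPROVE_LIMIT:
        return "escalate"  # large monetary impact needs review
    return "execute"


print(decide(Action("issue_refund", reversible=True, amount=40.0)))
print(decide(Action("issue_refund", reversible=True, amount=400.0)))
```

Writing the boundary as a small pure function also makes it testable and auditable, which matters later when governance questions come up.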
Phase 3: Data requirements – what information does it need?
Agents are only as good as their knowledge base, so identify exactly what information the agent needs to complete its tasks. This phase maps existing data sources and gaps before selecting models, ensuring you don’t choose technology that can’t handle your data reality. Understanding data requirements upfront prevents the costly mistake of selecting models that can’t work with your actual information.
Square: Knowledge requirements & sources
Purpose: Identify essential information and where to find it.
Key questions:
Essential knowledge: What information must the agent have to complete tasks?
Data sources: Where does this knowledge currently exist?
Update frequency: How often does this information change?
Quality requirements: What accuracy level is required?
Square: Data collection & enhancement strategy
Purpose: Plan data gathering and continuous improvement.
Key questions:
Collection strategy: How will initial data be gathered?
Enhancement priority: What data has the biggest impact?
Feedback loops: How will interactions improve the data?
Integration method: How will data be ingested and updated?
Phase 4: External model integration – which provider and how?
Only after defining data needs should you select external model providers and build the integration layer. This phase tests whether available models can handle your specific data and use case while staying within budget. The focus is on prompt engineering and API orchestration rather than model deployment, reflecting how modern AI agents actually get built.
Square: Provider selection & prompt engineering
Purpose: Choose external models and optimize for your use case.
Key questions:
Provider evaluation: Which models handle your requirements best?
Prompt strategy: How should you structure requests for optimal results?
Context management: How should you work within token limits?
Cost validation: Is this economically viable at scale?
Square: API integration & validation
Purpose: Build orchestration and validate performance.
Key questions:
Integration architecture: How do you connect to providers?
Response processing: How do you handle outputs?
Performance testing: Does it meet requirements?
Production readiness: What needs hardening?
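Two of the questions above, context management and cost validation, come down to simple arithmetic. The Python sketch below shows a back-of-the-envelope version of each. The prices are placeholders rather than any provider’s real rates, and the four-characters-per-token estimate is a crude assumption; use the provider’s own tokenizer and price sheet in practice.

```python
from typing import List


def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float,
                 days: int = 30) -> float:
    """Estimated monthly spend in dollars, with prices per million tokens."""
    per_request = (input_tokens * input_price_per_mtok
                   + output_tokens * output_price_per_mtok) / 1_000_000
    return per_request * requests_per_day * days


def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly four characters per token."""
    return max(1, len(text) // 4)


def trim_history(messages: List[str], budget: int) -> List[str]:
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Placeholder scenario: 10,000 requests a day, 2,000 input and 500 output tokens each.
estimate = monthly_cost(10_000, 2_000, 500, input_price_per_mtok=3.0, output_price_per_mtok=15.0)
print(f"about ${estimate:,.0f} per month")

history = ["a" * 40, "b" * 40, "c" * 40]  # roughly 10 tokens each
print(len(trim_history(history, budget=25)))
```

Running the numbers this early turns “Is this economically viable at scale?” from a debate into a spreadsheet cell that can be revisited whenever usage or pricing changes.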
Figure 6. The Agent POC Canvas V1 (Detailed). Expanded view with specific guidance for each of the eight squares covering product validation, agent design, data requirements, and external model integration.
Unified data architecture: solving the infrastructure chaos
Remember the infrastructure problem, with teams managing three separate databases with different APIs and scaling characteristics? This is where a unified data platform becomes critical.
Agents need three types of data storage:
Application database: For business data, user profiles, and transaction history
Vector store: For semantic search, knowledge retrieval, and RAG
Memory store: For agent context, conversation history, and learned behaviors
Instead of juggling multiple systems, teams can use a unified platform like MongoDB Atlas that provides all three capabilities (flexible document storage for application data, native vector search for semantic retrieval, and rich querying for memory management) in a single platform.
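As a rough illustration of what “all three in one platform” can look like, the sketch below builds plain Python dictionaries for a knowledge document, a memory document, and an Atlas Vector Search query. It deliberately stops short of connecting to a database. The field names, the `knowledge_vector_index` index name, and the candidate multiplier are assumptions made for the example; with a driver such as PyMongo, the pipeline would be passed to a collection’s `aggregate` method.

```python
from typing import Any, Dict, List


def knowledge_document(text: str, embedding: List[float], source: str) -> Dict[str, Any]:
    """A knowledge-base chunk stored alongside its embedding vector."""
    return {"text": text, "embedding": embedding, "source": source}


def memory_document(session_id: str, role: str, content: str) -> Dict[str, Any]:
    """One turn of conversation history for an agent session."""
    return {"session_id": session_id, "role": role, "content": content}


def vector_search_pipeline(query_vector: List[float], limit: int = 5) -> List[Dict[str, Any]]:
    """Aggregation pipeline using the $vectorSearch stage."""
    return [
        {
            "$vectorSearch": {
                "index": "knowledge_vector_index",  # hypothetical index name
                "path": "embedding",                # field holding the vectors
                "queryVector": query_vector,
                "numCandidates": limit * 20,        # assumed oversampling factor
                "limit": limit,
            }
        },
        {
            "$project": {
                "text": 1,
                "source": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = vector_search_pipeline([0.1, 0.2, 0.3], limit=3)
print(pipeline[0]["$vectorSearch"]["numCandidates"])
```

The point of the sketch is that knowledge, memory, and retrieval queries are all ordinary documents and pipelines in the same system, so there is one API and one scaling model to learn.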
This unified approach means teams can focus on prompt engineering and orchestration rather than model infrastructure, while maintaining the flexibility to evolve their data model as requirements become clearer. The data platform handles the complexity while you optimize how external models interact with your knowledge.
For embeddings and search relevance, specialized models like Voyage AI can provide domain-specific understanding, particularly for technical documentation where general-purpose embeddings fall short. The combination of unified data architecture with specialized embedding models addresses the infrastructure chaos that kills projects.
This unified approach means teams can focus on agent logic rather than database management, while maintaining the flexibility to evolve their data model as requirements become clearer.
Canvas #2 – The production canvas for scaling your validated AI agent
When a POC succeeds, the production canvas guides the transition from “it works” to “it works at scale” through 11 squares organized following the same product → agent → data → model flow, with additional operational concerns:
Figure 7. The Productionize Agent Canvas V1. Eleven squares guiding the transition from validated POC to production-ready systems, addressing scale, architecture, operations, and governance.
Phase 1: Product and scale planning
Transform POC learnings into concrete business metrics and scale requirements for production deployment. This phase establishes the economic case for investment and defines what success looks like at scale. Without clear KPIs and growth projections, production systems become expensive experiments rather than business assets.
Square: Business case & scale planning
Purpose: Translate POC validation into production metrics.
Key questions:
Proven value: What did the POC validate?
Business KPIs: What metrics measure ongoing success?
Scale requirements: How many users and interactions?
Growth strategy: How will usage grow over time?
Square: Production requirements & constraints
Purpose: Define performance standards and operational boundaries.
Key questions:
Performance standards: Response time, availability, throughput?
Reliability requirements: Recovery time and failover?
Budget constraints: Cost limits and optimization targets?
Security needs: Compliance and data protection requirements?
Phase 2: Agent architecture
Design robust systems that handle complex workflows, multiple agents, and inevitable failures without disrupting users. This phase addresses the orchestration and fault tolerance that POCs ignore but production demands. The architecture decisions here determine whether your agent can scale from 10 users to 10,000 without breaking.
Square: Robust agent architecture
Purpose: Design for complex workflows and fault tolerance.
Key questions:
Workflow orchestration: How do you manage multi-step processes?
Multi-agent coordination: How do specialized agents collaborate?
Fault tolerance: How do you handle failures gracefully?
Update rollouts: How do you update without disruption?
Square: Production memory & context systems
Purpose: Implement scalable context management.
Key questions:
Memory architecture: Session, long-term, and organizational knowledge?
Context persistence: Storage and retrieval strategies?
Cross-session continuity: How do you maintain user context?
Memory lifecycle management: Retention, archival, and cleanup?
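The memory lifecycle question (retention, archival, and cleanup) can be prototyped before any storage decision is made. Below is a small in-memory Python sketch of one possible retention rule: short-term items expire after a time-to-live, while items flagged as long-term survive. The class names, the `long_term` flag, and the single TTL value are illustrative assumptions; a production system would persist this in a database rather than a list.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemoryItem:
    content: str
    created_at: float        # seconds since the epoch
    long_term: bool = False  # long-term items are never pruned


class SessionMemory:
    """Toy memory store with a time-to-live for short-term items."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self.items: List[MemoryItem] = []

    def remember(self, content: str, long_term: bool = False,
                 now: Optional[float] = None) -> None:
        created = time.time() if now is None else now
        self.items.append(MemoryItem(content, created, long_term))

    def prune(self, now: Optional[float] = None) -> int:
        """Drop expired short-term items and return how many were removed."""
        current = time.time() if now is None else now
        before = len(self.items)
        self.items = [
            item for item in self.items
            if item.long_term or current - item.created_at <= self.ttl_seconds
        ]
        return before - len(self.items)


memory = SessionMemory(ttl_seconds=60)
memory.remember("user asked about invoices", now=0)
memory.remember("user prefers email replies", long_term=True, now=0)
memory.remember("user asked about refunds", now=100)
removed = memory.prune(now=120)
print(removed, [item.content for item in memory.items])
```

Passing `now` explicitly keeps the retention rule deterministic, so it can be unit-tested without waiting for real time to pass.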
Phase 3: Data infrastructure
Build the data foundation that unifies application data, vector storage, and agent memory in a manageable platform. This phase solves the “three database problem” that kills production deployments through complexity. A unified data architecture reduces operational overhead while enabling the sophisticated retrieval and context management that production agents require.
Square: Data architecture & management
Purpose: Build a unified platform for all data types.
Key questions:
Platform architecture: Application, vector, and memory data?
Data pipelines: Ingestion, processing, and updates?
Quality assurance: Validation and freshness monitoring?
Data governance: Version control and approval workflows?
Square: Knowledge base & pipeline operations
Purpose: Maintain and optimize knowledge systems.
Key questions:
Update strategy: How does knowledge evolve?
Embedding approach: Which models for which content?
Retrieval optimization: Search relevance and reranking?
Operational monitoring: Pipeline health and costs?
Phase 4: Model operations
Implement strategies for managing multiple model providers, fine-tuning, and cost optimization at production scale. This phase covers API management, performance monitoring, and the continuous improvement pipeline for model performance. The focus is on orchestrating external models efficiently rather than deploying your own, including when and how to fine-tune.
Square: Model strategy & optimization
Purpose: Manage providers and fine-tuning strategies.
Key questions:
Provider selection: Which models for which tasks?
Fine-tuning approach: When and how to customize?
Routing logic: Base versus fine-tuned model decisions?
Cost controls: Caching and intelligent routing?
Square: API management & monitoring
Purpose: Handle external APIs and performance monitoring.
Key questions:
API configuration: Key management and failover?
Performance monitoring: Accuracy, latency, and costs?
Fine-tuning pipeline: Data collection for improvement?
Version control: A/B testing and rollback strategies?
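The routing and cost-control questions can be explored with a very small prototype. The Python sketch below routes requests between a base model and a fine-tuned one by task label and caches identical requests so they are only paid for once. The task labels, the stub models, and the choice of an exact-match SHA-256 cache key are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Dict, Iterable


class CachedRouter:
    """Routes by task label and caches answers for identical requests."""

    def __init__(self, base_model: Callable[[str], str],
                 tuned_model: Callable[[str], str],
                 tuned_tasks: Iterable[str]) -> None:
        self.base_model = base_model
        self.tuned_model = tuned_model
        self.tuned_tasks = set(tuned_tasks)
        self._cache: Dict[str, str] = {}
        self.cache_hits = 0

    def route(self, task: str) -> str:
        """Name the model that should serve this task."""
        return "fine-tuned" if task in self.tuned_tasks else "base"

    def complete(self, task: str, prompt: str) -> str:
        key = hashlib.sha256(f"{task}\n{prompt}".encode("utf-8")).hexdigest()
        if key in self._cache:
            self.cache_hits += 1  # served from cache, no model call
            return self._cache[key]
        model = self.tuned_model if self.route(task) == "fine-tuned" else self.base_model
        answer = model(prompt)
        self._cache[key] = answer
        return answer


# Stub models standing in for real provider calls.
router = CachedRouter(
    base_model=lambda prompt: "base: " + prompt,
    tuned_model=lambda prompt: "tuned: " + prompt,
    tuned_tasks={"contract_review"},
)
first = router.complete("contract_review", "Summarize clause 4")
second = router.complete("contract_review", "Summarize clause 4")
other = router.complete("small_talk", "Hello")
print(first, "|", other, "|", router.cache_hits)
```

An exact-match cache only helps with repeated requests; it is a starting point for measuring how much traffic is duplicated before investing in anything more elaborate.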
Phase 5: Hardening and operations
Add the security, compliance, user experience, and governance layers that transform a working system into an enterprise-grade solution. This phase addresses the non-functional requirements that POCs skip but enterprises demand. Without proper hardening, even the best agents remain stuck in pilot purgatory due to security or compliance concerns.
Square: Security & compliance
Purpose: Implement enterprise security and regulatory controls.
Key questions:
Security implementation: Authentication, encryption, and access management?
Access control: User and system access management?
Compliance framework: Which regulations apply?
Audit capabilities: Logging and retention requirements?
Square: User experience & adoption
Purpose: Drive usage and gather feedback.
Key questions:
Workflow integration: How do you fit existing processes?
Adoption strategy: Rollout and engagement plans?
Support systems: Documentation and help channels?
Feedback integration: How does user input drive improvement?
Square: Continuous improvement & governance
Purpose: Ensure long-term sustainability.
Key questions:
Operational procedures: Maintenance and release cycles?
Quality gates: Testing and deployment standards?
Cost management: Budget monitoring and optimization?
Continuity planning: Documentation and team training?
Figure 8. The Productionize Agent Canvas V1 (Detailed). Expanded view with specific guidance for each of the eleven squares covering scale planning, architecture, data infrastructure, model operations, and hardening requirements.
Next steps: start building AI agents that deliver ROI
MIT’s research found that 66% of executives want systems that learn from feedback, while 63% demand context retention (pg.14). The dividing line between AI and human preference is memory, adaptability, and learning capability.
The canvas framework directly addresses the failure patterns plaguing most projects by forcing teams to answer critical questions in the right order, following the product → agent → data → model flow that successful teams have discovered.
For your next agentic AI initiative:
Start with the POC canvas to validate concepts quickly.
Focus on user problems before technical solutions.
Leverage AI tools to rapidly prototype after completing your canvas.
Only scale what users actually want with the production canvas.
Choose a unified data architecture to reduce complexity from day one.
Remember: The goal isn’t to build the most sophisticated agent possible; it’s to build agents that solve real problems for real users in production environments.
For hands-on guidance on memory management, check out our webinar on YouTube, which covers essential concepts and proven techniques for building memory-augmented agents.
Head over to the MongoDB AI Learning Hub to learn how to build and deploy AI applications with MongoDB.
Resources
Download POC Canvas Template (PDF)
Download Production Canvas Template (PDF)
Download Combined POC + Production Canvas (Excel) – Get both canvases in a single Excel file, with example prompts and blank templates.
Full reference list
McKinsey & Company. (2025). “Seizing the agentic AI advantage.” https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage
MIT NANDA. (2025). “The GenAI Divide: State of AI in Business 2025.” Report
Gartner. (2025). “Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027.” https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027
IBM. (2025). “IBM Study: Businesses View AI Agents as Essential, Not Just Experimental.” https://newsroom.ibm.com/2025-06-10-IBM-Study-Businesses-View-AI-Agents-as-Essential,-Not-Just-Experimental
Carnegie Mellon University. (2025). “TheAgentCompany: Benchmarking LLM Agents.” https://www.cs.cmu.edu/news/2025/agent-company
Swyx. (2023). “The Rise of the AI Engineer.” Latent Space. https://www.latent.space/p/ai-engineer
SailPoint. (2025). “SailPoint research highlights rapid AI agent adoption, driving urgent need for evolved security.” https://www.sailpoint.com/press-releases/sailpoint-ai-agent-adoption-report
SS&C Blue Prism. (2025). “Generative AI Statistics 2025.” https://www.blueprism.com/resources/blog/generative-ai-statistics-2025/
PagerDuty. (2025). “State of Digital Operations Report.” PagerDuty Report Finds A Majority of CIOs and CTOs View Agentic AI as Core to Future IT Operations
Wall Street Journal. (2024). “How Moderna Is Using AI to Reinvent Itself.” https://www.wsj.com/articles/at-moderna-openais-gpts-are-changing-almost-everything-6ff4c4a5
September 23, 2025
