A typical AI agent application in 2025 often includes:
- A cloud-hosted LLM
- A vector database for retrieval
- A separate operational database
- Prompt management and tool management utilities
- Observability and tracing frameworks
- Guardrails
Each tool solves a problem. Together, however, they can create architectural sprawl with unpredictable latency, rising operational costs, and governance blind spots. As a result, many AI agents never move beyond demos or internal prototypes because the complexity escalates too fast.
This post walks through how we migrated an existing AI agent application to Couchbase AI Services and the Agent Catalog, moving to a single production-ready AI platform.
The Core Problem: Fragmentation Kills Production AI
It's important to understand why agentic systems struggle in production. Most AI agents today are built from too many loosely coupled parts: prompts live in one system, vectors in another, conversations are logged inconsistently, and tools are invoked without clear traceability, making agent behavior difficult to debug. At the same time, sending enterprise data to third-party LLM endpoints introduces compliance and security risks. Finally, governance is usually treated as an afterthought; many frameworks emphasize what an agent can do, but fail to explain why it made a decision, which prompt or tool influenced it, or whether that decision should have been allowed at all. This is an unacceptable gap for real enterprise workflows.
What Are Couchbase AI Services?
Building AI applications typically involves juggling multiple services: a vector database for memory, an inference provider for LLMs (like OpenAI or Anthropic), and separate infrastructure for embedding models.
Couchbase AI Services streamlines this by providing a unified platform where your operational data, vector search, and AI models live together. It offers:
- LLM inference and embeddings API: Access popular LLMs (like Llama 3) and embedding models directly within Couchbase Capella, with no external API keys, no extra infrastructure, and no data egress. Your application data stays within Capella. Queries, vectors, and model inference all happen where the data lives. This enables secure, low-latency AI experiences while meeting privacy and compliance requirements. The key value: data and AI together, with sensitive information kept inside your system. (See the minimal call sketch after this list.)
- Unified platform: Keep your database, vectorization, search, and models in one central location.
- Integrated Vector Search: Perform semantic search directly on your JSON data with millisecond latency.
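Because the model endpoints follow the OpenAI wire format (as shown later in this post), any standard OpenAI client can talk to them. Here is a minimal sketch of a direct call; the endpoint URL and key are placeholders, and the model name assumes the LLM deployed later in this walkthrough:

from openai import OpenAI

# Placeholder values: substitute your Capella Model Services endpoint and key
client = OpenAI(
    base_url="https://<your-capella-model-endpoint>/v1",
    api_key="<capella-model-api-key>",
)

response = client.chat.completions.create(
    model="mistralai/mistral-7b-instruct-v0.3",  # any LLM deployed in Capella
    messages=[{"role": "user", "content": "Hello from Capella"}],
)
print(response.choices[0].message.content)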
Why Is This Needed?
As we move from simple chatbots to agentic workflows, where AI models autonomously use tools, latency and setup complexity become major bottlenecks. Couchbase AI Services takes a platform-first approach. By co-locating your data and AI services, it reduces operational overhead and latency. In addition, tools like the Agent Catalog help manage hundreds of agent prompts and tools, while providing built-in logging and telemetry for agents.
At this point, the question shifts from why a platform-first approach matters to how it works in practice.
So let's explore how to migrate an existing agentic application, and improve its performance, governance, and reliability along the way.
What the Current App Looks Like
The current application is an HR Sourcing Agent designed to automate the initial screening of candidates. The main job of the agent application is to ingest raw resume files (PDFs), understand the content of the resumes using an LLM, and structure the unstructured data into a queryable format enriched with semantic embeddings in Couchbase. It allows HR professionals to upload a new job description and get results for the best-suited candidates using Couchbase vector search.
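For illustration, a parsed candidate document might look like the sketch below. The field names mirror those the search tool reads later in this post; the exact schema is an assumption, and the embedding array is truncated:

{
  "name": "Jane Doe",
  "email": "jane.doe@example.com",
  "location": "Austin, TX",
  "years_experience": 7,
  "skills": ["stakeholder communication", "team leadership"],
  "technical_skills": ["Python", "Couchbase", "LangChain"],
  "summary": "Backend engineer focused on search and data platforms...",
  "embedding": [0.0123, -0.0456, ...]
}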
In its current state, the HR Sourcing App is a Python-based microservice that wraps an LLM with the Google ADK. It manually wires together model definitions, agent prompts, and execution pipelines. While functional, the architecture requires the developer to manage session state in memory, handle retry logic, clean up raw model outputs, and maintain the integration between the LLM and the database manually. There is also no built-in telemetry for the agent.
The app manually instantiates a model provider. In this specific case, it connects to a hosted open-source model (Qwen 2.5-72B via Nebius) using the LiteLLM wrapper. The app has to manually spin up a runtime environment for the agent. It initializes an InMemorySessionService to track the state of the conversation (even if short-lived) and a Runner to execute the user's input (the resume text) against the agent pipeline.
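A condensed sketch of that "before" wiring is shown below. Names like resume_parser and the exact model string are illustrative assumptions, not the app's real code; consult the actual adk_resume_agent.py for details:

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService

# Manual model wiring: provider routing goes through LiteLLM, keys managed by us
model = LiteLlm(model="nebius/Qwen/Qwen2.5-72B-Instruct")  # assumed model string

resume_agent = LlmAgent(
    name="resume_parser",
    model=model,
    instruction="You are a resume parsing assistant...",  # prompt lives in code
)

# Manual runtime wiring: session state and the execution loop are our problem
session_service = InMemorySessionService()
runner = Runner(agent=resume_agent, app_name="hr_sourcing", session_service=session_service)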
Migrating the Agent Application to Couchbase AI Services
Now let's dive into how to migrate the core logic of our agent to use Couchbase AI Services and the Agent Catalog.
The new agent uses a LangChain ReAct agent to process job descriptions, performs intelligent candidate matching using vector search, and provides ranked candidate recommendations with explanations.
Prerequisites
Before we begin, ensure you have:
Install Dependencies
We'll start by installing the required packages. This includes the agentc CLI for the catalog and the LangChain integration packages.
%pip install -q "pydantic>=2.0.0,<3.0.0" "python-dotenv>=1.0.0,<2.0.0" "pandas>=2.0.0,<3.0.0" "nest-asyncio>=1.6.0,<2.0.0" "langchain-couchbase>=0.2.4,<0.5.0" "langchain-openai>=0.3.11,<0.4.0" "arize-phoenix>=11.37.0,<12.0.0" "openinference-instrumentation-langchain>=0.1.29,<0.2.0"

# Install Agent Catalog
%pip install agentc==1.0.0
Centralized Model Service (Couchbase AI Model Services Integration)
In the original adk_resume_agent.py, we had to manually instantiate LiteLLM, manage provider-specific API keys (Nebius, OpenAI, etc.), and handle the connection logic inside our application code. We will migrate this code to use Couchbase.
Couchbase AI Services provides OpenAI-compatible endpoints that the agents use. For the LLM and embeddings, we use the LangChain OpenAI package, which integrates directly with the LangChain Couchbase connector.
Enable AI Services
- Navigate to Capella's AI Services section in the UI.
- Deploy the Embeddings and LLM models.
- For this demo, launch an embedding model and an LLM in the same region as the Capella cluster where the data will be stored.
- Deploy an LLM that has tool calling capabilities, such as mistralai/mistral-7b-instruct-v0.3. For embeddings, you can choose a model like nvidia/llama-3.2-nv-embedqa-1b-v2.
- Note the endpoint URL and generate API keys.
For more details on launching AI models, you can check the official documentation.
Implementing the Code Logic for LLM and Embedding Models
We need to configure the endpoints for Capella Model Services. Capella Model Services are compatible with the OpenAI API format, so we can use the standard langchain-openai library by pointing it at our Capella endpoint. We initialize the embedding model with OpenAIEmbeddings and the LLM with ChatOpenAI, but point them to Capella.
import os
import getpass
import logging

from langchain_openai import ChatOpenAI, OpenAIEmbeddings

logger = logging.getLogger(__name__)

# Model Services Config
CAPELLA_API_ENDPOINT = getpass.getpass("Capella Model Services Endpoint: ")
CAPELLA_API_LLM_MODEL = "mistralai/mistral-7b-instruct-v0.3"
CAPELLA_API_LLM_KEY = getpass.getpass("LLM API Key: ")
CAPELLA_API_EMBEDDING_MODEL = "nvidia/llama-3.2-nv-embedqa-1b-v2"
CAPELLA_API_EMBEDDINGS_KEY = getpass.getpass("Embedding API Key: ")

# Export for setup_ai_services, which reads its configuration from the environment
os.environ["CAPELLA_API_ENDPOINT"] = CAPELLA_API_ENDPOINT
os.environ["CAPELLA_API_LLM_MODEL"] = CAPELLA_API_LLM_MODEL
os.environ["CAPELLA_API_LLM_KEY"] = CAPELLA_API_LLM_KEY
os.environ["CAPELLA_API_EMBEDDING_MODEL"] = CAPELLA_API_EMBEDDING_MODEL
os.environ["CAPELLA_API_EMBEDDINGS_KEY"] = CAPELLA_API_EMBEDDINGS_KEY

def setup_ai_services(temperature: float = 0.0):
    embeddings = None
    llm = None

    if not embeddings and os.getenv("CAPELLA_API_ENDPOINT") and os.getenv("CAPELLA_API_EMBEDDINGS_KEY"):
        try:
            endpoint = os.getenv("CAPELLA_API_ENDPOINT")
            api_key = os.getenv("CAPELLA_API_EMBEDDINGS_KEY")
            model = os.getenv("CAPELLA_API_EMBEDDING_MODEL", "Snowflake/snowflake-arctic-embed-l-v2.0")

            # Capella Model Services expose an OpenAI-compatible /v1 API
            api_base = endpoint if endpoint.endswith('/v1') else f"{endpoint}/v1"

            embeddings = OpenAIEmbeddings(
                model=model,
                api_key=api_key,
                base_url=api_base,
                check_embedding_ctx_length=False,
            )
        except Exception as e:
            logger.error(f"Couchbase AI embeddings failed: {e}")

    if not llm and os.getenv("CAPELLA_API_ENDPOINT") and os.getenv("CAPELLA_API_LLM_KEY"):
        try:
            endpoint = os.getenv("CAPELLA_API_ENDPOINT")
            llm_key = os.getenv("CAPELLA_API_LLM_KEY")
            llm_model = os.getenv("CAPELLA_API_LLM_MODEL", "deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

            api_base = endpoint if endpoint.endswith('/v1') else f"{endpoint}/v1"

            llm = ChatOpenAI(
                model=llm_model,
                base_url=api_base,
                api_key=llm_key,
                temperature=temperature,
            )
            # Smoke-test the connection before returning the client
            test_response = llm.invoke("Hello")
        except Exception as e:
            logger.error(f"Couchbase AI LLM failed: {e}")
            llm = None

    # Return the configured clients (callers should handle None on failure)
    return embeddings, llm
Instead of hardcoding model providers, the agent now connects to a unified Capella endpoint, which acts as an API gateway for both the LLM and the embedding model.
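As a quick usage sketch (assuming the helper returns the embeddings/LLM pair as reconstructed above):

# Hypothetical usage of the setup helper defined above
embeddings, llm = setup_ai_services(temperature=0.0)
if embeddings is None or llm is None:
    raise RuntimeError("Capella Model Services not configured; check the environment variables above.")

# Both clients now talk to the same Capella endpoint
vector = embeddings.embed_query("Senior Python developer with Couchbase experience")
answer = llm.invoke("Summarize this role in one sentence: Senior Python developer")
print(len(vector), answer.content)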
Decoupling Prompts and Tools With Agent Catalog
The Agent Catalog is a powerful tool for managing the lifecycle of your agent's capabilities. Instead of hardcoding prompts and tool definitions in your Python files, you manage them as versioned assets. You can centralize and reuse your tools across your development teams. You can also examine and monitor agent responses with the Agent Tracer. These features provide visibility, control, and traceability for agent development and deployment. Your teams can build agents with confidence, knowing they can be audited and managed effectively.
Without the ability to back-trace agent behavior, it becomes impossible to automate the ongoing trust, validation, and corroboration of the autonomous decisions agents make. In the Agent Catalog, this is done by evaluating both the agentic code and its conversation transcript with its LLM to assess the appropriateness of its pending decision or MCP tool lookup.
So let's incorporate the Agent Catalog into the project.
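A note on setup: before the catalog can index or publish anything, it needs connection details for the cluster. Here is a minimal sketch, assuming the standard agentc environment variables (the values are placeholders; check the Agent Catalog docs for your version):

# Placeholder values: point agentc at your Capella cluster and bucket
export AGENT_CATALOG_CONN_STRING="couchbases://cb.<your-endpoint>.cloud.couchbase.com"
export AGENT_CATALOG_USERNAME="<database-username>"
export AGENT_CATALOG_PASSWORD="<database-password>"
export AGENT_CATALOG_BUCKET="travel-sample"

# Initialize the local catalog structures
agentc init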
Adding the Vector Search Tool
We will start by adding our tool definition to the Agent Catalog. In this case, we have the vector search tool.
To add a new Python function as a tool for your agent, you can use the Agent Catalog command-line tool's add command:
agentc add
If you have an existing Python tool that you want to add to the Agent Catalog, add agentc to your imports and the @agentc.catalog.tool decorator to your tool definition. In our example, we define a Python function for performing vector search as our tool.
"""
Vector search tool for finding candidates based on job descriptions.
This tool uses Couchbase vector search to find the most relevant candidates.

Updated for Agent Catalog v1.0.0 with the @tool decorator.
"""

import os
import logging
from typing import List, Dict, Any
from datetime import timedelta

from agentc_core.tool import tool
from couchbase.cluster import Cluster
from couchbase.auth import PasswordAuthenticator
from couchbase.options import ClusterOptions
from couchbase.vector_search import VectorQuery, VectorSearch
from couchbase.search import SearchRequest, MatchNoneQuery

logger = logging.getLogger(__name__)


def generate_embedding(text: str, embeddings_client) -> List[float]:
    """Generate embeddings for text using the provided embeddings client."""
    try:
        # Use the embeddings client to generate embeddings
        result = embeddings_client.embed_query(text)
        return result
    except Exception as e:
        logger.error(f"Error generating embedding: {e}")
        return [0.0] * 1024  # Return zero vector as fallback


@tool(
    name="search_candidates_vector",
    description="Search for candidates using vector similarity based on a job description. Returns matching candidate profiles ranked by relevance.",
    annotations={"category": "hr", "type": "search"},
)
def search_candidates_vector(
    job_description: str,
    num_results: int = 5,
    embeddings_client=None,
) -> str:
    """
    Search for candidates using vector similarity based on a job description.

    Args:
        job_description: The job description text to search against
        num_results: Number of top candidates to return (default: 5)
        embeddings_client: The embeddings client for generating query embeddings

    Returns:
        Formatted string with candidate information
    """
    try:
        # Get environment variables
        bucket_name = os.getenv("CB_BUCKET", "travel-sample")
        scope_name = os.getenv("CB_SCOPE", "agentc_data")
        collection_name = os.getenv("CB_COLLECTION", "candidates")
        index_name = os.getenv("CB_INDEX", "candidates_index")

        # Connect to Couchbase (get_cluster_connection() is defined elsewhere in the app)
        cluster = get_cluster_connection()
        if not cluster:
            return "Error: Could not connect to database"

        bucket = cluster.bucket(bucket_name)
        scope = bucket.scope(scope_name)
        collection = scope.collection(collection_name)  # Use scope.collection(), not bucket.collection()

        # Generate query embedding
        logger.info("Generating embedding for job description...")
        if embeddings_client is None:
            return "Error: Embeddings client not provided"

        query_embedding = generate_embedding(job_description, embeddings_client)

        # Perform vector search
        logger.info(f"Performing vector search with index: {index_name}")
        search_req = SearchRequest.create(MatchNoneQuery()).with_vector_search(
            VectorSearch.from_vector_query(
                VectorQuery("embedding", query_embedding, num_candidates=num_results * 2)
            )
        )

        result = scope.search(index_name, search_req, timeout=timedelta(seconds=20))
        rows = list(result.rows())

        if not rows:
            return "No candidates found matching the job description."

        # Fetch candidate details
        candidates = []
        for row in rows[:num_results]:
            try:
                doc = collection.get(row.id, timeout=timedelta(seconds=5))
                if doc and doc.value:
                    data = doc.value
                    data["_id"] = row.id
                    data["_score"] = row.score
                    candidates.append(data)
            except Exception as e:
                logger.warning(f"Error fetching candidate {row.id}: {e}")
                continue

        # Format results
        if not candidates:
            return "No candidate details could be retrieved."

        result_text = f"Found {len(candidates)} matching candidates:\n\n"

        for i, candidate in enumerate(candidates, 1):
            result_text += f"**Candidate {i}: {candidate.get('name', 'Unknown')}**\n"
            result_text += f"- Match Score: {candidate.get('_score', 0):.4f}\n"
            result_text += f"- Email: {candidate.get('email', 'N/A')}\n"
            result_text += f"- Location: {candidate.get('location', 'N/A')}\n"
            result_text += f"- Years of Experience: {candidate.get('years_experience', 0)}\n"

            skills = candidate.get('skills', [])
            if skills:
                result_text += f"- Skills: {', '.join(skills[:10])}\n"

            technical_skills = candidate.get('technical_skills', [])
            if technical_skills:
                result_text += f"- Technical Skills: {', '.join(technical_skills[:10])}\n"

            summary = candidate.get('summary', '')
            if summary:
                # Truncate summary if too long
                summary_text = summary[:200] + "..." if len(summary) > 200 else summary
                result_text += f"- Summary: {summary_text}\n"

            result_text += "\n"

        return result_text

    except Exception as e:
        logger.error(f"Error in vector search: {e}")
        import traceback
        traceback.print_exc()
        return f"Error performing candidate search: {str(e)}"
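One operational note: the tool above assumes a vector-capable Search index named candidates_index already exists on the candidates collection. Below is an abridged sketch of what such an index definition might look like; this is not the post's exact index, the dims value must match your embedding model's output size, and a real definition contains more fields:

{
  "name": "candidates_index",
  "type": "fulltext-index",
  "params": {
    "mapping": {
      "types": {
        "agentc_data.candidates": {
          "properties": {
            "embedding": {
              "fields": [
                { "name": "embedding", "type": "vector", "dims": 2048, "similarity": "dot_product", "index": true }
              ]
            }
          }
        }
      }
    }
  }
}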
Adding the Prompts
In the original architecture, the agent's instructions were buried inside the Python code as large string variables, making them difficult to version or update without a full deployment. With the Agent Catalog, we now define our "HR Recruiter" persona as a standalone, managed asset using prompts. Using a structured YAML definition (record_kind: prompt), we create the hr_recruiter_assistant. This definition doesn't just hold the text; it encapsulates the entire behavior of the agent, strictly defining the ReAct pattern (Thought → Action → Observation) that guides the LLM to use the vector search tool effectively.
record_kind: prompt
name: hr_recruiter_assistant
description: AI-powered HR recruiter assistant that helps match candidates to job descriptions using vector search
annotations:
  category: hr
  type: recruitment
content: |
  You are an expert HR recruiter assistant with deep knowledge of talent acquisition and candidate matching.
  Your role is to help HR professionals find the best candidates for job openings by analyzing job
  descriptions and searching through a database of candidate profiles.

  You have access to the following tools:
  {tools}

  Use the following format for your responses:

  Question: the input question or job description you need to analyze
  Thought: think about what information you need to find the best candidates
  Action: the action to take, should be one of [{tool_names}]
  Action Input: the input to the action (for candidate search, provide the job description text)
  Observation: the result of the action
  ... (this Thought/Action/Action Input/Observation can repeat N times)
  Thought: I now have enough information to provide recommendations
  Final Answer: Provide a comprehensive summary of the top candidates including:
  - Candidate names and key qualifications
  - Skills match percentage and relevance
  - Years of experience
  - Why each candidate is a good fit for the role
  - Any notable strengths or unique qualifications

  IMPORTANT GUIDELINES:
  - Always use the search_candidates_vector tool to find candidates
  - Analyze the job description to understand required skills and experience
  - Provide detailed reasoning for candidate recommendations
  - Highlight both technical skills and soft skills when relevant
  - Be specific about match percentages and scores
  - Format your final answer in a clean, professional manner

  Begin!

  Question: {input}
  Thought: {agent_scratchpad}
Indexing and Publishing the Local Files
We use agentc to index our local files and publish them to Couchbase. This stores the metadata in the database, making it searchable and discoverable by the agent at runtime.
# Create a local index of tools and prompts
agentc index .

# Upload to Couchbase
agentc publish
In our code, we initialize the Catalog and use catalog.find() to retrieve verified prompts and tools. We no longer hardcode prompts; instead, we fetch them.
# BEFORE: hardcoded prompt strings
# parse_instruction = "You are a resume parsing assistant..."

import agentc
from agentc import Catalog, Span

# AFTER: dynamic asset loading
catalog = Catalog()

# Load the "search" tool dynamically
tool_result = catalog.find("tool", name="search_candidates_vector")

# Load the "recruiter" persona dynamically
prompt_result = catalog.find("prompt", name="hr_recruiter_assistant")

# We act on the retrieved metadata
tools = [Tool(name=tool_result.meta.name, func=...)]
Standardized Reasoning Engine (LangChain Integration)
The previous app used a custom SequentialAgent pipeline. While flexible, it meant we had to maintain our own execution loops, error handling, and retry logic for the agent's reasoning steps.
By leveraging the Agent Catalog's compatibility with LangChain, we switched to a standard ReAct (Reason + Act) agent architecture. We simply feed the tools and prompts fetched from the catalog directly into create_react_agent.
What's the benefit? We get industry-standard reasoning loops (Thought -> Action -> Observation) out of the box. The agent can now autonomously decide to search for "React Developers," analyze the results, and then perform a second search for "Frontend Engineers" if the first yields few results, something the linear ADK pipeline struggled with.
# Requires: from langchain.agents import AgentExecutor, Tool, create_react_agent
# and: from langchain_core.prompts import PromptTemplate
def create_langchain_agent(self, catalog: Catalog, embeddings, llm):
    try:
        # Load the tool from the catalog using the v1.0.0 API
        tool_result = catalog.find("tool", name="search_candidates_vector")

        # Create a tool wrapper that injects the embeddings client
        def search_with_embeddings(job_description: str) -> str:
            return tool_result.func(
                job_description=job_description,
                num_results=5,
                embeddings_client=embeddings,
            )

        tools = [
            Tool(
                name=tool_result.meta.name,
                description=tool_result.meta.description,
                func=search_with_embeddings,
            ),
        ]

        # Load the prompt from the catalog using the v1.0.0 API
        prompt_result = catalog.find("prompt", name="hr_recruiter_assistant")
        if prompt_result is None:
            raise ValueError("Could not find hr_recruiter_assistant prompt in catalog. Run 'agentc index' first.")

        custom_prompt = PromptTemplate(
            template=prompt_result.content.strip(),
            input_variables=["input", "agent_scratchpad"],
            partial_variables={
                "tools": "\n".join([f"{tool.name}: {tool.description}" for tool in tools]),
                "tool_names": ", ".join([tool.name for tool in tools]),
            },
        )

        # Create the agent
        agent = create_react_agent(llm, tools, custom_prompt)
        agent_executor = AgentExecutor(
            agent=agent,
            tools=tools,
            verbose=True,
            handle_parsing_errors=handle_parsing_error,  # error handler defined elsewhere in the app
            max_iterations=5,
            max_execution_time=120,
            early_stopping_method="force",
            return_intermediate_steps=True,
        )

        logger.info("LangChain ReAct agent created successfully")
        return agent_executor
    except Exception as e:
        # Surface catalog/wiring failures to the caller
        logger.error(f"Failed to create LangChain agent: {e}")
        raise
Built-in Observability (Agent Tracing)
In the previous agent application, observability was limited to print() statements. There was no way to "replay" an agent's session to understand why it rejected a specific candidate.
Agent Catalog provides tracing. It lets users query traces with SQL++, leverage the performance of Couchbase, and get insight into the details of prompts and tools on the same platform.
We can add transactional observability using catalog.Span(). We wrap the execution logic in a context manager that logs every thought, action, and result back to Couchbase. We can now view a full "trace" of the recruitment session in the Capella UI, showing exactly how the LLM processed a candidate's resume.
# UserContent and AssistantContent are Agent Catalog log content types (imported from agentc)
application_span = catalog.Span(name="HR Recruiter Agent")

# AFTER: granular observability
with application_span.new(name="job_matching_query") as query_span:
    # Log the input
    query_span.log(UserContent(value=job_description))

    # Run the agent
    response = agent.invoke({"input": job_description})

    # Log the agent's final decision
    query_span.log(AssistantContent(value=response["output"]))
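As a sketch of the SQL++ angle mentioned above, you could query recent agent outputs directly. The scope, collection, and field names here are assumptions based on the default Agent Catalog layout; adjust them to whatever agentc publish created in your cluster:

SELECT l.timestamp, l.span_name, l.content
FROM `travel-sample`.agent_activity.logs AS l
WHERE l.content.kind = "assistant"
ORDER BY l.timestamp DESC
LIMIT 10;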
Conclusion
AI agents fail in production not because LLMs lack capability, but because agentic systems can become too complex. By adopting a platform-first approach with Couchbase AI Services and the Agent Catalog, we transformed a complex agent into a governed, scalable agentic system.
If you're building AI agents today, the real question isn't which LLM to use; it's how you'll run agents safely, observably, and at scale. Couchbase AI Services are built for exactly that.
