June 7, 2026 · 12 min read

By Diego Carrion · Co-founder, Duotach

Build in progress · AI Agents · WhatsApp · Claude

AI WhatsApp agent for e-commerce: how we build it

An AI WhatsApp agent is a conversational assistant that uses a language model to understand the customer, query external data (stock, CRM, orders) with tools, and decide what to do, without a rigid decision tree. Here we share the engineering detail of one we are building right now for a retail and e-commerce company in Ecuador: the real architecture, the three decisions that matter most, and why we made them this way.

This is not theory: we are building it

This article covers the engineering detail of an AI WhatsApp agent that we at Duotach are building right now for a retail and e-commerce company in Ecuador, on the WhatsApp Business API. We share the real architecture and the design decisions that matter most.

The project is in progress: we do not have final ROI numbers yet, and when the system is stable in production we will publish the results. In the meantime, this is what is already decided and built. What we can show today is the reasoning: how you decide the architecture of a serious agent instead of a demo bot.

What an AI WhatsApp agent is (and why it is not a chatbot)

The word "chatbot" gets used for two things that are not technically alike. A decision-tree chatbot is a map of answers: the customer types a keyword, the system matches a trigger and returns a pre-loaded reply. It is fast and predictable, but it breaks the moment the customer goes off script or asks a compound question.

An AI agent is a different thing. It reads the full history, interprets intent, queries external systems when needed, and picks the action without a rigid flow. The practical difference for an operations manager: the chatbot answers isolated questions, the agent runs an end-to-end conversation.

Dimension	Decision-tree chatbot	AI agent with tool use
Understands natural language	No (keywords)	Yes (LLM)
Handles compound questions	No	Yes
Queries external data live	Limited, scripted	Yes, via tools
Keeps conversation context	Poor	Full
Decides when to escalate to a human	Fixed rule	Contextual decision
Maintenance cost as it grows	High (more branches)	Stable (more tools)

The agent we are building follows the approach from our guide on AI agents with Claude Code and runs on the Anthropic SDK with native tool use. It is not a "build your bot with no code" wizard: it is a software system with its own reasoning loop, knowledge base and audit log.

The client's problem: retail and e-commerce in Ecuador

The client is a retail and e-commerce company operating in Ecuador, with plans to expand to more countries in the region. Their customer support lives on WhatsApp: product questions, order status, shipping, after-sales. They already have a mature CRM with working sales and support pipelines, and a mostly non-technical team that replies from that CRM.

The goal is not to replace that team, but to take the repetitive volume off their plate (the same questions about shipping, hours, availability) so people can focus on the cases that truly need a human. Three constraints shaped the whole design:

Don't touch what already works. The CRM pipelines are mature. The agent comes in as a layer on top; it does not reorder the existing workflow.
The team must be able to operate it without us. By month 12, with no Duotach in the middle, the client must be able to edit answers, adjust rules and understand what happened in each conversation.
Multi-country by design. What we build for one country has to scale to others, with different content and tone per market.

The three architecture decisions that matter

When a decision is load-bearing (it holds up the rest of the system for months), at Duotach we run it through a formal analysis of alternatives before committing. These are the three that defined the agent.

1. Orchestration: all-n8n vs all-SDK vs hybrid

The core question: does the agent loop (receive the message, search the knowledge base, call the LLM, run tools, reply) live in n8n, live in code, or split? We evaluated three paths. This table is the heart of the decision:

Approach	Pros	Cons
All in n8n visual workflows	Maximum delivery speed if the team already knows n8n. Per-execution visibility in the UI. Reinforces the standard pattern.	Multi-turn tool use turns into "Function-node spaghetti" above 3-4 tools. Hard to test and version. Maintainability ceiling with complex logic.
All in code service with Anthropic SDK	Native, readable tool use. Testable, typed, versionable as code. Any backend dev understands it.	More of a curve for deploy, health checks and observability. No native debug UI. Breaks the standard pattern of an n8n-first team.
Hybrid ✓ n8n + Node service	n8n does the plumbing; code does the reasoning. Each tool where it shines. The most reversible.	Two components to monitor instead of one. Requires documenting the split clearly.

We chose the hybrid. n8n handles the plumbing: it receives the WhatsApp webhook and responds fast (the CRM Salesbot has a hard 2-second timeout that we mitigate with an async response), buffers the messages a customer sends in quick succession, looks up the contact in the CRM and, after each turn, runs the side effects on the CRM. The agent loop lives in a Node TypeScript service using @anthropic-ai/sdk.

The rule that organizes everything is simple: the service decides what happens, n8n makes it happen. Multi-turn tool use needs to chain model calls in one place, so it lives in code. The CRM change (reassign owner, create lead) is run by n8n afterwards, which leaves a natural control point for an eventual human approval. And it is the most reversible option: if the service ever gives trouble, the loop can move back to n8n; if n8n gets in the way for plumbing, it migrates to the service.
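As a minimal sketch of that split, the service can return a description of what should happen and leave the execution to n8n. The request and response field names come from the flow in this article (chat_id, country, messages, contact_id, response_text, side_effects); the side-effect shapes, the type of messages and the buildTurnResponse helper are assumptions made for illustration, not the project's actual contract.

```typescript
// Request n8n sends to the service. Field names follow the article's flow;
// the string[] type for messages is an assumption.
interface TurnRequest {
  chat_id: string;
  country: string;
  messages: string[];
  contact_id: string;
}

// Hypothetical side-effect shapes: the service only describes them,
// n8n is the component that executes them against the CRM.
type SideEffect =
  | { type: "reassign_owner"; contact_id: string; reason: string }
  | { type: "create_lead"; contact_id: string; note: string };

interface TurnResponse {
  response_text: string;
  side_effects: SideEffect[];
}

// Builds the turn response without touching the CRM.
// If a handoff reason is given, it asks n8n to reassign the owner.
function buildTurnResponse(
  req: TurnRequest,
  replyText: string,
  handoffReason?: string
): TurnResponse {
  const side_effects: SideEffect[] = [];
  if (handoffReason !== undefined) {
    side_effects.push({
      type: "reassign_owner",
      contact_id: req.contact_id,
      reason: handoffReason,
    });
  }
  return { response_text: replyText, side_effects };
}
```

Because the service never writes to the CRM itself, a human approval step could later be placed in n8n between receiving side_effects and executing them.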

2. Knowledge base: Postgres + pgvector as the single source of truth

The agent needs a knowledge base (KB): shipping policies, FAQs, product info, support runbooks. That KB has to serve two masters: be editable by a non-technical team, and be queryable by the agent with low latency in every conversation. We evaluated Notion as the primary source, an Obsidian vault in Git, and loading the whole KB into Claude's context with no semantic search.

We landed on the simplest option: Postgres in the client's cloud as the single source of truth, with the pgvector extension for semantic retrieval. Documents live as rows in Postgres; on save we generate the embeddings (with text-embedding-3-small, 1536 dimensions); the agent runs a hybrid search (semantic with pgvector + keyword with tsvector) filtered by country.

Why not a separate vector DB (Pinecone, Qdrant)? At this scale, pgvector responds in under 100 ms without adding another vendor or another sync pipeline. Why not Notion as primary? It would break the principle that everything lives in the client's infra and force us to maintain a fragile sync job. Why not load everything into context? Above a certain volume, cost explodes and you lose the record of which documents the model saw. One store, one write path, one read path. No sync between systems.
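To make the hybrid search concrete, here is a hedged sketch. The article does not say how the semantic and keyword results are combined, so reciprocal rank fusion is used here purely as one common option, not as the project's method. The table and column names (kb_documents, embedding, tsv, country) and the 'simple' text search configuration are hypothetical; the `<=>` cosine-distance operator is pgvector's, and `@@`, `plainto_tsquery` and `ts_rank` are standard Postgres full-text search.

```typescript
// Hypothetical queries against an assumed kb_documents table.
// $1 is the query embedding (semantic) or the query text (keyword); $2 is the country.
const HYBRID_SQL = {
  semantic: `SELECT id FROM kb_documents
             WHERE country = $2
             ORDER BY embedding <=> $1::vector
             LIMIT 20`,
  keyword: `SELECT id FROM kb_documents
            WHERE country = $2 AND tsv @@ plainto_tsquery('simple', $1)
            ORDER BY ts_rank(tsv, plainto_tsquery('simple', $1)) DESC
            LIMIT 20`,
};

// Reciprocal rank fusion: each ranked list contributes 1 / (k + rank) per document.
// Documents that rank well in both lists end up first.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

With a semantic ranking of a, b, c and a keyword ranking of b, d, the fused order puts b first because it is the only document present in both lists.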

3. Everything in the client's infra from day one

This is not a technical detail, it is a guiding principle: every material asset (code, database, secrets, logs, repositories) lives in the client's infrastructure from the first commit. n8n runs as a container on a client EC2. The Node service runs on the same EC2. The database is a client Postgres RDS. The repositories are in the client's GitHub organization; Duotach is a contributor, not an owner.

Why does it matter so much that it comes first? Because it removes the black box. There is no Duotach server the client depends on forever. If tomorrow they want to operate the system with another provider, they already have it all: the code, the data and the documentation of the decisions. It is real ownership, not a marketing promise. And it conditions the other two decisions: we ruled out Notion as the primary source and frameworks with SaaS debug UIs precisely because they would pull assets out of the client's infra.

The chosen stack and why

The system is designed in layers. Each one has a clear responsibility and only talks to the one next to it.

Layer	Technology	What it does
Channel	Wazzup + WhatsApp Business API	Connects the WhatsApp number and delivers messages to the CRM.
CRM and inbox	Kommo + Salesbot	Where contacts and pipelines live, and where humans reply. Fires the webhook on each incoming message.
Orchestration	n8n (Docker on client EC2)	Plumbing: webhook intake, buffering, contact lookup, side-effect execution, cron jobs.
Agent loop	Node TypeScript + @anthropic-ai/sdk	Reasoning: KB retrieval, LLM call, multi-turn tool use, audit log. Not exposed to the internet.
Data	Postgres RDS + pgvector + pg_trgm	Source of truth: documents, FAQs, prompts, routing rules, audit log and embeddings.
Admin	Next.js webapp	Where the client's team edits the KB, prompts and rules, and reviews the conversation history.

On the model: we use Claude with a cost-optimized mix. Sonnet 4.5 as the main model for reasoning and conversation; Haiku 4.5 for classification and simple FAQs. The base Anthropic SDK gives native tool use, prompt caching and streaming without putting the full Claude Code harness into production, which for a conversational agent would be over-tooling.

The tools the agent can invoke in this first phase:

search_knowledge_base — searches the country-filtered KB when it needs info that did not fit in the initial context.
lookup_contact_kommo — pulls the recent history and contact data from the CRM.
handoff_to_human — flags the conversation to hand off to a human, with reason, urgency and intent for routing.
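For illustration, these three tools could be declared in the shape the Anthropic Messages API expects for tool use: a name, a description and a JSON Schema under input_schema. The tool names and the reason, urgency and intent fields come from the list above; every other parameter (query, contact_id), the urgency values and the description texts are assumptions. The ToolDefinition interface is a local stand-in so the sketch stays self-contained.

```typescript
// Local stand-in for the tool shape used by the Messages API
// (name, description, input_schema as JSON Schema).
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; description?: string; enum?: string[] }>;
    required: string[];
  };
}

const tools: ToolDefinition[] = [
  {
    name: "search_knowledge_base",
    description: "Search the country-filtered knowledge base for extra information.",
    input_schema: {
      type: "object",
      // "query" is an assumed parameter name.
      properties: { query: { type: "string", description: "What to look for" } },
      required: ["query"],
    },
  },
  {
    name: "lookup_contact_kommo",
    description: "Fetch recent history and contact data from the CRM.",
    input_schema: {
      type: "object",
      // "contact_id" is an assumed parameter name.
      properties: { contact_id: { type: "string" } },
      required: ["contact_id"],
    },
  },
  {
    name: "handoff_to_human",
    description: "Flag the conversation for a human, with routing details.",
    input_schema: {
      type: "object",
      properties: {
        reason: { type: "string" },
        // The article mentions "high" urgency; the other values are assumed.
        urgency: { type: "string", enum: ["low", "medium", "high"] },
        intent: { type: "string" },
      },
      required: ["reason", "urgency", "intent"],
    },
  },
];
```

Keeping the definitions as plain data means adding a roadmap tool is a new array entry plus a handler, with no change to the loop itself.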

The roadmap adds lookup_order_status (order status), create_lead_kommo (create a lead on clear purchase intent) and escalate_with_context (handoff with a structured summary). If you are interested in the general pattern behind this, we develop it in our guide on AI agents with Claude Code.

How a conversation is resolved, step by step

The conceptual flow of a turn, end to end:

Customer (WhatsApp)
   ↓
Wazzup → Kommo (Salesbot fires webhook)
   ↓  widget_request (ack < 100 ms, 2s timeout mitigated async)
n8n: intake + buffering by chat_id + contact lookup
   ↓  POST /turn { chat_id, country, messages, contact_id }
Node service (agent loop):
   1. Embed the query
   2. Hybrid retrieval (pgvector + tsvector) by country
   3. Load the country's active system prompt
   4. Call Claude with tools
   5. Multi-turn loop while it requests tools
   6. Write the audit_log
   ↓  { response_text, side_effects }
n8n: reply to the customer + run side effects in the CRM
   ↓
Customer receives the reply on WhatsApp
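Steps 4 and 5 of the flow above can be sketched as a loop that keeps calling the model while it asks for tools. This is a simplified illustration, not the production code: the model call and the tool runner are injected as functions so the sketch runs without network access, and the local types only mirror the parts of the Messages API response used here (content blocks of type text and tool_use, stop_reason, and tool_result blocks sent back in a user message). The maxRounds guard and the function names are assumptions.

```typescript
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock;
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: string | Array<ContentBlock | ToolResultBlock>;
}

interface ModelReply {
  stop_reason: "end_turn" | "tool_use" | "max_tokens";
  content: ContentBlock[];
}

// Injected dependencies (hypothetical signatures): in a real service,
// callModel would wrap the SDK call and runTool would dispatch to handlers.
type CallModel = (messages: Message[]) => Promise<ModelReply>;
type RunTool = (name: string, input: unknown) => Promise<string>;

async function runAgentLoop(
  userText: string,
  callModel: CallModel,
  runTool: RunTool,
  maxRounds = 5
): Promise<{ text: string; toolsCalled: string[] }> {
  const messages: Message[] = [{ role: "user", content: userText }];
  const toolsCalled: string[] = [];

  for (let round = 0; round < maxRounds; round++) {
    const reply = await callModel(messages);
    messages.push({ role: "assistant", content: reply.content });

    // No tool requested: concatenate the text blocks and finish the turn.
    if (reply.stop_reason !== "tool_use") {
      const text = reply.content
        .filter((block): block is TextBlock => block.type === "text")
        .map((block) => block.text)
        .join("");
      return { text, toolsCalled };
    }

    // Run every requested tool and send the results back as a user message.
    const results: ToolResultBlock[] = [];
    for (const block of reply.content) {
      if (block.type === "tool_use") {
        toolsCalled.push(block.name);
        const output = await runTool(block.name, block.input);
        results.push({ type: "tool_result", tool_use_id: block.id, content: output });
      }
    }
    messages.push({ role: "user", content: results });
  }
  throw new Error("Agent loop exceeded maxRounds without a final answer");
}
```

The list of tools called in the turn is returned alongside the text so it can be written to the audit log.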

Example: simple question

The customer types "do you ship to Quito?". n8n acks in under 100 ms, waits a few seconds in case more messages arrive, looks up the contact and infers the country. The service embeds the query, retrieves the shipping documents, loads the country prompt and calls Claude. Claude answers directly, with no tools. The service writes the turn to the audit log and returns the text. Total latency around 8 seconds, within budget.
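The short wait before calling the service exists because customers often split one question across several messages. In this project that buffering happens in n8n; the TypeScript below is only a sketch of the logic, with a hypothetical MessageBuffer class and a quiet window whose length is an assumption (the article only says "a few seconds"). Time is passed in explicitly so the behavior is easy to follow.

```typescript
interface BufferedChat {
  texts: string[];
  lastSeenMs: number;
}

// Groups messages by chat and releases a chat only after it has been
// quiet for quietWindowMs.
class MessageBuffer {
  private chats = new Map<string, BufferedChat>();
  private quietWindowMs: number;

  constructor(quietWindowMs: number) {
    this.quietWindowMs = quietWindowMs;
  }

  // Store a message and restart the quiet window for that chat.
  add(chatId: string, text: string, nowMs: number): void {
    const entry = this.chats.get(chatId) ?? { texts: [], lastSeenMs: nowMs };
    entry.texts.push(text);
    entry.lastSeenMs = nowMs;
    this.chats.set(chatId, entry);
  }

  // Return and remove every chat with no new message during the quiet window.
  flushReady(nowMs: number): Array<{ chatId: string; messages: string[] }> {
    const ready: Array<{ chatId: string; messages: string[] }> = [];
    this.chats.forEach((entry, chatId) => {
      if (nowMs - entry.lastSeenMs >= this.quietWindowMs) {
        ready.push({ chatId, messages: entry.texts });
      }
    });
    for (const item of ready) {
      this.chats.delete(item.chatId);
    }
    return ready;
  }
}
```

A longer window merges more split messages into one turn but adds directly to the total latency the customer sees, so the value is a trade-off against the latency budget.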

Example: tool use + handoff

The customer types "I've had an account problem for two days and no one replies". Claude requests lookup_contact_kommo, sees the unanswered history and decides to escalate with handoff_to_human, high urgency. The service queries the routing rules, generates an empathetic reply and returns to n8n the intent to reassign the owner. n8n sends the message and reassigns the ticket in the CRM.

Every turn is logged: which documents were retrieved, which tools were called, which prompt was used, how many tokens and how much latency. That per-turn audit log, stored in Postgres and visible from the admin, is better traceability than a workflow UI for the question that truly matters: "why did the agent answer this?". It is queryable, exportable and does not rotate away after a few days.
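One way to picture a per-turn audit entry is as a single typed record. The fields below follow the items listed above (retrieved documents, tools called, prompt used, tokens, latency); the exact field names, the split into input and output tokens, and the buildAuditRecord helper are hypothetical and not the project's actual schema.

```typescript
// Hypothetical shape of one audit_log row.
interface AuditRecord {
  chat_id: string;
  country: string;
  prompt_version: string;      // which system prompt was active
  retrieved_doc_ids: string[]; // documents the model was shown
  tools_called: string[];      // tools invoked during the turn
  input_tokens: number;
  output_tokens: number;
  latency_ms: number;
}

// Everything known about the turn except the latency, which is derived.
type TurnTrace = Omit<AuditRecord, "latency_ms">;

// Combine the trace with start and finish timestamps into one record.
function buildAuditRecord(
  trace: TurnTrace,
  startedAtMs: number,
  finishedAtMs: number
): AuditRecord {
  return { ...trace, latency_ms: finishedAtMs - startedAtMs };
}
```

Storing the retrieved document ids next to the tools called is what lets someone answer "why did the agent answer this?" with a single query over the table.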

What comes next

Let's be honest about the status: the system is under construction. Already decided and underway are the hybrid architecture, the KB on Postgres + pgvector, the agent loop service with its first tools, and the client-infra principle.

What comes by phases: the full admin panel so the team can operate without us, prompt caching to lower costs, new tools (order status, lead creation), operational metrics and multi-country expansion.

When the agent is stable in production and we have real data (volume resolved without a human, response times, CSAT), we will publish the full case study with numbers. For now, what we can show is the reasoning. If you want to see the rest of what we build, it is in our case studies. And if you are not at this level yet but want to start with the basics, we wrote a guide on how to automate WhatsApp Business with n8n.

Let's build yours

If your operation lives on WhatsApp and the volume is already weighing on the team, a well-built AI agent (not a decision-tree bot) can absorb the repetitive work and free people for what matters. We build these with the architecture you see here: portable, auditable and operated by your team. We quote by scope.

Frequently Asked Questions

What is the difference between an AI WhatsApp agent and a chatbot?

A chatbot follows a decision tree with keywords and pre-loaded answers: it breaks the moment the customer goes off script. An AI agent uses an LLM to understand natural language, query external data with tools, and decide the action without a rigid flow. The chatbot answers questions; the agent runs the whole conversation.

What stack do you use to build the AI WhatsApp agent?

WhatsApp Business API via Wazzup as the channel, Kommo as the CRM, n8n for plumbing, a Node TypeScript service with the Anthropic SDK (Claude) for the agent loop, and Postgres with pgvector as the knowledge base. A Next.js admin webapp lets the client's team operate everything without touching code.

Why a hybrid of n8n + code instead of doing everything in n8n?

Because multi-turn tool use becomes unmanageable in visual workflows above three or four tools. Isolating the agent's reasoning in TypeScript code makes it readable, testable and versionable, while n8n keeps doing what it does best: the visual plumbing of integrations. It is also the most reversible option.

Why Postgres with pgvector and not a dedicated vector DB?

At the scale of an e-commerce KB, pgvector responds in under 100 ms without adding another vendor or a separate sync pipeline. It keeps a single source of truth inside the client's infrastructure, simplifies maintenance and reduces failure modes compared with having two systems to keep in sync.

Does the agent's data stay on Duotach servers?

No. The project principle is that everything (code, database, secrets, logs, repositories) lives in the client's infrastructure from day one. Duotach builds, hands off and supports, but there is no black box and no technical dependency on our servers. The client owns it all from the start.

How much does it cost to build an AI WhatsApp agent like this?

We quote by scope, not by the hour. The price depends on conversation volume, the number of integrations (CRM, e-commerce, orders), how many tools the agent has, and whether it is single or multi-country. Model licenses and cloud infrastructure are paid separately and stay in the client's name. Send us your operation's context and we will put together a concrete proposal.
