AI WhatsApp agent for e-commerce: how we build it
An AI WhatsApp agent is a conversational assistant that uses a language model to understand the customer, query external data (stock, CRM, orders) with tools, and decide what to do, without a rigid decision tree. Here we share the engineering detail of one we are building right now for a retail and e-commerce company in Ecuador: the real architecture, the three decisions that matter most, and why we made them this way.
This is not theory: we are building it
At Duotach we are building this agent right now for a retail and e-commerce company in Ecuador, on the WhatsApp Business API. What follows is the real architecture and the design decisions that matter most.
The project is in progress: we do not have final ROI numbers yet, and when the system is stable in production we will publish the results. In the meantime, this is what is already decided and built. What we can show today is the reasoning: how you decide the architecture of a serious agent instead of a demo bot.
What an AI WhatsApp agent is (and why it is not a chatbot)
The word "chatbot" gets used for two things that are not technically alike. A decision-tree chatbot is a map of answers: the customer types a keyword, the system matches a trigger and returns a pre-loaded reply. It is fast and predictable, but it breaks the moment the customer goes off script or asks a compound question.
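To make that failure mode concrete, here is a minimal sketch of a keyword-trigger bot. The trigger table, the canned replies and the `keywordReply` name are invented for illustration; they do not describe any system the client runs.

```typescript
// Hypothetical trigger table: keyword -> pre-loaded reply.
const triggers: Record<string, string> = {
  shipping: "We ship nationwide in 2 to 5 business days.",
  hours: "We are open Monday to Saturday, 9:00 to 18:00.",
};

const FALLBACK = "Sorry, I did not understand. Type 'shipping' or 'hours'.";

// Returns the reply of the first trigger found in the message, or the fallback.
function keywordReply(message: string): string {
  const text = message.toLowerCase();
  for (const [keyword, reply] of Object.entries(triggers)) {
    if (text.includes(keyword)) return reply;
  }
  return FALLBACK;
}

// On script: the 'shipping' trigger matches.
console.log(keywordReply("What are your shipping times?"));
// Compound question: only the first matching trigger answers, the hours part is dropped.
console.log(keywordReply("What are your hours, and is shipping free above $50?"));
// Off script: no keyword is present, so the bot falls back.
console.log(keywordReply("My package arrived damaged, what can I do?"));
```

Every new case means another trigger or another branch, which is where the maintenance cost of this approach comes from.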
An AI agent is a different thing. It reads the full history, interprets intent, queries external systems when needed, and picks the action without a rigid flow. The practical difference for an operations manager: the chatbot answers isolated questions, the agent runs an end-to-end conversation.
| Dimension | Decision-tree chatbot | AI agent with tool use |
|---|---|---|
| Understands natural language | No (keywords) | Yes (LLM) |
| Handles compound questions | No | Yes |
| Queries external data live | Limited, scripted | Yes, via tools |
| Keeps conversation context | Poor | Full |
| Decides when to escalate to a human | Fixed rule | Contextual decision |
| Maintenance cost as it grows | High (more branches) | Stable (more tools) |
The agent we are building applies the pattern of AI agents with Claude Code and runs on the Anthropic SDK with native tool use. It is not a "build your bot with no code" wizard: it is a software system with its own reasoning loop, knowledge base and audit log.
The client's problem: retail and e-commerce in Ecuador
The client is a retail and e-commerce company operating in Ecuador, with plans to expand to more countries in the region. Their customer support lives on WhatsApp: product questions, order status, shipping, after-sales. They already have a mature CRM with working sales and support pipelines, and a mostly non-technical team that replies from that CRM.
The goal is not to replace that team, but to take the repetitive volume off their plate (the same questions about shipping, hours, availability) so people can focus on the cases that truly need a human. Three constraints shaped the whole design:
- Don't touch what already works. The CRM pipelines are mature. The agent comes in as a layer, it does not reorder the existing workflow.
- The team must be able to operate it without us. By month 12, with no Duotach in the middle, the client must be able to edit answers, adjust rules and understand what happened in each conversation.
- Multi-country by design. What we build for one country has to scale to others, with different content and tone per market.
The three architecture decisions that matter
When a decision is load-bearing (it holds up the rest of the system for months), at Duotach we run it through a formal analysis of alternatives before committing. These are the three that defined the agent.
1. Orchestration: all-n8n vs all-SDK vs hybrid
The core question: does the agent loop (receive the message, search the knowledge base, call the LLM, run tools, reply) live in n8n, live in code, or split? We evaluated three paths. This table is the heart of the decision:
| Approach | Pros | Cons |
|---|---|---|
| All in n8n visual workflows | Maximum delivery speed if the team already knows n8n. Per-execution visibility in the UI. Reinforces the standard pattern of an n8n-first team. | Multi-turn tool use turns into "Function-node spaghetti" beyond 3 or 4 tools. Hard to test and version. Maintainability ceiling with complex logic. |
| All in code service with Anthropic SDK | Native, readable tool use. Testable, typed, versionable as code. Any backend dev understands it. | More of a curve for deploy, health checks and observability. No native debug UI. Breaks the standard pattern of an n8n-first team. |
| Hybrid ✓ n8n + Node service | n8n does the plumbing; code does the reasoning. Each tool where it shines. The most reversible. | Two components to monitor instead of one. Requires documenting the split clearly. |
We chose the hybrid. n8n handles the plumbing. It receives the WhatsApp webhook and acknowledges it fast (the CRM Salesbot has a hard 2-second timeout, which we mitigate by replying asynchronously), buffers the separate messages a customer sends in quick succession so they are handled as one turn, looks up the contact in the CRM and, after each turn, applies the resulting side effects to the CRM. The agent loop lives in a Node TypeScript service that uses @anthropic-ai/sdk.
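The buffering step is the least obvious piece of that plumbing. In the real system it lives in n8n; the TypeScript below is only a sketch of the logic, assuming a quiet window per chat. The `MessageBuffer` class, its method names and the 3-second window are hypothetical.

```typescript
interface Batch {
  chatId: string;
  messages: string[];
}

interface PendingEntry {
  messages: string[];
  lastAt: number; // timestamp (ms) of the most recent message in this chat
}

class MessageBuffer {
  private pending = new Map<string, PendingEntry>();
  private quietMs: number;

  constructor(quietMs: number) {
    this.quietMs = quietMs;
  }

  // Store a message and restart the quiet window for that chat.
  add(chatId: string, text: string, now: number): void {
    const entry = this.pending.get(chatId) ?? { messages: [], lastAt: now };
    entry.messages.push(text);
    entry.lastAt = now;
    this.pending.set(chatId, entry);
  }

  // Return and remove every chat that has been quiet for at least quietMs.
  flushReady(now: number): Batch[] {
    const ready: Batch[] = [];
    this.pending.forEach((entry, chatId) => {
      if (now - entry.lastAt >= this.quietMs) {
        ready.push({ chatId, messages: entry.messages });
        this.pending.delete(chatId);
      }
    });
    return ready;
  }
}

// Usage: two messages one second apart end up in the same turn.
const buffer = new MessageBuffer(3000); // assumed 3 s quiet window
buffer.add("chat-1", "hi", 0);
buffer.add("chat-1", "do you ship to Quito?", 1000);
const tooEarly = buffer.flushReady(2000); // still inside the window
const oneTurn = buffer.flushReady(4000); // 3 s after the last message
console.log(tooEarly.length, oneTurn[0]?.messages);
```

Passing `now` in explicitly keeps the sketch deterministic and easy to test; a production version would be driven by a timer or a scheduled workflow.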
The rule that organizes everything is simple: the service decides what happens, n8n makes it happen. Multi-turn tool use needs to chain model calls in one place, so it lives in code. The CRM change (reassign owner, create lead) is run by n8n afterwards, which leaves a natural control point for an eventual human approval. And it is the most reversible option: if the service ever gives trouble, the loop can move back to n8n; if n8n gets in the way for plumbing, it migrates to the service.
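One way to picture "the service decides, n8n makes it happen" is a contract where side effects travel as plain data. The sketch below reuses the field names of the POST /turn call shown in this article (`chat_id`, `country`, `messages`, `contact_id`, `response_text`, `side_effects`); the `SideEffect` shapes, the type of `messages`, and the approval policy are assumptions made for illustration.

```typescript
// Request body n8n posts to the service (messages typed as string[] by assumption).
interface TurnRequest {
  chat_id: string;
  country: string;
  messages: string[];
  contact_id: string;
}

// Hypothetical side-effect shapes: the service describes them, n8n executes them.
type SideEffect =
  | { type: "reassign_owner"; contact_id: string; reason: string }
  | { type: "create_lead"; contact_id: string; note: string };

interface TurnResponse {
  response_text: string;
  side_effects: SideEffect[];
}

// Assumed policy for illustration: lead creation waits for a human, reassignment does not.
function requiresApproval(effect: SideEffect): boolean {
  return effect.type === "create_lead";
}

// Split a response into effects to run now and effects to hold for approval.
function planEffects(response: TurnResponse): { runNow: SideEffect[]; hold: SideEffect[] } {
  return {
    runNow: response.side_effects.filter((e) => !requiresApproval(e)),
    hold: response.side_effects.filter((e) => requiresApproval(e)),
  };
}

// Example response for a handoff: the service never touches the CRM itself.
function exampleHandoff(request: TurnRequest): TurnResponse {
  return {
    response_text: "Sorry about the delay. I am passing your case to a teammate now.",
    side_effects: [
      { type: "reassign_owner", contact_id: request.contact_id, reason: "unanswered account issue" },
    ],
  };
}
```

Because the CRM change is data until n8n runs it, inserting or removing an approval step does not require touching the agent loop.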
2. Knowledge base: Postgres + pgvector as the single source of truth
The agent needs a knowledge base (KB): shipping policies, FAQs, product info, support runbooks. That KB has to serve two masters: be editable by a non-technical team, and be queryable by the agent with low latency in every conversation. We evaluated Notion as the primary source, an Obsidian vault in Git, and loading the whole KB into Claude's context with no semantic search.
We landed on the simplest option: Postgres in the client's cloud as the single source of truth, with the pgvector extension for semantic retrieval. Documents live as rows in Postgres; on save we generate the embeddings (with text-embedding-3-small, 1536 dimensions); the agent runs a hybrid search (semantic with pgvector + keyword with tsvector) filtered by country.
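As a rough sketch of what that hybrid search could look like, the function below builds a parameterized query that combines pgvector's cosine distance operator (`<=>`) with Postgres full-text ranking (`ts_rank` over a `tsvector` column) and filters by country. The table name `kb_documents`, the columns `embedding`, `search_tsv` and `country`, the `'simple'` text search configuration and the 0.7/0.3 weights are all assumptions, not the project's actual schema or scoring.

```typescript
interface SqlQuery {
  text: string;
  values: (string | number)[];
}

const EMBEDDING_DIMENSIONS = 1536; // text-embedding-3-small, as stated above

// Builds the query only; executing it is left to whatever Postgres client the service uses.
function buildHybridSearch(
  embedding: number[],
  queryText: string,
  country: string,
  limit: number = 5,
): SqlQuery {
  if (embedding.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`Expected ${EMBEDDING_DIMENSIONS} dimensions, got ${embedding.length}`);
  }
  const text = `
    SELECT id, title, body, semantic_score, keyword_score
    FROM (
      SELECT id, title, body,
             1 - (embedding <=> $1::vector) AS semantic_score,
             ts_rank(search_tsv, plainto_tsquery('simple', $2)) AS keyword_score
      FROM kb_documents
      WHERE country = $3
    ) AS scored
    ORDER BY 0.7 * semantic_score + 0.3 * keyword_score DESC
    LIMIT $4`;
  // pgvector accepts the '[x,y,z]' text form, cast to vector in the query.
  const vectorLiteral = `[${embedding.join(",")}]`;
  return { text, values: [vectorLiteral, queryText, country, limit] };
}
```

Ordering by a blended score like this will not use an approximate-nearest-neighbor index; at larger volumes a common alternative is to run the two searches separately and merge their rankings in code.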
Why not a separate vector DB (Pinecone, Qdrant)? At this scale, pgvector responds under 100 ms without adding another vendor or another sync pipeline. Why not Notion as primary? It would break the principle that everything lives in the client's infra and force us to maintain a fragile sync job. Why not load everything into context? Above a certain volume, cost explodes and you lose the record of which documents the model saw. One store, one write path, one read path. No sync between systems.
3. Everything in the client's infra from day one
This is not a technical detail, it is a guiding principle: every material asset (code, database, secrets, logs, repositories) lives in the client's infrastructure from the first commit. n8n runs as a container on a client EC2. The Node service runs on the same EC2. The database is a client Postgres RDS. The repositories are in the client's GitHub organization; Duotach is a contributor, not an owner.
Why does it matter so much that it comes first? Because it removes the black box. There is no Duotach server the client depends on forever. If tomorrow they want to operate the system with another provider, they already have it all: the code, the data and the documentation of the decisions. It is real ownership, not a marketing promise. And it conditions the other two decisions: we ruled out Notion as the primary source and frameworks with SaaS debug UIs precisely because they would pull assets out of the client's infra.
The chosen stack and why
The system is designed in layers. Each one has a clear responsibility and only talks to the one next to it.
| Layer | Technology | What it does |
|---|---|---|
| Channel | Wazzup + WhatsApp Business API | Connects the WhatsApp number and delivers messages to the CRM. |
| CRM and inbox | Kommo + Salesbot | Where contacts and pipelines live, and where humans reply. Fires the webhook on each incoming message. |
| Orchestration | n8n (Docker on client EC2) | Plumbing: webhook intake, buffering, contact lookup, side-effect execution, cron jobs. |
| Agent loop | Node TypeScript + @anthropic-ai/sdk | Reasoning: KB retrieval, LLM call, multi-turn tool use, audit log. Not exposed to the internet. |
| Data | Postgres RDS + pgvector + pg_trgm | Source of truth: documents, FAQs, prompts, routing rules, audit log and embeddings. |
| Admin | Next.js webapp | Where the client's team edits the KB, prompts and rules, and reviews the conversation history. |
On the model: we use Claude with a cost-optimized mix. Sonnet 4.5 as the main model for reasoning and conversation; Haiku 4.5 for classification and simple FAQs. The base Anthropic SDK gives native tool use, prompt caching and streaming without putting the full Claude Code harness into production, which for a conversational agent would be over-tooling.
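A minimal sketch of that split, assuming each request is tagged with a task kind before the model is chosen. The `TaskKind` categories and the `pickModel` name are hypothetical, and the model ID strings are placeholders to verify against Anthropic's current model list.

```typescript
// Hypothetical task categories; how a turn gets classified is out of scope here.
type TaskKind = "classification" | "simple_faq" | "conversation";

// Assumed model ID strings; check them against the provider's model list.
const MODELS = {
  main: "claude-sonnet-4-5", // reasoning and conversation
  light: "claude-haiku-4-5", // classification and simple FAQs
} as const;

// Route cheap, well-bounded tasks to the lighter model and everything else to the main one.
function pickModel(task: TaskKind): string {
  return task === "conversation" ? MODELS.main : MODELS.light;
}
```

Keeping the routing in one small function makes the cost trade-off easy to audit and to change per country later.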
The tools the agent can invoke in this first phase:
- search_knowledge_base: searches the country-filtered KB when the agent needs info that did not fit in the initial context.
- lookup_contact_kommo: pulls the recent history and contact data from the CRM.
- handoff_to_human: flags the conversation to hand off to a human, with reason, urgency and intent for routing.
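For reference, tool definitions for these three could look roughly like the sketch below. It follows the `name` / `description` / `input_schema` shape that the Anthropic Messages API uses for tools, declared here with a local type so the block stands alone. The parameter names other than reason, urgency and intent, the enum values and the descriptions are assumptions.

```typescript
// Local type mirroring the tool shape expected by the Messages API (JSON Schema input).
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, unknown>;
    required: string[];
  };
}

const tools: ToolDefinition[] = [
  {
    name: "search_knowledge_base",
    description: "Search the knowledge base for policies, FAQs and product info.",
    input_schema: {
      type: "object",
      // Assumption: the country filter is applied by the service, not chosen by the model.
      properties: { query: { type: "string", description: "What to look for" } },
      required: ["query"],
    },
  },
  {
    name: "lookup_contact_kommo",
    description: "Fetch recent history and contact data from the CRM.",
    input_schema: {
      type: "object",
      properties: { contact_id: { type: "string" } },
      required: ["contact_id"],
    },
  },
  {
    name: "handoff_to_human",
    description: "Flag the conversation for a human, with routing details.",
    input_schema: {
      type: "object",
      properties: {
        reason: { type: "string" },
        urgency: { type: "string", enum: ["low", "medium", "high"] }, // assumed scale
        intent: { type: "string" },
      },
      required: ["reason", "urgency", "intent"],
    },
  },
];
```

The descriptions matter as much as the schemas: they are what the model reads when deciding whether a tool applies.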
The roadmap adds lookup_order_status (order status), create_lead_kommo (create a lead on clear purchase intent) and escalate_with_context (handoff with a structured summary). If you are interested in the general pattern behind this, we develop it in our guide on AI agents with Claude Code.
How a conversation is resolved, step by step
The conceptual flow of a turn, end to end:
Customer (WhatsApp)
↓
Wazzup → Kommo (Salesbot fires webhook)
↓ widget_request (ack < 100 ms, 2s timeout mitigated async)
n8n: intake + buffering by chat_id + contact lookup
↓ POST /turn { chat_id, country, messages, contact_id }
Node service (agent loop):
1. Embed the query
2. Hybrid retrieval (pgvector + tsvector) by country
3. Load the country's active system prompt
4. Call Claude with tools
5. Multi-turn loop while it requests tools
6. Write the audit_log
↓ { response_text, side_effects }
n8n: reply to the customer + run side effects in the CRM
↓
Customer receives the reply on WhatsApp
Example: simple question
The customer types "do you ship to Quito?". n8n acks in under 100 ms, waits a few seconds in case more messages arrive, looks up the contact and infers the country. The service embeds the query, retrieves the shipping documents, loads the country prompt and calls Claude. Claude answers directly, with no tools. The service writes the turn to the audit log and returns the text. Total latency around 8 seconds, within budget.
Example: tool use + handoff
The customer types "I've had an account problem for two days and no one replies". Claude requests lookup_contact_kommo, sees the unanswered history and decides to escalate with handoff_to_human, high urgency. The service queries the routing rules, generates an empathetic reply and returns it to n8n together with the intent to reassign the owner. n8n sends the message and reassigns the ticket in the CRM.
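The multi-turn loop behind this example can be sketched as follows. The block shapes (`tool_use`, `tool_result`, `stop_reason`) mirror the Anthropic Messages API in simplified form, but the model call is injected as a function so the sketch runs without the SDK or network access. `runTurn`, `CallModel`, `ToolHandler` and the `maxSteps` guard are hypothetical names and choices, not the project's code.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };

type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: string | ContentBlock[] | ToolResultBlock[];
}

// Simplified: the real API has more stop reasons than these two.
interface ModelResponse {
  stop_reason: "end_turn" | "tool_use";
  content: ContentBlock[];
}

type CallModel = (messages: Message[]) => Promise<ModelResponse>;
type ToolHandler = (input: Record<string, unknown>) => Promise<string>;

async function runTurn(
  userText: string,
  callModel: CallModel,
  handlers: Record<string, ToolHandler>,
  maxSteps: number = 5,
): Promise<{ text: string; toolsCalled: string[] }> {
  const messages: Message[] = [{ role: "user", content: userText }];
  const toolsCalled: string[] = [];

  for (let step = 0; step < maxSteps; step++) {
    const response = await callModel(messages);
    messages.push({ role: "assistant", content: response.content });

    // No tool requested: collect the text blocks and finish the turn.
    if (response.stop_reason !== "tool_use") {
      let text = "";
      for (const block of response.content) {
        if (block.type === "text") text += block.text;
      }
      return { text, toolsCalled };
    }

    // Run every requested tool and send the results back as a user message.
    const results: ToolResultBlock[] = [];
    for (const block of response.content) {
      if (block.type !== "tool_use") continue;
      toolsCalled.push(block.name);
      const handler = handlers[block.name];
      const content = handler ? await handler(block.input) : `Unknown tool: ${block.name}`;
      results.push({ type: "tool_result", tool_use_id: block.id, content });
    }
    messages.push({ role: "user", content: results });
  }
  throw new Error("Tool loop exceeded maxSteps");
}
```

In production, `callModel` would wrap the SDK's message-creation call with the system prompt and tool definitions, and `toolsCalled` would feed the audit log.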
Every turn is logged: which documents were retrieved, which tools were called, which prompt was used, how many tokens and how much latency. That per-turn audit log, stored in Postgres and visible from the admin, is better traceability than a workflow UI for the question that truly matters: "why did the agent answer this?". It is queryable, exportable and does not rotate away after a few days.
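A per-turn record along those lines might look like the sketch below. The field names and the `explainTurn` helper are hypothetical; only the categories of information (documents, tools, prompt, tokens, latency) come from the description above.

```typescript
// Hypothetical shape of one audit_log row.
interface AuditEntry {
  chat_id: string;
  country: string;
  prompt_version: string;
  retrieved_document_ids: string[];
  tools_called: string[];
  input_tokens: number;
  output_tokens: number;
  latency_ms: number;
}

// Build the entry from what the turn produced plus start and end timestamps (ms).
function buildAuditEntry(
  turn: Omit<AuditEntry, "latency_ms">,
  startedAt: number,
  finishedAt: number,
): AuditEntry {
  return { ...turn, latency_ms: finishedAt - startedAt };
}

// One-line answer to "why did the agent answer this?" for the admin view.
function explainTurn(entry: AuditEntry): string {
  const docs = entry.retrieved_document_ids.join(", ") || "none";
  const tools = entry.tools_called.join(", ") || "none";
  const tokens = entry.input_tokens + entry.output_tokens;
  return `prompt=${entry.prompt_version} docs=[${docs}] tools=[${tools}] tokens=${tokens} latency=${entry.latency_ms}ms`;
}
```

Storing the row as structured columns rather than free text is what keeps it queryable and exportable.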
What comes next
Let's be honest about the status: the system is under construction. Already decided and underway are the hybrid architecture, the KB on Postgres + pgvector, the agent loop service with its first tools, and the client-infra principle.
What comes by phases: the full admin panel so the team can operate without us, prompt caching to lower costs, new tools (order status, lead creation), operational metrics and multi-country expansion.
When the agent is stable in production and we have real data (volume resolved without a human, response times, CSAT), we will publish the full case study with numbers. For now, what we can show is the reasoning. If you want to see the rest of what we build, it is in our case studies. And if you are not at this level yet but want to start with the basics, we wrote a guide on how to automate WhatsApp Business with n8n.
Let's build yours
If your operation lives on WhatsApp and the volume is already weighing on the team, a well-built AI agent (not a decision-tree bot) can absorb the repetitive work and free people for what matters. We build these with the architecture you see here: portable, auditable and operated by your team. We quote by scope.
Frequently Asked Questions
What is the difference between an AI WhatsApp agent and a chatbot?
What stack do you use to build the AI WhatsApp agent?
Why a hybrid of n8n + code instead of doing everything in n8n?
Why Postgres with pgvector and not a dedicated vector DB?
Does the agent's data stay on Duotach servers?
How much does it cost to build an AI WhatsApp agent like this?
