How to build an AI knowledge base: the process behind a real build
Building an AI knowledge base is the process of centralizing your company's documentation and connecting it to a language model, so anyone on the team can ask questions in natural language and get answers drawn from the real information, with the source cited. The process has 6 steps: map where your knowledge lives today, decide the architecture, prepare the content, deploy on your infrastructure, test with real queries, and hand operations over to your team. Almost every guide on this topic is a vendor tutorial on configuring their platform; this one walks through the process as it happens in a real build project, with a production case as its backbone.
What an AI knowledge base is (and what changes compared to a wiki)
An AI knowledge base is a system where your company's documentation (processes, policies, manuals, contracts) is centralized and connected to a language model that answers questions using those documents as its source. The difference with a wiki or a tidy Drive is the access model: a wiki is something you search, browsing folders and guessing which document holds the answer; an AI knowledge base is something you ask, the way you'd ask a colleague, and it answers citing which document the answer came from.
A substantial share of an interaction worker's workweek goes to searching for internal information or tracking down the colleague who can help, according to the McKinsey Global Institute. The same study estimates that this search time can drop significantly when the company's knowledge lives in a searchable record.
In practice, what we see in mid-sized companies is simpler to describe: internal questions depend on someone knowing where the right document is, and when that person is out, the operation waits. If you're still evaluating whether your company needs one, our guide on AI knowledge bases for companies covers the what and the why. This piece assumes you've already decided to move forward and focuses on the how.
Before building anything: the 2 decisions that define the project
The most expensive mistake isn't made while loading documents but before: starting without deciding where the system will live or how it will retrieve information. These two decisions define the cost, the privacy and the useful life of the project.
Decision 1: SaaS software or a build on your own infrastructure?
| Criterion | Knowledge base SaaS | Build on your infrastructure |
|---|---|---|
| Where your data lives | On the vendor's servers | On your own cloud (AWS, GCP, Azure) |
| Fit to your processes | Whatever the product allows | Total: designed around your sources and workflows |
| Cost | Monthly subscription per user, forever | One-time project + limited maintenance |
| Dependency | On the vendor's roadmap and pricing | On your own team (the system is handed over documented) |
| Integration with your systems | Standard connectors | Whatever your operation needs |
SaaS makes sense when the use case is standard customer support and there are no restrictions on where the information lives. A build on your own infrastructure makes sense when the documentation is sensitive (operations, legal, finance), when you already have cloud contracts and an IT team, or when the system has to integrate with how your company works and not the other way around. Our work sits in the second scenario: in the Ecuador case, everything was deployed on AWS, with the company's information on its own cloud.
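The Cost row is where teams usually want numbers. A minimal Python sketch of the break-even arithmetic, assuming a per-user monthly subscription on one side and a one-time build plus a flat monthly maintenance cost on the other; the function name and all figures are hypothetical inputs, not quotes:

```python
# Illustrative break-even arithmetic: a per-user monthly SaaS price versus a
# one-time build plus flat monthly maintenance. All inputs are hypothetical.
import math
from typing import Optional


def break_even_month(
    build_cost: float,
    maintenance_per_month: float,
    saas_price_per_user: float,
    users: int,
) -> Optional[int]:
    """First month in which the build's cumulative cost is at or below the
    SaaS's cumulative cost; None if the build never catches up."""
    saas_per_month = saas_price_per_user * users
    monthly_saving = saas_per_month - maintenance_per_month
    if monthly_saving <= 0:
        return None  # maintenance costs as much as the subscription, or more
    # Build cumulative: build_cost + m * maintenance_per_month
    # SaaS cumulative:  m * saas_per_month
    # They meet when m * monthly_saving == build_cost.
    return max(1, math.ceil(build_cost / monthly_saving))


# Hypothetical figures: 50 users at 30/month each vs. a 20,000 build
# with 500/month of maintenance.
print(break_even_month(20_000, 500, 30, 50))  # 20
```

The sketch ignores price changes, discounting and internal team time, so treat its output as a first approximation rather than a business case.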
Decision 2: RAG or not?
RAG (retrieval-augmented generation) is the technique where the system first retrieves the relevant fragments of your documents and then the model writes the answer using only those fragments. It's what makes the agent answer with your real information instead of making things up, and what lets it cite the source.
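To make the two stages concrete, here is a minimal Python sketch of retrieve-then-generate. It assumes a bag-of-words cosine similarity as a stand-in for real embeddings and stops at the prompt that would be sent to the model; the Fragment type, file names and prompt wording are hypothetical:

```python
# Minimal retrieve-then-generate sketch. A bag-of-words cosine stands in for
# real embeddings; the Fragment type and prompt wording are assumptions.
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fragment:
    source: str  # document the fragment came from, used for the citation
    text: str


def vectorize(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, fragments: list[Fragment], k: int = 2) -> list[Fragment]:
    """Retrieval: rank fragments by similarity to the question, keep the top k."""
    q = vectorize(question)
    ranked = sorted(fragments, key=lambda f: cosine(q, vectorize(f.text)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[Fragment]) -> str:
    """Generation input: the model sees only the retrieved fragments, tagged by source."""
    cited = "\n".join(f"[{f.source}] {f.text}" for f in context)
    return (
        "Answer using only the fragments below and cite the [source] you relied on.\n"
        f"{cited}\n\nQuestion: {question}"
    )


docs = [
    Fragment("travel-policy.pdf", "Travel expenses are reimbursed against receipts"),
    Fragment("vacation-policy.docx", "Vacation requests need manager approval"),
]
top = retrieve("How are travel expenses reimbursed?", docs, k=1)
print(build_prompt("How are travel expenses reimbursed?", top))
```

Because every fragment carries its source name into the prompt, the model can cite it, which is what makes the answer traceable.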
Not every project needs it: with a small volume of stable documents, passing the model direct context is enough and simpler to maintain. The short rule: RAG is justified when your documentation volume exceeds what a model can read per query, when the content changes often, or when you need traceability of which document backed each answer. The full decision framework is in RAG for companies: when it's worth it and when it's not.
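The short rule can be written down as a checklist function. A minimal Python sketch; the token counts are rough estimates you supply, and the per-query budget depends on the model and the cost you accept (an assumption, not a fixed limit):

```python
# Hypothetical encoding of the short rule: RAG is justified if any of the
# three conditions holds. Token figures are rough estimates you supply.
from dataclasses import dataclass


@dataclass
class CorpusProfile:
    total_tokens: int             # approximate size of all documentation in scope
    per_query_budget_tokens: int  # how much context you can send the model per query
    changes_often: bool           # documents are updated frequently
    needs_traceability: bool      # each answer must point to the document behind it


def rag_reasons(p: CorpusProfile) -> list[str]:
    """Return the reasons RAG is justified; an empty list means direct
    context is probably enough."""
    reasons = []
    if p.total_tokens > p.per_query_budget_tokens:
        reasons.append("volume exceeds what the model can read per query")
    if p.changes_often:
        reasons.append("content changes often")
    if p.needs_traceability:
        reasons.append("answers need traceability to a source document")
    return reasons


small_stable = CorpusProfile(40_000, 100_000, changes_often=False, needs_traceability=False)
print(rag_reasons(small_stable))  # []
```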
How to build an AI knowledge base in 6 steps
These steps are the phases of a real build project, in the order we execute them. They're not software features: they're work someone has to do, with or without a consultancy.
Map where your knowledge lives today
The first step is a map, not a tool. Before choosing technology you have to answer: which documents exist, where they are (Drive, emails, Excel files, PDFs, two people's heads), which are current and who owns each topic. In the Ecuador project, the starting point was exactly that: documentation scattered across files, emails and people, and internal questions that depended on someone knowing where the information was.
This mapping produces the project's real scope. The rule we use: what the team asks about most often goes in first, not what happens to be tidiest.
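The map itself can be as simple as a spreadsheet. Sketched here as Python data to make the prioritization rule explicit, assuming you can estimate how often each topic is asked about; the field names, topics and people are hypothetical:

```python
# Sketch of the source map as data. Field names are hypothetical; the point is
# that priority comes from question frequency, not from how tidy a source is.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    topic: str
    location: str             # "Drive", "email", "Excel", "PDF", "someone's head"
    owner: Optional[str]      # who answers for the topic; None if nobody does
    is_current: bool
    questions_per_month: int  # estimated from chat history or by asking the team


def load_order(sources: list[Source]) -> list[str]:
    """Topics in the order they should enter the base: most asked first."""
    ranked = sorted(sources, key=lambda s: s.questions_per_month, reverse=True)
    return [s.topic for s in ranked]


def needs_work(sources: list[Source]) -> list[str]:
    """Topics to fix before loading: no owner, outdated, or only in someone's head."""
    return [
        s.topic
        for s in sources
        if s.owner is None or not s.is_current or s.location == "someone's head"
    ]


inventory = [
    Source("brand guidelines", "Drive", "Ana", True, 2),
    Source("refund policy", "email", None, False, 40),
    Source("supplier onboarding", "someone's head", "Luis", True, 15),
]
print(load_order(inventory))
# ['refund policy', 'supplier onboarding', 'brand guidelines']
print(needs_work(inventory))
# ['refund policy', 'supplier onboarding']
```

Note that the tidiest source, the brand guidelines, goes in last because almost nobody asks about it.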
Decide the architecture
With the source map in hand, you decide how it gets built: RAG or direct context, which cloud it runs on, which model answers and where the team queries from (an internal chat, WhatsApp, an existing app). This decision follows from the volume and sensitivity of the sources you mapped, not the other way around.
The reference architecture we run in production: documents on the client's cloud, an embeddings index for semantic search, retrieval of the relevant fragments per query, and Claude writing the answer, instructed to cite the internal source. It's the architecture of the Ecuador case: AWS, a knowledge base with RAG, embeddings plus retrieval, and Claude as the conversational layer.
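One way to keep these decisions explicit is to write them down as configuration and check them before deploying. A minimal Python sketch; the field names, the bucket URI and the model label are placeholders, not a real provider resource or an Anthropic model identifier:

```python
# Hypothetical configuration object for the reference architecture.
# Field names, the bucket URI and the model label are placeholders.
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeBaseConfig:
    cloud: str              # provider the client already has under contract
    documents_uri: str      # where source documents live, inside the client account
    retrieval: str          # "rag" or "direct_context"
    top_k: int              # fragments retrieved per query (RAG only)
    answer_model: str       # conversational layer
    require_citation: bool  # every answer must name its internal source
    channel: str            # where the team asks: internal chat, WhatsApp, an existing app


def check(cfg: KnowledgeBaseConfig, client_clouds: set[str]) -> list[str]:
    """Return the ways a config breaks the architecture's ground rules."""
    problems = []
    if cfg.cloud not in client_clouds:
        problems.append("documents would live outside the client's cloud")
    if cfg.retrieval not in {"rag", "direct_context"}:
        problems.append(f"unknown retrieval mode: {cfg.retrieval}")
    if cfg.retrieval == "rag" and cfg.top_k < 1:
        problems.append("RAG needs top_k >= 1")
    if not cfg.require_citation:
        problems.append("answers must cite the internal source")
    return problems


cfg = KnowledgeBaseConfig(
    cloud="aws",
    documents_uri="s3://example-company-kb/docs",  # placeholder bucket
    retrieval="rag",
    top_k=4,
    answer_model="claude",  # placeholder label, not a real model id
    require_citation=True,
    channel="internal_chat",
)
print(check(cfg, client_clouds={"aws"}))  # []
```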
Prepare the content
This is where answer quality gets decided. The agent answers only as well as the worst version of the document you gave it: if three versions of the same policy coexist, the system may answer with any of the three, not necessarily the current one. Three concrete tasks:
- Clean up: remove duplicates and old versions; one current version per topic.
- Fill in: knowledge that lives in someone's head gets written down now, or the system is born incomplete.
- Chunk with metadata: split documents into retrievable fragments and tag them (topic, area, validity).
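The clean-up and chunking tasks can be sketched in a few lines of Python; filling in has no code equivalent, since it means someone writing down what they know. This is a minimal illustration, assuming each document already carries topic, area, version and validity fields; the types, window sizes and word-based splitting are assumptions, and many production pipelines split by headings or tokens instead:

```python
# Sketch of two preparation tasks: keep one current version per topic, then
# split into overlapping word windows tagged with metadata. Types and window
# sizes are assumptions.
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    topic: str
    area: str
    version: int
    valid_until: date
    text: str


@dataclass
class Chunk:
    topic: str
    area: str
    valid_until: date
    text: str


def keep_current(docs: list[Document]) -> list[Document]:
    """Clean up: one document per topic, keeping the highest version."""
    latest: dict[str, Document] = {}
    for doc in docs:
        if doc.topic not in latest or doc.version > latest[doc.topic].version:
            latest[doc.topic] = doc
    return list(latest.values())


def chunk(doc: Document, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split into windows of `size` words; consecutive windows share `overlap`
    words. Each chunk inherits the document's metadata for filtering later."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = doc.text.split()
    if not words:
        return []
    step = size - overlap
    return [
        Chunk(doc.topic, doc.area, doc.valid_until, " ".join(words[start:start + size]))
        for start in range(0, max(len(words) - overlap, 1), step)
    ]
```

The overlap keeps a sentence that straddles two windows retrievable from either one; the metadata lets retrieval filter by area or drop fragments past their validity date.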
Deploy on the client's infrastructure
Only here does the code show up. The index gets deployed, the model connected, the query interface built and access configured. The point that separates a serious build from a demo: everything runs on the company's cloud, with its information inside its own perimeter. In Ecuador that was AWS; the criterion is the same on whichever cloud the client already has under contract.
This phase also sets the agent's behavior: answer only with what's in the base, cite the source of every answer, and say "I don't have that information" when the document doesn't exist. An agent that confidently makes up answers is worse than no agent at all.
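These three rules are usually enforced partly in the prompt and partly in code around the model call. A minimal Python sketch, assuming a retriever that returns (similarity, source, text) triples; `generate` and `stub_model` are hypothetical stand-ins for your model client, and the similarity threshold is an assumption to tune with real queries:

```python
# Sketch of the three behavior rules as a guard around the model call.
# `generate` is a hypothetical callable wrapping your model client, and
# min_score is an assumed similarity threshold to tune during testing.
from typing import Callable

FALLBACK = "I don't have that information."

SYSTEM_PROMPT = (
    "Answer only with information from the fragments provided. "
    "Cite the [source] of every answer. If the fragments do not contain "
    f"the answer, reply exactly: {FALLBACK}"
)


def answer(
    question: str,
    scored: list[tuple[float, str, str]],  # (similarity, source, text) per fragment
    min_score: float,
    generate: Callable[[str, str], str],   # (system_prompt, user_message) -> reply
) -> str:
    relevant = [(src, text) for score, src, text in scored if score >= min_score]
    if not relevant:
        return FALLBACK  # nothing relevant in the base: don't let the model improvise
    context = "\n".join(f"[{src}] {text}" for src, text in relevant)
    reply = generate(SYSTEM_PROMPT, f"{context}\n\nQuestion: {question}")
    if not any(src in reply for src, _ in relevant):
        return FALLBACK  # a reply that cites no retrieved source is not trusted
    return reply


def stub_model(system_prompt: str, user_message: str) -> str:
    # Stand-in for a real model call, for illustration only.
    return "Refunds require the original receipt [refund-policy.pdf]."


fragments = [(0.82, "refund-policy.pdf", "Refunds require the original receipt.")]
print(answer("Do refunds need a receipt?", fragments, 0.5, stub_model))
```

Checking the citation in code, not only in the prompt, means an answer that slips past the instructions still falls back instead of reaching the user.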
Test with real queries from real users
None of these systems comes out right on the first try. The testing phase means putting the agent in front of real team members and tuning it with their queries: in the Ecuador project this was its own phase of the plan, tuning with real queries before handover. What gets adjusted, in order of frequency:
- Retrieval: the question was fine but the system pulled the wrong fragment; the chunking or the metadata gets adjusted.
- Content gaps: people ask things no document answers; back to preparing the content (step 3).
- Tone and format: answers that run too long, or don't cite the source the way the team needs.
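Retrieval problems, the most frequent adjustment, can be measured rather than eyeballed. A minimal Python sketch, assuming you label each real test query with the document that should answer it; `retrieve_sources` is a hypothetical hook into your own retriever, and the queries and file names are illustrative:

```python
# Sketch of a retrieval check over real team queries. `retrieve_sources` is a
# hypothetical function returning the source names retrieved for a question;
# expected sources are labeled by hand during testing.
from dataclasses import dataclass
from typing import Callable


@dataclass
class LabeledQuery:
    question: str         # asked by a real team member
    expected_source: str  # document that should back the answer


def evaluate(
    queries: list[LabeledQuery],
    retrieve_sources: Callable[[str], list[str]],
) -> tuple[float, list[str]]:
    """Hit rate: share of queries whose expected document was retrieved.
    Misses are returned for manual review."""
    misses = [
        q.question
        for q in queries
        if q.expected_source not in retrieve_sources(q.question)
    ]
    hit_rate = 1 - len(misses) / len(queries) if queries else 0.0
    return hit_rate, misses


fake_index = {
    "How do I request vacation?": ["vacation-policy.docx"],
    "Who approves supplier payments?": ["travel-policy.pdf"],
}
rate, misses = evaluate(
    [
        LabeledQuery("How do I request vacation?", "vacation-policy.docx"),
        LabeledQuery("Who approves supplier payments?", "payments-process.pdf"),
    ],
    lambda q: fake_index.get(q, []),
)
print(rate, misses)  # 0.5 ['Who approves supplier payments?']
```

A miss does not say which cause applies: someone still reads each one and decides whether to adjust chunking and metadata or to write the missing document. Rerunning the same labeled set after each change shows whether the tuning helped.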
Hand operations over to the team
A knowledge base only the consultancy knows how to operate is a future problem. The project closes with documentation and handover: how to add a document, how to correct a bad answer, how to onboard a user. In the Ecuador case, the system was documented so the team can maintain and extend it without depending on us. That's the standard: the system belongs to the client, runs on their cloud and their team operates it.
The process in a real case: the Acatha (Ecuador) knowledge base
Acatha is an operations company in Ecuador that arrived with the typical problem: manual operational processes and documentation scattered across files and people. Internal questions depended on someone finding the right document, and there was no single source of truth for policies and processes.
The project followed the phases in this guide, in a staged implementation: mapping the processes and the documentation to centralize, automating the priority repetitive processes, building the AI knowledge base on AWS, and tuning with real queries plus handover to the team. The result in production:
- Internal questions get answered in natural language, at any hour.
- Answers come from the company's real documents and cite the internal source instead of making things up.
- A single source for policies and processes, instead of scattered versions.
- Everything is deployed on AWS, with the company's information inside its own perimeter, and the team maintains it without depending on the consultancy.
What keeps a knowledge base alive (the part no tutorial covers)
Most guides end at "keep the content updated", and that's exactly where projects die. A knowledge base degrades on its own: processes change, policies get updated, the people who knew leave. Three concrete practices keep it alive:
- An owner with a name: not "the team" but one person responsible for keeping what enters the base current. They don't need to be technical; they need to know who to ask about each topic.
- Updates tied to the process, not the calendar: when a policy changes, updating the base is part of the change, not a quarterly task that keeps getting postponed.
- Unanswered questions as a signal: the queries where the agent answered "I don't have that information" are next month's list of what to document.
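The three practices can be backed by a small monthly report. A minimal Python sketch, assuming each document carries a named owner and a review date, and that the agent's query log is available as (question, answer) pairs; the Entry type, field names and log format are hypothetical:

```python
# Sketch of a monthly maintenance report covering the three practices.
# The Entry fields and the log format are assumptions; adapt them to however
# your deployment stores document metadata and query logs.
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional

FALLBACK = "I don't have that information."


@dataclass
class Entry:
    topic: str
    owner: Optional[str]  # the named person responsible; None means nobody
    valid_until: date     # review date, moved whenever the underlying process changes


def maintenance_report(
    entries: list[Entry],
    query_log: list[tuple[str, str]],  # (question, agent answer)
    today: date,
) -> dict[str, list[str]]:
    unanswered = Counter(
        question.strip().lower()
        for question, reply in query_log
        if reply.strip() == FALLBACK
    )
    return {
        "no_owner": [e.topic for e in entries if not e.owner],
        "expired": [e.topic for e in entries if e.valid_until < today],
        # most frequent unanswered questions first: next month's list to document
        "to_document": [q for q, _ in unanswered.most_common()],
    }


report = maintenance_report(
    [Entry("refund policy", "Ana", date(2020, 1, 1)), Entry("travel policy", None, date(2099, 1, 1))],
    [
        ("How do I expense a taxi?", FALLBACK),
        ("how do I expense a taxi? ", FALLBACK),
        ("Refund window?", "See [refund-policy.pdf]"),
    ],
    today=date(2025, 6, 1),
)
print(report)
```

Grouping identical questions is deliberately naive here; in practice, near-duplicate phrasings may need the same similarity search the agent already uses.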
There's a single alarm signal: the team goes back to asking over chat what the base should be answering. When that happens, the problem is almost never the model; it's stale or incomplete content.
The concept behind this, company knowledge operating as a system that answers on its own, is what we develop in our piece on the company brain with AI.
We build your knowledge base on your infrastructure
At Duotach we build AI knowledge bases for companies across LATAM and Spain, on the client's cloud and with the client's team operating them by the end of the project. We use Claude in production every day; the Ecuador case in this guide is in production, not on a slide. If your documentation is scattered and you want to see what the process would look like at your company, book a 30-minute call: we listen to your context, tell you whether your case calls for RAG or something simpler, and send you a proposal quoted by scope.
Frequently Asked Questions
What is an AI knowledge base?
What do I need to build an AI knowledge base?
How much does it cost to build an AI knowledge base?
Should I use SaaS software or a custom build?
What is RAG and do I need it?
How long does it take to implement an AI knowledge base?
Related Articles
AI knowledge bases for companies: what they are and what they solve
What an AI knowledge base is, which problems it solves and how to decide whether your company needs one.
RAG for companies: when it makes sense and when it does not
A decision framework to know whether your knowledge base needs RAG or something simpler is enough.
The company brain with AI
What happens when company knowledge stops living in folders and starts answering on its own.
