12 min read
By Diego Carrion · Co-founder, Duotach
How-to Guide · Knowledge Base · RAG · LATAM

How to build an AI knowledge base: the process behind a real build

Building an AI knowledge base is the process of centralizing your company's documentation and connecting it to a language model, so anyone on the team can ask a question in natural language and get an answer drawn from the actual documents, with the source cited. The process has 6 steps: map where your knowledge lives today, decide the architecture, prepare the content, deploy on your infrastructure, test with real queries, and hand operations over to your team. Almost every guide on this topic is a vendor tutorial on configuring their platform; this one walks through the process as it happens in a real build project, with a production case as its backbone.

What an AI knowledge base is (and what changes compared to a wiki)

An AI knowledge base is a system where your company's documentation (processes, policies, manuals, contracts) is centralized and connected to a language model that answers questions using those documents as its source. The difference from a wiki or a tidy Drive is the access model: a wiki is something you search, browsing folders and guessing which document holds the answer; an AI knowledge base is something you ask, the way you'd ask a colleague, and it answers citing which document the answer came from.

~20% of an interaction worker's workweek goes to searching for internal information or tracking down the colleague who can help, according to the McKinsey Global Institute.

-35% is how much that search time can drop when the company's knowledge lives in a searchable record, according to the same study.

In practice, what we see in mid-sized companies is simpler to describe: internal questions depend on someone knowing where the right document is, and when that person is out, the operation waits. If you're still evaluating whether your company needs one, our guide on AI knowledge bases for companies covers the what and the why. This piece assumes you've already decided to move forward and focuses on the how.

Before building anything: the 2 decisions that define the project

The most expensive mistake isn't made while loading documents; it's made earlier, by starting without deciding where the system will live or how it will retrieve information. These two decisions define the cost, the privacy and the useful life of the project.

Decision 1: SaaS software or a build on your own infrastructure?

| Criterion | Knowledge base SaaS | Build on your infrastructure |
|---|---|---|
| Where your data lives | On the vendor's servers | On your own cloud (AWS, GCP, Azure) |
| Fit to your processes | Whatever the product allows | Total: designed around your sources and workflows |
| Cost | Monthly subscription per user, forever | One-time project + limited maintenance |
| Dependency | On the vendor's roadmap and pricing | On your own team (it's left documented) |
| Integration with your systems | Standard connectors | Whatever your operation needs |

SaaS makes sense when the use case is standard customer support and there are no restrictions on where the information lives. A build on your own infrastructure makes sense when the documentation is sensitive (operations, legal, finance), when you already have cloud contracts and an IT team, or when the system has to integrate with how your company works and not the other way around. Our work sits in the second scenario: in the Ecuador case, everything was deployed on AWS, with the company's information on its own cloud.

Decision 2: RAG or not?

RAG (retrieval-augmented generation) is the technique where the system first retrieves the relevant fragments of your documents and then the model writes the answer using only those fragments. It's what makes the agent answer with your real information instead of making things up, and what lets it cite the source.
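
The two-stage flow can be sketched in a few lines. This is a minimal illustration, not the production pipeline: the corpus, the file names and the word-overlap scoring are assumptions made for the example (a real system would use embeddings for retrieval), and the model call itself is left out.

```python
import re

# Hypothetical mini-corpus: (source name, fragment text).
FRAGMENTS = [
    ("vacation-policy.pdf", "Employees accrue 15 vacation days per year."),
    ("expenses-manual.pdf", "Expense reports must be submitted within 30 days."),
    ("onboarding-guide.pdf", "New hires receive their laptop on the first day."),
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, fragments, top_k: int = 2):
    """Step 1: rank fragments by how many words they share with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(text)), source, text) for source, text in fragments]
    scored = [item for item in scored if item[0] > 0]  # drop fragments with no overlap
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(source, text) for _, source, text in scored[:top_k]]


def build_prompt(question: str, retrieved) -> str:
    """Step 2: hand the model only the retrieved fragments, each tagged with its source."""
    context = "\n".join(f"[{source}] {text}" for source, text in retrieved)
    return (
        "Answer using only the fragments below and cite the source in brackets.\n"
        f"{context}\n"
        f"Question: {question}"
    )


question = "How many vacation days do employees get per year?"
retrieved = retrieve(question, FRAGMENTS)
prompt = build_prompt(question, retrieved)
print(prompt)
```

Because every fragment travels with its source name, the model has what it needs to cite, and anything that was not retrieved never reaches it.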

Not every project needs it: with a small volume of stable documents, passing the documents to the model directly as context (direct context) is enough and simpler to maintain. The short rule: RAG is justified when your documentation volume exceeds what a model can read per query, when the content changes often, or when you need traceability of which document backed each answer. The full decision framework is in RAG for companies: when it's worth it and when it's not.
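
Written as code, the short rule is a three-way "or". The sketch below is only an illustration: the `DocumentationProfile` fields, the token estimates and the 100,000-token budget are hypothetical values chosen for the example, not thresholds from a real project.

```python
from dataclasses import dataclass


@dataclass
class DocumentationProfile:
    total_tokens: int         # estimated size of all documents combined
    changes_often: bool       # content is updated frequently
    needs_traceability: bool  # each answer must point to its backing document


def needs_rag(profile: DocumentationProfile, context_budget_tokens: int) -> bool:
    """Apply the short rule: any one of the three conditions justifies RAG."""
    exceeds_context = profile.total_tokens > context_budget_tokens
    return exceeds_context or profile.changes_often or profile.needs_traceability


# Hypothetical profiles and budget, for illustration only.
small_stable = DocumentationProfile(total_tokens=20_000, changes_often=False, needs_traceability=False)
large_legal = DocumentationProfile(total_tokens=3_000_000, changes_often=True, needs_traceability=True)

print(needs_rag(small_stable, context_budget_tokens=100_000))  # False
print(needs_rag(large_legal, context_budget_tokens=100_000))   # True
```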

How to build an AI knowledge base in 6 steps

These steps are the phases of a real build project, in the order we execute them. They're not software features: they're work someone has to do, with or without a consultancy.

Step 1: Map where your knowledge lives today

The first step is a map, not a tool. Before choosing technology you have to answer: which documents exist, where they are (Drive, emails, Excel files, PDFs, two people's heads), which are current and who owns each topic. In the Ecuador project, the starting point was exactly that: documentation scattered across files, emails and people, and internal questions that depended on someone knowing where the information was.

This mapping produces the project's real scope. The rule we use: what the team asks about most often goes in first, not what happens to be tidiest.
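
One way to keep the map honest is to hold it as data rather than as a slide. The sketch below assumes a hypothetical inventory (topics, owners and question counts are invented) and applies the rule above: order by how often the team asks, and flag what still has to be written down or refreshed.

```python
from dataclasses import dataclass


@dataclass
class Source:
    topic: str
    location: str             # "Drive", "email", "Excel", "PDF" or "person"
    owner: str
    is_current: bool
    questions_per_month: int  # how often the team asks about this topic


# Hypothetical inventory, for illustration only.
inventory = [
    Source("Vacation policy", "Drive", "HR lead", True, 40),
    Source("Supplier onboarding", "person", "Purchasing manager", True, 65),
    Source("Expense reimbursement", "email", "Finance lead", False, 25),
]


def prioritize(sources):
    """Most-asked topics go in first, regardless of how tidy the source is."""
    return sorted(sources, key=lambda s: s.questions_per_month, reverse=True)


def needs_work(sources):
    """Topics that live only in someone's head or whose document is outdated."""
    return [s for s in sources if s.location == "person" or not s.is_current]


for source in prioritize(inventory):
    print(source.questions_per_month, source.topic, "-", source.owner)
```

In this made-up inventory the most-asked topic is also one that exists only in a person's head, which is exactly the kind of finding the mapping is meant to surface.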

Step 2: Decide the architecture

With the source map in hand, you decide how it gets built: RAG or direct context, which cloud it runs on, which model answers and where the team queries from (an internal chat, WhatsApp, an existing app). This decision follows from the volume and sensitivity of the sources you mapped, not the other way around.

The reference architecture we run in production: documents on the client's cloud, an embeddings index for semantic search, retrieval of the relevant fragments per query, and Claude writing the answer with the obligation to cite the internal source. It's the architecture of the Ecuador case: AWS, a knowledge base with RAG, embeddings plus retrieval, and Claude as the conversational layer.
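
The "embeddings index plus retrieval" piece boils down to nearest-neighbor search over vectors. The sketch below is a toy, in-memory version under stated assumptions: the three-dimensional vectors are hand-written stand-ins (in a real deployment they come from an embedding model), and the `EmbeddingIndex` class is a hypothetical name, not the component used in the Ecuador build.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class EmbeddingIndex:
    """Toy in-memory index: stores (source, text, vector) and searches by cosine similarity."""

    def __init__(self):
        self._entries = []

    def add(self, source: str, text: str, vector) -> None:
        self._entries.append((source, text, vector))

    def search(self, query_vector, top_k: int = 2):
        """Return the top_k entries as (score, source, text), best match first."""
        scored = [
            (cosine_similarity(query_vector, vector), source, text)
            for source, text, vector in self._entries
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


# Hand-written vectors standing in for real embeddings.
index = EmbeddingIndex()
index.add("vacation-policy.pdf", "Employees accrue 15 vacation days per year.", [0.9, 0.1, 0.0])
index.add("expenses-manual.pdf", "Expense reports are due within 30 days.", [0.1, 0.9, 0.1])
index.add("it-security.pdf", "Passwords rotate every 90 days.", [0.0, 0.2, 0.9])

results = index.search([0.8, 0.2, 0.0], top_k=2)
for score, source, text in results:
    print(f"{score:.2f}  {source}")
```

The fragments returned here are what gets passed to the model as context, each one still carrying the name of the document it came from.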

Step 3: Prepare the content

This is where answer quality gets decided. The agent answers as well as the worst version of the document you gave it: if three versions of the same policy coexist, the system will answer with one of the three. Three concrete tasks:

  • Clean up: remove duplicates and old versions; one current version per topic.
  • Fill in: knowledge that lives in someone's head gets written down now, or the system is born incomplete.
  • Chunk with metadata: split documents into retrievable fragments and tag them (topic, area, validity).
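
The third task, chunking with metadata, can be sketched as follows. This is one simple strategy among many, assumed for illustration: paragraphs are packed into fragments up to a character limit, and every fragment inherits the document's tags. The `Chunk` fields, the sample policy text and the 120-character limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str
    topic: str
    area: str
    valid_from: str  # ISO date from which this version applies


def chunk_document(text, source, topic, area, valid_from, max_chars=120):
    """Split on blank lines, then pack paragraphs into chunks of up to max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    pieces = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = paragraph
    if current:
        pieces.append(current)
    return [Chunk(piece, source, topic, area, valid_from) for piece in pieces]


# Hypothetical policy text, for illustration only.
policy = (
    "Vacation requests go through the HR portal.\n\n"
    "Requests need manager approval at least two weeks ahead.\n\n"
    "Unused days expire on March 31 of the following year."
)

chunks = chunk_document(policy, "vacation-policy.pdf", "vacation", "HR", "2024-01-01")
for chunk in chunks:
    print(chunk.source, chunk.topic, chunk.valid_from, "|", chunk.text.replace("\n\n", " / "))
```

The metadata is what later lets retrieval filter by area or drop fragments from a version that is no longer valid.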

Step 4: Deploy on the client's infrastructure

Only here does the code show up. The index gets deployed, the model connected, the query interface built and access configured. The point that separates a serious build from a demo: everything runs on the company's cloud, with its information inside its own perimeter. In Ecuador that was AWS; the criterion is the same on whichever cloud the client already has under contract.

This phase also sets the agent's behavior: answer only with what's in the base, cite the source of every answer, and say "I don't have that information" when the document doesn't exist. An agent that confidently makes up answers is worse than no agent at all.
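
Those three behaviors can be enforced partly in code, before the model is ever called. The sketch below is an assumption-laden illustration: the `respond` function, the 0.5 score threshold and the `fake_model` stand-in are hypothetical, and the real model call is deliberately left out.

```python
NO_ANSWER = "I don't have that information."


def respond(question, retrieved, generate, min_score=0.5):
    """Answer only from supported fragments, cite them, or refuse.

    retrieved: list of (score, source, text) from the retrieval step.
    generate:  callable(question, fragments) -> str, standing in for the model call.
    """
    supported = [(source, text) for score, source, text in retrieved if score >= min_score]
    if not supported:
        return NO_ANSWER  # nothing in the base backs an answer, so refuse without calling the model
    answer = generate(question, supported)
    sources = ", ".join(sorted({source for source, _ in supported}))
    return f"{answer} (Source: {sources})"


def fake_model(question, fragments):
    """Stand-in for the language model: echoes the best fragment."""
    return fragments[0][1]


good_hits = [(0.91, "vacation-policy.pdf", "Employees accrue 15 vacation days per year.")]
weak_hits = [(0.12, "it-security.pdf", "Passwords rotate every 90 days.")]

print(respond("How many vacation days do I get?", good_hits, fake_model))
print(respond("What is the parking policy?", weak_hits, fake_model))
```

The threshold is a tuning knob: set too low, weak matches slip through and the agent answers from the wrong document; set too high, it refuses questions the base could have answered.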

Step 5: Test with real queries from real users

None of these systems comes out right on the first try. The testing phase means putting the agent in front of real team members and tuning it with their queries: in the Ecuador project this was its own phase of the plan, tuning with real queries before handover. What gets adjusted, in order of frequency:

  • Retrieval: the question was fine but the system pulled the wrong fragment; the chunking or the metadata gets adjusted.
  • Content gaps: people ask things no document answers; back to step 3.
  • Tone and format: answers that run too long, or don't cite the source the way the team needs.
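
Retrieval problems are the easiest of the three to measure. A minimal sketch, assuming the team has collected real questions together with the document that should answer each one: the function below reports how often the expected document shows up in the top results. The queries, file names and `fake_search` stand-in are invented for the example.

```python
def hit_rate_at_k(test_cases, search, k=3):
    """Share of queries whose expected source appears in the top-k results.

    test_cases: list of (query, expected_source).
    search:     callable(query, k) -> list of source names, best match first.
    Returns (hit_rate, misses), where misses lists the failing cases.
    """
    if not test_cases:
        raise ValueError("test_cases must not be empty")
    misses = [(query, expected) for query, expected in test_cases
              if expected not in search(query, k)]
    return (len(test_cases) - len(misses)) / len(test_cases), misses


# Canned results standing in for the real retrieval step.
FAKE_RESULTS = {
    "How do I request vacation?": ["vacation-policy.pdf", "hr-faq.pdf"],
    "When are expenses reimbursed?": ["expenses-manual.pdf"],
    "Who approves a new supplier?": ["expenses-manual.pdf", "hr-faq.pdf"],
    "How do I reset my VPN password?": ["it-security.pdf"],
}


def fake_search(query, k):
    return FAKE_RESULTS.get(query, [])[:k]


test_cases = [
    ("How do I request vacation?", "vacation-policy.pdf"),
    ("When are expenses reimbursed?", "expenses-manual.pdf"),
    ("Who approves a new supplier?", "purchasing-procedure.pdf"),
    ("How do I reset my VPN password?", "it-security.pdf"),
]

rate, misses = hit_rate_at_k(test_cases, fake_search, k=3)
print(f"hit rate: {rate:.2f}")
for query, expected in misses:
    print("missed:", query, "->", expected)
```

Each miss is either a chunking or metadata problem, or a sign that the expected document was never loaded, which sends the work back to step 3.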

Step 6: Hand operations over to the team

A knowledge base only the consultancy knows how to operate is a future problem. The project closes with documentation and handover: how to add a document, how to correct a bad answer, how to onboard a user. In the Ecuador case, the system was documented so the team can maintain and extend it without depending on us. That's the standard: the system belongs to the client, runs on their cloud and their team operates it.

The process in a real case: the Acatha (Ecuador) knowledge base

Acatha is an operations company in Ecuador that arrived with the typical problem: manual operational processes and documentation scattered across files and people. Internal questions depended on someone finding the right document, and there was no single source of truth for policies and processes.

The project followed the phases in this guide, in a staged implementation: mapping the processes and the documentation to centralize, automating the priority repetitive processes, building the AI knowledge base on AWS, and tuning with real queries plus handover to the team. The result in production:

24/7 agent

Internal questions get answered in natural language, at any hour.

Answers with RAG

They come from the company's real documents, citing the internal source instead of making things up.

One source of truth

A single source for policies and processes, instead of scattered versions.

On their own cloud (AWS)

Everything deployed on AWS, with the company's information inside its perimeter, and the team maintains it without depending on the consultancy.

See the full case study

What keeps a knowledge base alive (the part no tutorial covers)

Most guides end at "keep the content updated" and that's exactly where projects die. A knowledge base degrades on its own: processes change, policies get updated, the people who knew leave. Three concrete practices keep it alive:

1. An owner with a name

Not "the team": one person responsible for what enters the base being current. They don't need to be technical; they need to know who to ask about each topic.

2. Updates tied to the process, not the calendar

When a policy changes, updating the base is part of the change, not a quarterly task that keeps getting postponed.

3. Unanswered questions as a signal

The queries where the agent answered "I don't have that information" are next month's list of what to document.
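
Turning that signal into a list takes very little code. The sketch below assumes a hypothetical query log in which each entry already carries a topic tag and the agent's answer; how queries get tagged is outside the scope of the example.

```python
from collections import Counter

NO_ANSWER = "I don't have that information."


def documentation_backlog(query_log, top_n=3):
    """Count unanswered queries per topic and return the most frequent ones."""
    unanswered = Counter(
        entry["topic"] for entry in query_log if entry["answer"] == NO_ANSWER
    )
    return unanswered.most_common(top_n)


# Hypothetical log entries, for illustration only.
query_log = [
    {"topic": "travel expenses", "answer": NO_ANSWER},
    {"topic": "vacation", "answer": "15 days per year. (Source: vacation-policy.pdf)"},
    {"topic": "travel expenses", "answer": NO_ANSWER},
    {"topic": "parking", "answer": NO_ANSWER},
]

backlog = documentation_backlog(query_log)
for topic, count in backlog:
    print(count, topic)
```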

There's a single alarm signal: the team goes back to asking over chat what the base should be answering. When that happens, the problem is almost never the model; it's stale or incomplete content.

If you're interested in the concept behind this (the company's knowledge operating as a system that answers on its own), we develop it in the company brain with AI.

We build your knowledge base on your infrastructure

At Duotach we build AI knowledge bases for companies across LATAM and Spain, on the client's cloud and with the client's team operating them by the end of the project. We use Claude in production every day; the Ecuador case in this guide is in production, not on a slide. If your documentation is scattered and you want to see what the process would look like at your company, book a 30-minute call: we listen to your context, tell you whether your case calls for RAG or something simpler, and send you a proposal quoted by scope.

Frequently Asked Questions

What is an AI knowledge base?
It's a system that centralizes a company's documentation and connects it to a language model, so the team can ask questions in natural language and get answers based on the real documents, with the source cited. Unlike a wiki, you don't browse it: you ask it.
What do I need to build an AI knowledge base?
Three things: your knowledge sources mapped out (current documents, an owner per topic), an architecture decision (RAG or direct context, which cloud it runs on) and one person on your team who owns the content after launch. The technology is the most solved part of the problem.
How much does it cost to build an AI knowledge base?
It depends on scope: how many sources need to be integrated, whether it requires RAG, which infrastructure it runs on and which channel the team uses to query it. A SaaS is paid per user per month indefinitely; a custom build is quoted by scope as a project, plus limited maintenance. At Duotach we quote by scope after mapping the sources.
Should I use SaaS software or a custom build?
SaaS if the use case is standard (typical customer support) and it doesn't matter where your data lives. Custom if your documentation is sensitive, if you already have your own cloud and IT team, or if the system has to adapt to your processes rather than the other way around.
What is RAG and do I need it?
RAG (retrieval-augmented generation) makes the system first retrieve the relevant fragments of your documents so the model answers using only those, citing the source. You need it when your document volume is large, when the content changes often, or when you need traceability. With a few stable documents, direct context is enough.
How long does it take to implement an AI knowledge base?
It's implemented in phases: mapping, architecture, content preparation, build, testing with real users and handover. The total timeline depends on the volume and state of your documentation, which is the variable that takes the most work; the infrastructure and the model are the fastest part.