June 25, 2026

By Diego Carrion · Co-founder, Duotach

Decision framework · Knowledge base · RAG · LATAM

RAG for companies: when it's worth it and when it's not

RAG for companies (Retrieval-Augmented Generation) is an architecture that connects an AI model to your company's real documents: before answering, the system searches your information for the relevant fragments and the model generates the answer based on them, citing the source instead of making things up. That's as far as almost every guide goes. None of them helps you decide whether YOUR case needs it. At Duotach we've built RAG in production and also solved cases where it would have been overkill: this article is the decision framework we use ourselves.

What RAG is, in short (the only part of the pipeline you need to know)

RAG is the technique behind most AI knowledge bases for companies, and its pipeline has four steps:

Ingestion

Your documents (policies, manuals, contracts, processes) are split into fragments.

Indexing

Each fragment is converted into an embedding (a numerical representation of its meaning) and stored in a queryable index.

Retrieval

When someone asks a question, the system searches for the fragments most relevant to that query, not the whole corpus.

Generation

The AI model receives the question plus the retrieved fragments and writes the answer grounded in them, with a reference to the source.
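To make the four steps concrete, here is a minimal sketch in Python using only the standard library. It is not a production design and not our implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the index is an in-memory list rather than a vector database, and the model call itself is left out, so the sketch stops at the prompt the model would receive. All names (`Fragment`, `ingest`, `build_index`, `retrieve`, `build_prompt`) and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fragment:
    source: str  # document the fragment came from, kept for citations
    text: str


def ingest(docs: dict[str, str], max_words: int = 40) -> list[Fragment]:
    """Ingestion: split each document into fragments of at most max_words words."""
    fragments = []
    for source, text in docs.items():
        words = text.split()
        for i in range(0, len(words), max_words):
            fragments.append(Fragment(source, " ".join(words[i:i + max_words])))
    return fragments


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(fragments: list[Fragment]) -> list[tuple[Counter, Fragment]]:
    """Indexing: store each fragment next to its vector."""
    return [(embed(f.text), f) for f in fragments]


def retrieve(index: list[tuple[Counter, Fragment]], query: str, k: int = 3) -> list[Fragment]:
    """Retrieval: rank fragments by similarity to the query and keep only the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [fragment for _, fragment in ranked[:k]]


def build_prompt(question: str, fragments: list[Fragment]) -> str:
    """Generation input: the question plus the retrieved fragments, each tagged with its source."""
    context = "\n".join(f"[{f.source}] {f.text}" for f in fragments)
    return ("Answer using only the fragments below and cite the source in brackets.\n"
            f"{context}\n\nQuestion: {question}")


docs = {
    "purchasing_policy.md": "Purchases above 5,000 USD require two approvals.",
    "onboarding.md": "New hires get a laptop on day one.",
}
question = "How many approvals do large purchases need?"
top = retrieve(build_index(ingest(docs)), question, k=1)
print(build_prompt(question, top))
```

With this toy data, the question about approvals retrieves only the purchasing policy fragment, and the prompt carries its source label so the answer can cite it. Swapping in a real embedding model and vector store changes the quality of retrieval, not the shape of the pipeline.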

If you want the long explanation of the mechanism, the AWS guide on RAG is the canonical reference. But understanding the pipeline is not the same as making the decision, and the decision rests on the five criteria that follow.

The 5 criteria to decide whether your company needs RAG

These are the criteria we review before proposing RAG to a client. No single one is enough on its own; the decision comes from looking at all five together.

1. Document volume

Today's models accept a lot of direct context: Anthropic's models take up to 1 million tokens per request, per their official documentation. A full procedures manual, or several, fit in a single conversation. But that same documentation flags the limit: as context grows, the model's accuracy and recall degrade (a phenomenon known as context rot). Practical rule: if your corpus is dozens of documents, direct context competes; if it's thousands of documents or grows without a ceiling, selective retrieval stops being optional.
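A quick way to place your corpus on this scale is to estimate its size in tokens. The sketch below assumes roughly 4 characters per token, a common rule of thumb for English prose; real tokenizers vary by model and language, so use your provider's token-counting tools for real numbers. The 1,000,000-token limit is the figure cited above; the 20% "comfort" threshold is a hypothetical cut-off for illustration, not a number from Anthropic's documentation.

```python
import math

CHARS_PER_TOKEN = 4          # rule-of-thumb assumption; real tokenizers differ
CONTEXT_LIMIT = 1_000_000    # tokens per request, per the figure cited above
COMFORT_FRACTION = 0.2       # hypothetical share of the window beyond which context rot is a concern


def estimate_tokens(texts: list[str]) -> int:
    """Rough token estimate for a corpus, derived from its total character count."""
    return math.ceil(sum(len(t) for t in texts) / CHARS_PER_TOKEN)


def volume_verdict(texts: list[str]) -> str:
    tokens = estimate_tokens(texts)
    if tokens > CONTEXT_LIMIT:
        return f"~{tokens:,} tokens: does not fit, selective retrieval is required"
    if tokens > CONTEXT_LIMIT * COMFORT_FRACTION:
        return f"~{tokens:,} tokens: fits, but deep into the window; expect degraded recall"
    return f"~{tokens:,} tokens: fits comfortably; direct context is a real option"


manuals = ["x" * 15_000] * 40  # hypothetical: 40 documents of ~15,000 characters each
print(volume_verdict(manuals))  # ~150,000 tokens: fits comfortably; direct context is a real option
```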

2. Update frequency

A stable corpus (the onboarding manual that changes twice a year) can be loaded by hand whenever it changes. A living corpus (prices, stock, policies corrected every week, new meeting minutes) needs an ingestion pipeline that indexes new content automatically. RAG is designed exactly for that: you update the index, not the model. If your information changes faster than someone can copy and paste it, that's a point for RAG.
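The core of an automatic ingestion pipeline is deciding what changed since the last run. A minimal sketch, assuming each document has a stable ID and that the index remembers one content hash per document (function names and sample data are hypothetical): it compares hashes and returns which documents to add, re-index or delete, so only changed content gets re-chunked and re-embedded.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint of a document's content; any edit changes it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(current_docs: dict[str, str], indexed_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Compare the live corpus against what the index holds and decide what to (re)process."""
    plan: dict[str, list[str]] = {"add": [], "update": [], "delete": []}
    for doc_id, text in current_docs.items():
        if doc_id not in indexed_hashes:
            plan["add"].append(doc_id)        # new document: chunk and embed it
        elif indexed_hashes[doc_id] != content_hash(text):
            plan["update"].append(doc_id)     # changed: drop its old fragments and re-embed
    # Documents that disappeared at the source must leave the index too.
    plan["delete"] = [d for d in indexed_hashes if d not in current_docs]
    return {action: sorted(ids) for action, ids in plan.items()}


indexed = {
    "pricing.md": content_hash("Plan A: 10 USD"),
    "faq.md": content_hash("How do I reset my password?"),
    "old_policy.md": content_hash("Superseded rules"),
}
current = {
    "pricing.md": "Plan A: 12 USD",
    "faq.md": "How do I reset my password?",
    "minutes_2026_06.md": "Decisions from the June meeting",
}
print(plan_reindex(current, indexed))
```

Running this on a schedule (or on a file-change event) is what "you update the index, not the model" looks like in practice.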

3. Source traceability

This criterion weighs more than it seems, and it's the one educational guides usually skip. If every answer needs a verifiable citation ("this comes from the purchasing policy, section 4"), RAG provides it natively: the answer arrives with the fragments that back it. For operational decisions, compliance, HR or any context where "the system told me so" isn't enough of a justification, traceability alone can justify RAG even with a small corpus.
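Native citations pay off when they are also checked. A sketch, assuming the model is instructed to cite sources in square brackets such as `[purchasing_policy.md]` (a hypothetical convention) and that you know which sources were retrieved for the query: it flags answers that cite nothing, or that cite a document that was never retrieved, a sign the model went beyond the fragments it was given.

```python
import re

CITATION = re.compile(r"\[([^\[\]]+)\]")  # captures the text inside [source-name]


def check_citations(answer: str, retrieved_sources: set[str]) -> dict:
    """Flag answers that cite nothing or cite a document that was not retrieved."""
    cited = set(CITATION.findall(answer))
    return {
        "has_citation": bool(cited),
        "unknown_sources": sorted(cited - retrieved_sources),
    }


retrieved = {"purchasing_policy.md"}
print(check_citations("Two approvals are required [purchasing_policy.md].", retrieved))
print(check_citations("Managers can skip approval [hr_manual.md].", retrieved))
```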

4. Permissions and multi-user access

If everyone in the company can see every document, any architecture works. If finance can't see HR's files and an external user can't see anything internal, you need to filter what the system retrieves based on who's asking. Direct context doesn't filter by user: whatever is in the prompt is there for everyone. An index with permission metadata does. Companies with 50 or more employees almost always land on this side.
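A minimal sketch of permission metadata in practice, assuming each fragment is stored with the set of groups allowed to see it. The `Chunk` type, the group names and the word-overlap score are hypothetical stand-ins; a real system would rank by embedding similarity. The design choice that matters is the order: filter first, then rank, so a fragment the user can't see is never a candidate for the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str
    allowed_groups: frozenset[str]  # permission metadata stored with the fragment


def overlap_score(query: str, text: str) -> int:
    """Hypothetical stand-in for vector similarity: count shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_for_user(chunks: list[Chunk], query: str, user_groups: set[str], k: int = 3) -> list[Chunk]:
    # Filter first: fragments outside the user's groups are never candidates.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    # Then rank only what remains.
    visible.sort(key=lambda c: overlap_score(query, c.text), reverse=True)
    return visible[:k]


chunks = [
    Chunk("salary bands for 2026", "hr/salary_bands.xlsx", frozenset({"hr"})),
    Chunk("travel expense limits per trip", "finance/travel.pdf", frozenset({"hr", "finance", "staff"})),
]
print([c.source for c in retrieve_for_user(chunks, "salary bands", {"finance"})])
print([c.source for c in retrieve_for_user(chunks, "salary bands", {"hr"})])
```

A finance user asking about salary bands never sees the HR file, however relevant it is; an HR user gets it first.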

5. Cost per query

The economics are simple arithmetic: AI model APIs charge per token processed. Sending a 500,000-token corpus with every query means paying for roughly 100 to 170 times more input tokens than retrieving only the 3,000-5,000 relevant tokens and sending those. At 5 queries a day the difference is anecdotal; at hundreds of daily queries from the whole team, cost per query dominates the decision. RAG exists, in part, because nobody wants to pay for the full corpus on every question.
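The same arithmetic as a sketch. The price per million input tokens is a hypothetical placeholder (check your provider's current price list), and the calculation ignores output tokens and provider discounts such as prompt caching, which narrow the gap when the same context is sent repeatedly.

```python
def monthly_input_cost_usd(tokens_per_query: int, queries_per_day: int,
                           usd_per_million_tokens: float, days: int = 30) -> float:
    """Input-token cost for a month of queries (output tokens not included)."""
    return tokens_per_query * queries_per_day * days * usd_per_million_tokens / 1_000_000


PRICE = 3.0  # hypothetical USD per million input tokens

full_corpus = monthly_input_cost_usd(500_000, 300, PRICE)  # whole corpus on every query
rag = monthly_input_cost_usd(4_000, 300, PRICE)            # ~4,000 retrieved tokens per query
print(f"full corpus: {full_corpus:,.0f} USD/month, RAG: {rag:,.0f} USD/month, ratio: {full_corpus / rag:.0f}x")
# full corpus: 13,500 USD/month, RAG: 108 USD/month, ratio: 125x
```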

Decision rule: if your case checks two or more criteria on the RAG side (large or growing corpus, frequent updates, mandatory citations, role-based permissions, high query volume), RAG is justified. If it checks none, there's almost certainly a simpler, cheaper alternative.
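The rule fits in a few lines as a checklist. A sketch with field names of our own choosing: the rule above doesn't cover the case where exactly one criterion is checked, so the function returns "borderline" there (traceability alone can still tip it, as noted under criterion 3).

```python
from dataclasses import dataclass, fields


@dataclass
class Case:
    large_or_growing_corpus: bool
    frequent_updates: bool
    mandatory_citations: bool
    role_based_permissions: bool
    high_query_volume: bool


def recommend(case: Case) -> str:
    hits = sum(bool(getattr(case, f.name)) for f in fields(case))
    if hits >= 2:
        return "RAG is justified"
    if hits == 0:
        return "look for a simpler alternative"
    return "borderline: weigh the single criterion"  # not covered by the two-or-more rule


print(recommend(Case(True, False, True, False, False)))  # RAG is justified
```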

The alternatives almost nobody tells you about

Articles written by RAG vendors rarely mention that RAG competes against simpler options. These are the three we always evaluate before proposing a build.

Direct context: the zero option

For a small, stable corpus (dozens of documents that barely change), the honest solution is to build nothing: load the documents into the model's context (a Claude project, for example) and you're done. Zero infrastructure, zero maintenance, immediate results. The academic evidence backs this option for bounded corpora: the Long Context vs RAG evaluation on arXiv shows that with enough context, giving the model the documents directly competes with retrieval and even beats it on questions that require reading the whole document. The limit: no structured traceability, no per-user permissions, and cost and degradation growing along with the corpus.

Traditional search + an agent with tools

There's a contrarian position in this debate worth taking seriously: webvise argues that most enterprise knowledge bases don't need RAG. Their argument: for a plain-text corpus of a few hundred documents (they put the threshold near 1,000), a hand-maintained index plus simple search commands is cheaper to operate and more accurate than a typical RAG pipeline. They do concede that RAG remains the right choice for multimodal corpora, high-frequency updates and strict metadata filtering.

Our position, after building both scenarios: they're half right. The "full-text search + an agent that knows how to use search tools" pattern works very well for curated internal technical documentation. But document count is only one of the five criteria. In a real mid-sized company, traceability and permissions usually weigh more than corpus size, and there the manual index falls short well before 1,000 documents. Besides, "hand-maintained" presupposes someone maintains it, and that assumption is exactly the one that fails in most companies we know.
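For a sense of how light the "search + agent" pattern is, here is a sketch of the kind of search tool an agent could call: a keyword inverted index over plain-text files that returns the files containing every query term. The names are hypothetical, and how the function is exposed to the model as a tool depends on your provider's tool-use API, which is not shown. Note what it returns: file names, not cited fragments, which is why traceability stays partial with this approach.

```python
import re
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every lowercase word to the set of files that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the files containing every query term (AND semantics), sorted by name."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


docs = {
    "deploy.md": "Deploy with the staging pipeline first",
    "oncall.md": "Escalate staging incidents to the on-call lead",
}
index = build_inverted_index(docs)
print(search(index, "staging pipeline"))  # ['deploy.md']
```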

Fine-tuning: why we almost always rule it out

Fine-tuning (retraining the model on your data) shows up in every comparison and in practice is almost never the answer for company knowledge, for three reasons:

• The knowledge gets frozen into the model's weights. The pricing policy changed: you have to retrain. With RAG, you update a document in the index.
• There's no source citation. The model "knows" the answer but can't tell you which document it came from, so you lose traceability entirely.
• It costs more and requires more expertise than any of the other options, for a worse result in this use case.

Fine-tuning is for something else: teaching a model a style, an output format or a specific behavior. Not for facts that change.

Decision table

Criterion	Direct context	Search + agent	RAG
Volume it handles	Dozens of docs	Hundreds of docs	Thousands of docs or more
Frequent updates	Manual, doesn't scale	Requires hand maintenance	Automatic ingestion
Source traceability	Weak	Partial (returns the file)	Native (cites the fragment)
Per-user permissions	No	Limited	Yes, via metadata
Cost per query	High as the corpus grows	Low	Low (only what's relevant)
Build complexity	None	Low	Medium

The common mistakes when implementing RAG

When RAG is justified, implementation is where the project is won or lost. These are the mistakes we see most, in order of damage:

Indexing outdated or duplicated documentation

The number one mistake isn't technical: it's content. If the index holds three versions of the same policy, the system retrieves garbage and answers garbage with total confidence. Cleaning the corpus comes before any architecture decision.
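Part of that cleanup can be automated. A minimal sketch, assuming each document carries a title and a last-updated date (hypothetical fields and data): it keeps only the newest version of each title, matching titles case-insensitively. It catches the obvious "three versions of the same policy" case; copies saved under different names still need a human review.

```python
from datetime import date


def latest_versions(docs: list[dict]) -> list[dict]:
    """Keep only the newest version of each document, matching titles case-insensitively."""
    latest: dict[str, dict] = {}
    for doc in docs:
        key = doc["title"].strip().lower()
        if key not in latest or doc["updated"] > latest[key]["updated"]:
            latest[key] = doc
    return sorted(latest.values(), key=lambda d: d["title"].strip().lower())


docs = [
    {"title": "Purchasing Policy", "updated": date(2024, 1, 10), "path": "policies/purchasing_v1.pdf"},
    {"title": "purchasing policy ", "updated": date(2025, 3, 2), "path": "shared/purchasing_final.pdf"},
    {"title": "Purchasing Policy", "updated": date(2024, 9, 5), "path": "email/purchasing_v2.pdf"},
    {"title": "Travel Policy", "updated": date(2023, 5, 1), "path": "policies/travel.pdf"},
]
print([d["path"] for d in latest_versions(docs)])
```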

Blind chunking

Splitting documents by a fixed character count, ignoring titles, sections and tables, produces fragments that don't stand on their own. Retrieval returns context-free pieces and answer quality collapses.
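One alternative to blind chunking, sketched under the assumption that documents are in Markdown with `#` headings: split at headings and prefix every chunk with the document title and its section, so each fragment carries its own context. Nested headings are flattened to their own title, and tables or very long sections would need extra handling this sketch leaves out; the function names are hypothetical.

```python
import re
from typing import Optional

HEADING = re.compile(r"^#{1,6}\s+(.+)$")  # Markdown headings: "# Title" through "###### Title"


def _emit(chunks: list[str], doc_title: str, heading: Optional[str], lines: list[str]) -> None:
    """Append one chunk, prefixed with its location so it stands on its own."""
    body = "\n".join(lines).strip()
    if body:
        prefix = f"{doc_title} > {heading}" if heading else doc_title
        chunks.append(f"{prefix}\n{body}")


def chunk_by_headings(text: str, doc_title: str) -> list[str]:
    """Split at headings instead of at a fixed character count."""
    chunks: list[str] = []
    heading: Optional[str] = None
    lines: list[str] = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            _emit(chunks, doc_title, heading, lines)  # close the previous section
            heading, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    _emit(chunks, doc_title, heading, lines)  # close the last section
    return chunks


policy = """Applies to all purchases.
# Approvals
Purchases above 5,000 USD require two approvals.
## Exceptions
Emergencies require one approval, reviewed within 48 hours.
"""
for chunk in chunk_by_headings(policy, "Purchasing Policy"):
    print(chunk, end="\n---\n")
```

A fragment like "Purchasing Policy > Exceptions" still makes sense when retrieved alone, which is the property fixed-size splitting loses.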

Not measuring retrieval quality

Almost everyone evaluates the final answer, and almost nobody looks at which fragments the system retrieved. If retrieval brings back the wrong documents, even the best model in the world answers wrong. Retrieval quality has to be measured on its own.
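Measuring retrieval separately needs only a small test set of real questions, each labeled with the documents that should answer it. A sketch of two common metrics over such a set (the questions and file names below are hypothetical): hit rate at k, meaning whether at least one correct document made the top k, and recall at k, meaning what share of the correct documents made it.

```python
def hit_rate_at_k(results: dict[str, list[str]], expected: dict[str, set[str]], k: int = 5) -> float:
    """Share of test questions where at least one correct document is in the top k."""
    if not expected:
        return 0.0
    hits = sum(1 for q, gold in expected.items() if gold & set(results.get(q, [])[:k]))
    return hits / len(expected)


def recall_at_k(results: dict[str, list[str]], expected: dict[str, set[str]], k: int = 5) -> float:
    """Average share of each question's correct documents that appear in the top k."""
    scores = [len(gold & set(results.get(q, [])[:k])) / len(gold) for q, gold in expected.items() if gold]
    return sum(scores) / len(scores) if scores else 0.0


expected = {  # hypothetical labeled test set: question -> documents that answer it
    "approval threshold for purchases": {"purchasing.pdf"},
    "travel and per diem rules": {"travel.pdf", "per_diem.xlsx"},
    "vacation carry-over": {"hr_manual.pdf"},
}
results = {  # what the retriever returned, best first
    "approval threshold for purchases": ["faq.md", "purchasing.pdf"],
    "travel and per diem rules": ["travel.pdf", "expenses.md", "per_diem.xlsx"],
    "vacation carry-over": ["onboarding.md"],
}
print(hit_rate_at_k(results, expected, k=2), recall_at_k(results, expected, k=2))
```

Here the vacation question fails at retrieval, so no model, however good, can answer it from the right document.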

Choosing the vector database before the problem

Starting the project by debating which vector database to use is starting at the end. First the corpus, the permissions and the real queries; the infrastructure follows from that.

Nobody owns the knowledge base

Without an owner and a defined update process, the system ages: six months later it answers with six-month-old information.
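Part of that update process can be a simple staleness check that the owner runs or receives on a schedule. A sketch, assuming you record a last-review date per document (hypothetical data and function name): it lists documents not reviewed within a given window, here 180 days to match the six-month example.

```python
from datetime import date, timedelta


def stale_documents(last_reviewed: dict[str, date], today: date, max_age_days: int = 180) -> list[str]:
    """Documents whose last review is older than max_age_days, sorted by name."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(doc for doc, reviewed in last_reviewed.items() if reviewed < cutoff)


reviews = {  # hypothetical review log
    "pricing.md": date(2026, 6, 1),
    "hr_policy.md": date(2025, 11, 1),
    "legacy_process.md": date(2024, 1, 15),
}
print(stale_documents(reviews, today=date(2026, 6, 25)))  # ['hr_policy.md', 'legacy_process.md']
```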

Treating it as a one-time project

A RAG system gets tuned with real queries after deploy: which questions it fails, which documents are missing, which fragments retrieve poorly. Without that tuning phase, it's left halfway.

If you want the full step-by-step build process (sources, structure, deploy, permissions), we documented it in how to build an AI knowledge base.

What a well-built RAG looks like in production

A real case is where theory gets tested. In Ecuador we built an AI knowledge base on AWS for Acatha, a company with a classic problem: documentation scattered across files, emails and people, and internal questions that depended on someone knowing where the information was.

What ended up running, and why each piece matters for this article's framework:

An agent that answers in natural language citing the internal source, instead of making things up. Traceability (criterion 3) wasn't an extra: it was the requirement that made the system useful.
Everything deployed on AWS, with the company's information in its own cloud. The data didn't leave for a third-party SaaS; the system runs on the client's infrastructure.
Claude plus embeddings and retrieval as the stack: the textbook RAG pipeline, no exotic pieces.
Phased implementation: mapping processes and documentation, automating the repetitive work, building the knowledge base, and tuning with real queries before handover. That last phase is the antidote to treating RAG as a one-time project.
Operated by the client's team. We documented the system so they can maintain and extend it without depending on us, which is the answer to the "nobody owns the knowledge base" mistake.

The operational result: internal questions answered 24/7 by the agent and a single source of truth for policies and processes, instead of chasing the person who knew where the file was.

From that project comes our "well-built RAG" checklist: answers with source citations, data on the client's infrastructure, a defined update process, and the client's team operating it. If a RAG proposal doesn't include those four things, half the project is missing.

The decision before the architecture

The full framework in three lines: RAG is justified when you check two or more of the five criteria (large or growing corpus, frequent updates, mandatory traceability, role-based permissions, high query volume). If you check none, direct context or search plus an agent solve the same thing for less money. And fine-tuning, for company knowledge, is almost never the answer.

Evaluating this decision for your company? The conversation is short: you tell us what documentation you have, who consults it and how often it changes, and we'll tell you honestly which architecture fits, even if the answer is "you don't need RAG".

Sources

• AWS: What is Retrieval-Augmented Generation (RAG)
• Anthropic: Context windows (window sizes and context rot)
• webvise: Most enterprise knowledge bases don't need RAG
• arXiv 2501.01880: Long Context vs. RAG for LLMs: An Evaluation and Revisits

Frequently Asked Questions

What is RAG for companies?

RAG (retrieval-augmented generation) is an architecture that connects an AI model to your company's documents: for every question, the system retrieves the relevant fragments from your real information and the model answers based on them, citing the source. It's the technical foundation of AI knowledge bases.

When should you NOT use RAG?

When the corpus is small (dozens of documents), stable, has no per-user permissions and doesn't require verifiable citations. In that scenario, loading the documents directly into the model's context or using traditional search with an agent solves the same problem with zero infrastructure and lower cost.

Does RAG eliminate hallucinations?

It doesn't eliminate them: it reduces them and makes them auditable. By forcing the model to answer over fragments retrieved from real documents and cite the source, every answer can be verified against the original. But if the index contains outdated information or retrieval fails, the system answers wrong with full confidence.

What's the difference between RAG and fine-tuning?

RAG looks up the information in your documents at query time; fine-tuning retrains the model to bake it into its weights. For knowledge that changes, RAG wins: it updates by editing the index and keeps source citations. Fine-tuning is for style and format, not for facts.

How much does it cost to implement RAG in a company?

It depends on scope: number and type of sources, corpus volume, role-based permissions and the infrastructure it runs on. That's why we quote by scope, not with a flat rate. What holds in general: RAG costs more to build than a direct-context setup, and far less per query at real usage volume.

How long does a RAG implementation take?

It's implemented in phases, not all at once: mapping sources and documentation, building the indexing and retrieval pipeline, and a tuning phase with the team's real queries before handover. Total timeline depends on the state of the corpus: cleaning the documentation usually takes longer than the pipeline.
