RAG for companies: when it's worth it and when it's not
RAG for companies (Retrieval-Augmented Generation) is an architecture that connects an AI model to your company's real documents: before answering, the system searches your information for the relevant fragments and the model generates the answer based on them, citing the source instead of making things up. That's as far as almost every guide goes. Few of them help you decide whether YOUR case needs it. At Duotach we've built RAG in production and also solved cases where it would have been overkill: this article is the decision framework we use ourselves.
What RAG is, in short (the only part of the pipeline you need to know)
RAG is the technique behind most AI knowledge bases for companies. Its pipeline has four steps:
Ingestion
Your documents (policies, manuals, contracts, processes) are split into fragments.
Indexing
Each fragment is converted into an embedding (a numerical representation of its meaning) and stored in a queryable index.
Retrieval
When someone asks a question, the system searches for the fragments most relevant to that query, not the whole corpus.
Generation
The AI model receives the question plus the retrieved fragments and writes the answer grounded in them, with a reference to the source.
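The four steps can be sketched in a few lines of Python. This is a toy illustration, not a production design: `embed` is a hypothetical stand-in that counts words instead of calling a real embedding model, the documents are invented, and the final call to an AI model is left out, so the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter

def chunk(doc_id, text):
    """Ingestion: split a document into paragraph fragments, keeping the source."""
    parts = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [{"source": doc_id, "text": p} for p in parts]

def embed(text):
    """Indexing: toy stand-in for an embedding model (word counts, not meaning)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(index, question, k=2):
    """Retrieval: return the k fragments most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item["vector"]), reverse=True)
    return ranked[:k]

def build_prompt(question, fragments):
    """Generation input: the question plus the retrieved fragments, labeled by source."""
    context = "\n".join(f"[{f['source']}] {f['text']}" for f in fragments)
    return (
        "Answer using only these fragments and cite the source.\n"
        f"{context}\nQuestion: {question}"
    )

# Invented example documents (hypothetical content).
docs = {
    "purchasing-policy": "Purchases above 500 USD need manager approval.\n\n"
                         "Invoices are paid within 30 days.",
    "vacation-policy": "Employees get 15 vacation days per year.",
}

index = []
for doc_id, text in docs.items():
    for fragment in chunk(doc_id, text):
        index.append({**fragment, "vector": embed(fragment["text"])})

question = "How many vacation days do employees get?"
top = retrieve(index, question, k=1)
print(build_prompt(question, top))
```

Each fragment carries its `source` from ingestion onward, which is what lets the final answer point back to a document.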
If you want the long explanation of the mechanism, the AWS guide on RAG is the canonical reference. But understanding the pipeline is not the same as making the decision. That depends on the five criteria that follow.
The 5 criteria to decide whether your company needs RAG
These are the criteria we review before proposing RAG to a client. No single one is enough on its own; the decision comes from looking at all five together.
1. Document volume
Today's models accept a lot of direct context: Anthropic's models take up to 1 million tokens per request, per their official documentation. A full procedures manual, or several, fit in a single conversation. But that same documentation flags the limit: as context grows, the model's accuracy and recall degrade (a phenomenon known as context rot). Practical rule: if your corpus is dozens of documents, direct context competes; if it's thousands of documents or grows without a ceiling, selective retrieval stops being optional.
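A rough way to place a corpus on either side of that rule is to estimate its size in tokens. The sketch below is a back-of-the-envelope check under loud assumptions: the four-characters-per-token ratio is only a rule of thumb for English text, and `budget_fraction` is a hypothetical safety margin that leaves room for the degradation described above. Only the 1 million token figure comes from the text.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # figure quoted in the paragraph above
CHARS_PER_TOKEN = 4                # assumption: rough rule of thumb for English

def estimate_tokens(texts):
    """Approximate token count of a list of document texts."""
    return sum(len(t) for t in texts) // CHARS_PER_TOKEN

def fits_in_context(texts, budget_fraction=0.5):
    """True if the corpus stays under a chosen fraction of the context window.

    budget_fraction is a hypothetical margin, not an official threshold.
    """
    return estimate_tokens(texts) <= CONTEXT_WINDOW_TOKENS * budget_fraction

# 30 documents of about 20,000 characters each: direct context is plausible.
small_corpus = ["x" * 20_000] * 30
# 5,000 documents of the same size: far beyond any single request.
large_corpus = ["x" * 20_000] * 5_000

print(fits_in_context(small_corpus), fits_in_context(large_corpus))
```

A real tokenizer would give exact counts; the point here is only the order of magnitude.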
2. Update frequency
A stable corpus (the onboarding manual that changes twice a year) can be loaded by hand whenever it changes. A living corpus (prices, stock, policies corrected every week, new meeting minutes) needs an ingestion pipeline that indexes new content automatically. RAG is designed exactly for that: you update the index, not the model. If your information changes faster than someone can copy and paste it, that's a point for RAG.
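"You update the index, not the model" can be illustrated with a change-detection step. This is a minimal sketch assuming the ingestion job keeps a content hash per document; `plan_sync` and the document names are hypothetical, and a real pipeline would also re-chunk and re-embed whatever it flags.

```python
import hashlib

def fingerprint(text):
    """Stable content hash used to detect changed documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_sync(current_docs, indexed_hashes):
    """Compare the live corpus with what the index has seen.

    Returns (ids to index or re-index, ids to delete from the index).
    """
    to_index = [
        doc_id for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != fingerprint(text)
    ]
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in current_docs]
    return to_index, to_delete

# State of the index after the previous run (hypothetical documents).
indexed_hashes = {
    "prices": fingerprint("price list v1"),
    "onboarding": fingerprint("onboarding manual"),
    "old-minutes": fingerprint("minutes that were later archived"),
}
# The corpus as it looks today.
current_docs = {
    "prices": "price list v2",
    "onboarding": "onboarding manual",
    "new-minutes": "minutes from this week's meeting",
}

to_index, to_delete = plan_sync(current_docs, indexed_hashes)
print(to_index, to_delete)
```

Only the changed and new documents are reprocessed, so the cost of an update follows the size of the change rather than the size of the corpus.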
3. Source traceability
This criterion weighs more than it seems, and it's the one educational guides usually skip. If every answer needs a verifiable citation ("this comes from the purchasing policy, section 4"), RAG provides it natively: the answer arrives with the fragments that back it. For operational decisions, compliance, HR or any context where "the system told me so" isn't enough of a justification, traceability alone can justify RAG even with a small corpus.
4. Permissions and multi-user access
If everyone in the company can see every document, any architecture works. If finance can't see HR's files and an external user can't see anything internal, you need to filter what the system retrieves based on who's asking. Direct context doesn't filter by user: whatever is in the prompt is there for everyone. An index with permission metadata does. Companies of 50 employees and up almost always land on this side.
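Filtering by permission metadata could look like the sketch below. It assumes each fragment stores an `allowed_roles` set (a hypothetical field name) and uses a naive keyword score in place of real vector search. The detail that matters is the order: the filter runs before ranking, so restricted fragments never reach the prompt.

```python
def keyword_score(query):
    """Build a naive scorer: number of query words present in a fragment."""
    terms = set(query.lower().split())
    return lambda fragment: len(terms & set(fragment["text"].lower().split()))

def retrieve_for_user(fragments, user_roles, query, k=3):
    """Filter by role first, then rank only what the user may see."""
    visible = [f for f in fragments if f["allowed_roles"] & user_roles]
    score = keyword_score(query)
    ranked = sorted(visible, key=score, reverse=True)
    return [f for f in ranked[:k] if score(f) > 0]

# Hypothetical fragments with permission metadata.
fragments = [
    {"source": "hr/salary-bands", "text": "salary bands by seniority level",
     "allowed_roles": {"hr"}},
    {"source": "finance/budget", "text": "annual budget and salary cost forecast",
     "allowed_roles": {"finance"}},
    {"source": "handbook/vacation", "text": "vacation policy for all employees",
     "allowed_roles": {"hr", "finance", "staff"}},
]

for roles in ({"hr"}, {"finance"}, {"external"}):
    found = retrieve_for_user(fragments, roles, "salary bands")
    print(sorted(roles), [f["source"] for f in found])
```

The same question returns different fragments depending on who asks, and an external user gets nothing.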
5. Cost per query
The economics are simple arithmetic: AI model APIs charge per token processed. Sending a 500,000-token corpus with every query costs roughly 100 to 170 times more in input tokens than retrieving the 3,000-5,000 relevant tokens and sending only those. At 5 queries a day the difference is anecdotal; at hundreds of daily queries from the whole team, cost per query dominates the decision. RAG exists, in part, because nobody wants to pay for the full corpus on every question.
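The arithmetic is easy to check. In the sketch below the token counts come from the paragraph above, while the price is a hypothetical figure chosen only to make the numbers concrete; it is not a quote from any provider, and output tokens are ignored.

```python
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # hypothetical USD price, an assumption

FULL_CORPUS = 500_000                       # tokens sent when the whole corpus goes in
RETRIEVED_LOW, RETRIEVED_HIGH = 3_000, 5_000  # tokens sent with retrieval

def input_cost(tokens, queries=1):
    """Input-token cost in USD for a number of queries."""
    return tokens * queries * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000

for queries_per_day in (5, 300):
    full = input_cost(FULL_CORPUS, queries_per_day)
    rag = input_cost(RETRIEVED_HIGH, queries_per_day)
    print(f"{queries_per_day} queries/day: full corpus {full:.3f} USD, retrieval {rag:.3f} USD")

# The ratio does not depend on the price, only on the token counts.
print(f"ratio: {FULL_CORPUS / RETRIEVED_HIGH:.0f}x to {FULL_CORPUS / RETRIEVED_LOW:.0f}x")
```

Under that assumed price, 300 daily queries come to about 450 USD per day with the full corpus and about 4.50 USD with retrieval. The price changes the absolute figures but not the ratio.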
Decision rule: if your case checks two or more criteria on the RAG side (large or growing corpus, frequent updates, mandatory citations, role-based permissions, high query volume), RAG is justified. If it checks none, there's almost certainly a simpler, cheaper alternative.
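Written as code, the rule is a count over five yes/no answers. The field names below are hypothetical labels for the criteria. The rule as stated does not cover the case of exactly one criterion, so the sketch returns a "borderline" verdict for it; that handling is an assumption of this example.

```python
from dataclasses import dataclass

@dataclass
class Case:
    large_or_growing_corpus: bool
    frequent_updates: bool
    mandatory_citations: bool
    role_based_permissions: bool
    high_query_volume: bool

def recommend(case):
    """Apply the decision rule: two or more criteria justify RAG, zero do not."""
    checked = sum(vars(case).values())
    if checked >= 2:
        return "RAG is justified"
    if checked == 0:
        return "look for a simpler alternative"
    # Assumption: a single criterion is a judgment call on how heavy it is.
    return "borderline: weigh that one criterion"

small_team = Case(False, False, False, False, False)
mid_sized = Case(False, True, True, True, False)
print(recommend(small_team))
print(recommend(mid_sized))
```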
The alternatives almost nobody tells you about
Articles written by RAG vendors rarely mention that RAG competes against simpler options. These are the three we always evaluate before proposing a build.
Direct context: the zero option
For a small, stable corpus (dozens of documents that barely change), the honest solution is to build nothing: load the documents into the model's context (a Claude project, for example) and you're done. Zero infrastructure, zero maintenance, immediate results. The academic evidence backs this option for bounded corpora: the Long Context vs RAG evaluation on arXiv shows that with enough context, giving the model the documents directly competes with retrieval and even beats it on questions that require reading the whole document. The limit: no structured traceability, no per-user permissions, and cost and degradation growing along with the corpus.
Traditional search + an agent with tools
There's a contrarian position in this debate worth taking seriously: webvise argues that most enterprise knowledge bases don't need RAG. Their argument: for a plain-text corpus of a few hundred documents (they put the threshold near 1,000), a hand-maintained index plus simple search commands is cheaper to operate and more accurate than a typical RAG pipeline. They do concede that RAG remains the right choice for multimodal corpora, high-frequency updates and strict metadata filtering.
Our position, after building both scenarios: they're half right. The "full-text search + an agent that knows how to use search tools" pattern works very well for curated internal technical documentation. But document count is only one of the five criteria. In a real mid-sized company, traceability and permissions usually weigh more than corpus size, and there the manual index falls short well before 1,000 documents. Besides, "hand-maintained" presupposes someone maintains it, and that assumption is exactly the one that fails in most companies we know.
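To make the pattern concrete, here is a minimal sketch of the kind of search tool an agent could call. It is an illustration written for this article, not webvise's implementation: `search_docs` and the file names are hypothetical, and matching is plain substring search with no embeddings or index.

```python
def search_docs(docs, query, max_hits=5):
    """Return lines that contain every query term, with file and line number."""
    terms = query.lower().split()
    hits = []
    for name, text in docs.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            if all(term in lowered for term in terms):
                hits.append({"file": name, "line": line_no, "text": line.strip()})
    return hits[:max_hits]

# Hypothetical curated technical documentation.
docs = {
    "vpn-setup.md": "Install the client.\nRestart the VPN service after changing the config.",
    "onboarding.md": "Request VPN access from IT.",
}

for hit in search_docs(docs, "vpn restart"):
    print(f"{hit['file']}:{hit['line']}: {hit['text']}")
```

The tool returns a file and a line, which is the "partial" traceability of this option, and it has no notion of who is asking.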
Fine-tuning: why we almost always rule it out
Fine-tuning (retraining the model on your data) shows up in every comparison and in practice is almost never the answer for company knowledge, for three reasons:
- The knowledge gets frozen into the model's weights. The pricing policy changed: you have to retrain. With RAG, you update a document in the index.
- There's no source citation. The model "knows" the answer but can't tell you which document it came from, so you lose traceability entirely.
- It costs more and requires more expertise than any of the other options, for a worse result in this use case.
Fine-tuning is for something else: teaching a model a style, an output format or a specific behavior. Not for facts that change.
Decision table
| Criterion | Direct context | Search + agent | RAG |
|---|---|---|---|
| Volume it handles | Dozens of docs | Hundreds of docs | Thousands of docs or more |
| Frequent updates | Manual, doesn't scale | Requires hand maintenance | Automatic ingestion |
| Source traceability | Weak | Partial (returns the file) | Native (cites the fragment) |
| Per-user permissions | No | Limited | Yes, via metadata |
| Cost per query | High as the corpus grows | Low | Low (only what's relevant) |
| Build complexity | None | Low | Medium |
The common mistakes when implementing RAG
When RAG is justified, implementation is where the project is won or lost. These are the mistakes we see most, in order of damage:
Indexing outdated or duplicated documentation
The number one mistake isn't technical: it's content. If the index holds three versions of the same policy, the system retrieves garbage and answers garbage with total confidence. Cleaning the corpus comes before any architecture decision.
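Part of that cleanup can be automated once documents carry a version. The sketch below assumes each document has `title` and `version` fields (hypothetical names) and keeps only the newest version per title before anything is indexed. Deciding which documents are outdated in substance still needs a person.

```python
def latest_versions(documents):
    """Keep one document per normalized title: the one with the highest version."""
    latest = {}
    for doc in documents:
        key = doc["title"].strip().lower()
        if key not in latest or doc["version"] > latest[key]["version"]:
            latest[key] = doc
    return list(latest.values())

# Hypothetical corpus with three copies of the same policy.
documents = [
    {"title": "Purchasing Policy", "version": 1},
    {"title": "purchasing policy ", "version": 3},
    {"title": "Purchasing Policy", "version": 2},
    {"title": "Travel Policy", "version": 1},
]

clean = latest_versions(documents)
print([(d["title"].strip(), d["version"]) for d in clean])
```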
Blind chunking
Splitting documents by a fixed character count, ignoring titles, sections and tables, produces fragments that don't stand on their own. Retrieval returns context-free pieces and answer quality collapses.
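The difference shows up even on a tiny document. The sketch below compares a fixed-size split with a split at headings; it assumes headings are lines starting with `#`, which is a simplification, and real documents would also need handling for tables and very long sections.

```python
def fixed_chunks(text, size):
    """Blind chunking: cut every `size` characters, wherever that lands."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def section_chunks(text):
    """Structure-aware chunking: split at heading lines, keeping heading and body together."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]

# Hypothetical two-section document.
text = (
    "# Returns\n"
    "Items can be returned within 30 days.\n"
    "# Shipping\n"
    "Orders ship in 2 business days."
)

print(fixed_chunks(text, 40))
print(section_chunks(text))
```

The fixed-size version cuts the returns rule in the middle of the sentence, while the section version yields two fragments that each carry their own heading.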
Not measuring retrieval quality
Almost everyone evaluates the final answer and nobody looks at which fragments the system retrieved. If retrieval brings back the wrong documents, even the best model in the world answers wrong. Retrieval quality has to be measured separately from answer quality.
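One simple way to measure retrieval on its own is a hit rate over a small set of questions with a known correct source. The sketch below assumes such an evaluation set exists and that the retriever returns a ranked list of source names. The questions, sources and canned results are invented for the example.

```python
def hit_rate_at_k(eval_set, retrieve_sources, k=3):
    """Share of questions whose expected source appears in the top k results."""
    hits = sum(
        1 for case in eval_set
        if case["expected_source"] in retrieve_sources(case["question"])[:k]
    )
    return hits / len(eval_set)

# Hypothetical evaluation set: real questions paired with the document that answers them.
eval_set = [
    {"question": "approval limit for purchases?", "expected_source": "purchasing-policy"},
    {"question": "how many vacation days?", "expected_source": "vacation-policy"},
    {"question": "who approves travel?", "expected_source": "travel-policy"},
    {"question": "laptop replacement cycle?", "expected_source": "it-policy"},
]

# Canned results standing in for a real retriever.
canned = {
    "approval limit for purchases?": ["purchasing-policy", "travel-policy"],
    "how many vacation days?": ["onboarding", "vacation-policy"],
    "who approves travel?": ["travel-policy"],
    "laptop replacement cycle?": ["onboarding", "purchasing-policy", "travel-policy"],
}

def fake_retriever(question):
    return canned.get(question, [])

print(hit_rate_at_k(eval_set, fake_retriever, k=1))
print(hit_rate_at_k(eval_set, fake_retriever, k=3))
```

The failing questions point at what to fix: a missing document, a bad chunk, or a query the index does not understand.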
Choosing the vector database before the problem
Starting the project by debating which vector database to use is starting at the end. First the corpus, the permissions and the real queries; the infrastructure follows from that.
Nobody owns the knowledge base
Without an owner and a defined update process, the system ages: six months later it answers with six-month-old information.
Treating it as a one-time project
A RAG system gets tuned with real queries after deploy: which questions it fails, which documents are missing, which fragments retrieve poorly. Without that tuning phase, it's left halfway.
If you want the full step-by-step build process (sources, structure, deploy, permissions), we documented it in how to build an AI knowledge base.
What a well-built RAG looks like in production
Theory gets verified with a real case. In Ecuador we built an AI knowledge base on AWS for Acatha, a company with a classic problem: documentation scattered across files, emails and people, and internal questions that depended on someone knowing where the information was.
What ended up running, and why each piece matters for this article's framework:
- An agent that answers in natural language citing the internal source, instead of making things up. Traceability (criterion 3) wasn't an extra: it was the requirement that made the system useful.
- Everything deployed on AWS, with the company's information in its own cloud. The data didn't leave for a third-party SaaS; the system runs on the client's infrastructure.
- Claude plus embeddings and retrieval as the stack: the textbook RAG pipeline, no exotic pieces.
- Phased implementation: mapping processes and documentation, automating the repetitive work, building the knowledge base, and tuning with real queries before handover. That last phase is the antidote to mistake 6 (treating RAG as a one-time project).
- Operated by the client's team. We documented the system so they can maintain and extend it without depending on us, which is the answer to mistake 5 (nobody owns the knowledge base).
The operational result: internal questions answered 24/7 by the agent and a single source of truth for policies and processes, instead of chasing the person who knew where the file was.
From that project comes our "well-built RAG" checklist: answers with source citations, data on the client's infrastructure, a defined update process, and the client's team operating it. If a RAG proposal doesn't include those four things, half the project is missing.
The decision before the architecture
The full framework in three lines: RAG is justified when you check two or more of the five criteria (large or growing corpus, frequent updates, mandatory traceability, role-based permissions, high query volume). If you check none, direct context or search plus an agent will solve the same problem for less money. And fine-tuning, for company knowledge, is almost never the answer.
Evaluating this decision for your company? The conversation is short: you tell us what documentation you have, who consults it and how often it changes, and we'll tell you honestly which architecture fits, even if the answer is "you don't need RAG".
Frequently Asked Questions
What is RAG for companies?
When should you NOT use RAG?
Does RAG eliminate hallucinations?
What's the difference between RAG and fine-tuning?
How much does it cost to implement RAG in a company?
How long does a RAG implementation take?
