
AI Letters #12 - How Much Code Does RAG Actually Take? We Measured.

· 9 min read
EngineersOfAI
AI Engineering Education

Lines of code is a proxy for cognitive load - how much API surface you need to hold in your head before you can get something working.


You've read the benchmarks. You've seen the "getting started" examples. Every framework looks reasonable in a five-line README snippet. You don't find out what you actually signed up for until you're building something real - wiring together five objects to do one thing, reading three docs pages to understand why the chain API works the way it does, debugging a provider mismatch at 11pm the night before a demo.

The real question when evaluating an LLM framework isn't capability - they all support RAG, tool calling, and retrieval. The real question is: how much of this framework do I need to understand before my first pipeline runs?

I built the same RAG pipeline in SynapseKit, LangChain, and LlamaIndex. Same task, same document, same query. Then counted the lines.

Disclosure: I'm the author of SynapseKit. All benchmarks are reproducible - the Kaggle notebook is public: LLM Showdown #3 - Hello RAG: LoC.


The Task

Standard RAG pipeline. Nothing exotic - the kind of thing you'd build on day one:

  1. Load a text document
  2. Chunk it
  3. Embed it into a vector store
  4. Retrieve the top-5 relevant chunks for a query
  5. Generate an answer with an LLM

Same document. Same query. Minimal idiomatic implementation in each framework - no shortcuts, no padding.
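Before the framework versions, here's what those five steps look like with no framework at all - a minimal plain-Python sketch, where a toy bag-of-words similarity stands in for real embeddings and the generation call is stubbed as a prompt string:

```python
import math
from collections import Counter

def chunk(text, size=500, overlap=50):
    """Step 2: split the document into overlapping fixed-size chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Step 3 stand-in: a bag-of-words count vector instead of a real embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(store, query, k=5):
    """Step 4: rank stored chunks against the query, keep the top-k."""
    q = embed(query)
    return sorted(store, key=lambda c: cosine(embed(c), q), reverse=True)[:k]

DOCUMENT = "RAG retrieves relevant context before generation. " * 40  # step 1
store = chunk(DOCUMENT)                               # steps 2-3: chunk + "embed"
context = retrieve(store, "What does RAG retrieve?")  # step 4: top-5 chunks
# step 5 would call an LLM; here we just assemble the prompt it would receive
prompt = "Answer using this context:\n" + "\n".join(context)
```

Every framework in this comparison is ultimately sugar over these five steps; the differences are in how much of that machinery they let you see.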


The Numbers

Framework  | Import lines | Functional lines | Total
-----------|--------------|------------------|------
SynapseKit |      1       |        3         |   4
LlamaIndex |      3       |        6         |   9
LangChain  |      5       |        8         |  13

Before reading what each number means, look at all three pipelines. The difference in cognitive load is immediate.


The Code, Side by Side

SynapseKit - 4 lines total
────────────────────────────────────────────────
from synapsekit import RAG

rag = RAG(model="gpt-4o-mini", api_key=KEY, provider="openai")
rag.add(DOCUMENT)
answer = rag.ask_sync(QUERY)
────────────────────────────────────────────────

LlamaIndex - 9 lines total
────────────────────────────────────────────────
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-4o-mini", api_key=KEY)
Settings.embed_model = OpenAIEmbedding(api_key=KEY)
index = VectorStoreIndex.from_documents([Document(text=DOCUMENT)])
engine = index.as_query_engine(similarity_top_k=5)
answer = engine.query(QUERY)
────────────────────────────────────────────────

LangChain - 13 lines total
────────────────────────────────────────────────
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

docs = [Document(page_content=DOCUMENT)]
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)
vs = FAISS.from_documents(chunks, OpenAIEmbeddings(api_key=KEY))
chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini", api_key=KEY),
    retriever=vs.as_retriever(search_kwargs={"k": 5})
)
answer = chain.invoke({"query": QUERY})["result"]
────────────────────────────────────────────────

What Each Number Actually Means

SynapseKit at 4 lines is a facade. RAG is a single object that internalises the entire pipeline - chunking strategy, embedding model, vector store, retrieval, generation. You configure it with one constructor and call two methods. The cognitive model you need: one class, two methods.

The cost: those internals aren't visible unless you dig into the source. If the defaults don't match your requirements, you either go deeper into the API or you switch frameworks.
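To make the facade shape concrete, here's a hypothetical sketch - these are not SynapseKit's actual internals - showing how every default inside the object is a decision the caller never sees:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToyRAG:
    """A facade: one object owning the whole pipeline.
    chunk_size and top_k are hidden defaults unless exposed."""
    chunk_size: int = 500
    top_k: int = 5
    chunks: List[str] = field(default_factory=list)

    def add(self, document: str) -> None:
        # the chunking strategy is fixed; callers can't swap it without forking
        self.chunks += [document[i:i + self.chunk_size]
                        for i in range(0, len(document), self.chunk_size)]

    def ask(self, query: str) -> str:
        # naive keyword overlap stands in for embedding-based retrieval
        q = set(query.lower().split())
        ranked = sorted(self.chunks,
                        key=lambda c: len(q & set(c.lower().split())),
                        reverse=True)
        return "Context used: " + " | ".join(ranked[:self.top_k])

rag = ToyRAG()
rag.add("Facades trade visibility for brevity. Defaults do the deciding.")
answer = rag.ask("What do facades trade?")
```

The two-method surface is the whole point - and the whole limitation: anything not in the constructor signature is out of reach.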

LlamaIndex at 9 lines makes the pipeline explicit while keeping it clean. Settings is a global config object - you set LLM and embeddings once, and every subsequent LlamaIndex operation inherits them. Then you index your documents and create a query engine. The cognitive model: global config + index + engine. Three concepts.

The pattern is opinionated but learnable. The Settings global is controversial - it makes multi-model workflows awkward - but for single-model RAG it's ergonomic.

LangChain at 13 lines exposes every step as a first-class object. You construct a splitter, an embeddings model, a vector store, and a chain - then wire them together explicitly. The cognitive model: five separate components, each with its own constructor, each with its own configuration surface.

This is maximum control. You can swap any component independently. You can insert custom logic between any two steps. You pay for that flexibility in lines you have to write and concepts you have to understand before anything runs.


The Hidden Test: Switching Providers

LoC at the happy path is one number. The real test is the diff when requirements change. Here's what happens when you swap OpenAI for Groq - a common move in cost-optimised pipelines:

Provider switch: OpenAI → Groq
────────────────────────────────────────────────────────
SynapseKit: 1 line changed
Before: RAG(model="gpt-4o-mini", provider="openai")
After: RAG(model="llama-3.1-8b-instant", provider="groq")

LangChain: 3 lines changed
Before: ChatOpenAI(model="gpt-4o-mini", api_key=KEY)
After: # pip install langchain-groq
from langchain_groq import ChatGroq
ChatGroq(model="llama-3.1-8b-instant", api_key=KEY)

LlamaIndex: 3 lines changed
Before: Settings.llm = OpenAI(model="gpt-4o-mini", api_key=KEY)
After: # pip install llama-index-llms-groq
from llama_index.llms.groq import Groq
Settings.llm = Groq(model="llama-3.1-8b-instant", api_key=KEY)
────────────────────────────────────────────────────────

SynapseKit changes one string. LangChain and LlamaIndex both require a new package install and a new import. This isn't a knock on their design - per-provider packages mean you get the latest provider SDK features. But it also means every provider switch has a non-trivial friction cost: a pip install, a new import, a new class name to remember.
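One way to contain that friction in your own codebase is a thin adapter layer: a provider registry, so call sites switch providers with a string while the per-provider imports live in one file. A hypothetical sketch - the factory bodies are stand-ins for the real client constructors:

```python
from typing import Callable, Dict

def make_openai(model: str, api_key: str):
    # in real code: return ChatOpenAI(model=model, api_key=api_key)
    return {"provider": "openai", "model": model}

def make_groq(model: str, api_key: str):
    # in real code: return ChatGroq(model=model, api_key=api_key)
    return {"provider": "groq", "model": model}

# the only file that knows provider-specific imports and class names
PROVIDERS: Dict[str, Callable] = {"openai": make_openai, "groq": make_groq}

def get_llm(provider: str, model: str, api_key: str):
    try:
        return PROVIDERS[provider](model, api_key)
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}")

# switching is now a config change at the call site, not an import change:
llm = get_llm("groq", "llama-3.1-8b-instant", api_key="...")
```

The pip install still happens once per provider, but the blast radius of a switch shrinks to a single registry entry.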


The Abstraction Pyramid

Abstraction level vs control surface

HIGH ABSTRACTION
┌─────────────────────────────────────┐
│ SynapseKit                          │
│   RAG()                             │
│   1 class, 2 methods                │
│   Defaults hide everything          │
│   Provider = 1 string param         │
└─────────────────────────────────────┘
                ↕ 5 lines
┌─────────────────────────────────────┐
│ LlamaIndex                          │
│   Settings + Index + Engine         │
│   3 concepts, global config         │
│   Provider = new pip + new class    │
└─────────────────────────────────────┘
                ↕ 4 lines
┌─────────────────────────────────────┐
│ LangChain                           │
│   Splitter + Embeddings + Store     │
│   + Chain + Retriever               │
│   5 components, explicit wiring     │
│   Provider = new pip + new class    │
└─────────────────────────────────────┘
LOW ABSTRACTION / HIGH CONTROL

The pyramid isn't a ranking. It's a choice. Where you sit on it should match your use case.


What This Means for Engineers

  1. LoC at hello-world predicts LoC at production. The gap between frameworks doesn't close as pipelines get more complex - it widens. LangChain's explicit-component model compounds: more features means more objects to wire. SynapseKit's facade model compounds differently: more features may hit the defaults ceiling earlier.

  2. The right framework depends on which direction you'll deviate from the defaults. If you need custom chunking, custom retrieval logic, or non-standard pipeline ordering - LangChain's component model is correct. If you're building a standard pipeline and want to move fast - SynapseKit or LlamaIndex are better bets.

  3. The provider-per-package pattern has a real operational cost. Both LangChain and LlamaIndex have moved to per-provider packages (langchain-openai, llama-index-llms-groq). This is the right call for maintainability. The cost: your dependency tree grows with every provider you support.

  4. LlamaIndex's Settings global is a footgun in multi-model pipelines. If you're building a system where different agents use different LLMs, global config creates subtle coupling bugs. LangChain's explicit wiring is safer here. SynapseKit's per-instance config is cleanest.

  5. The real question isn't which framework is "best" - it's which mental model your team can maintain at 2am during an incident. Fewer concepts means fewer places for confusion to hide. This matters more than benchmarks when something breaks in production.
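The coupling in point 4 is easy to reproduce with a toy global - this is the shape of the problem, not LlamaIndex's actual Settings object:

```python
class Settings:
    """Toy global config, read at call time by everything built against it."""
    llm = "gpt-4o-mini"

def build_researcher():
    Settings.llm = "llama-3.1-8b-instant"     # agent A picks a cheap model
    return lambda q: f"[{Settings.llm}] {q}"  # but reads the global at call time

def build_writer():
    Settings.llm = "gpt-4o-mini"              # agent B resets the global
    return lambda q: f"[{Settings.llm}] {q}"

researcher = build_researcher()
writer = build_writer()         # this line silently reconfigured the researcher
print(researcher("summarize"))  # runs with gpt-4o-mini, not the model A chose
```

Nothing in the researcher's code changed, yet its model did - the bug lives in construction order, which is exactly the kind of coupling that explicit per-instance wiring avoids.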


The Thing Most People Miss

The abstraction tradeoff isn't just about now - it's about six months from now when a new engineer joins the team and has to debug a retrieval regression.

With LangChain, the pipeline is its own documentation. Every component is named, every connection is explicit. A new engineer can read the code and understand what's happening without knowing LangChain. With SynapseKit, they need to read the RAG class internals to understand why results changed after a version bump.

Neither is wrong. But "readable to someone unfamiliar with this framework" is a real production criterion. Explicit is often more maintainable than concise - which is exactly why Python chose it as a core principle.


Three Things Worth Doing This Week

  1. Count the LoC for your current RAG pipeline - just the core read/chunk/embed/retrieve/generate path, no scaffolding. If it's over 30 lines, identify which lines are framework boilerplate vs actual business logic. That ratio tells you how much of your codebase is fighting the framework.

  2. Run the provider switch test on your stack. Change your LLM provider in your existing pipeline. Count the number of files and lines you had to touch. If the answer is more than 2, you have a coupling problem worth addressing before it becomes an incident.

  3. Prototype the same pipeline in a framework you haven't used. Not to switch - to understand where your current framework's abstraction level has been making decisions for you that you didn't know were being made.
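The audit in point 1 is scriptable: count non-blank, non-comment lines and flag the ones that touch a framework. A rough sketch - the framework name list is an assumption, adjust it to your stack:

```python
FRAMEWORKS = ("langchain", "llama_index", "synapsekit")  # adjust to your stack

def audit(source: str):
    """Return (total_loc, framework_lines) for a Python source string.
    Counts non-blank, non-comment lines; flags lines mentioning a framework."""
    total = framework = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        total += 1
        if any(name in stripped for name in FRAMEWORKS):
            framework += 1
    return total, framework

code = """
from langchain_openai import ChatOpenAI  # framework plumbing
threshold = 0.8  # business logic
"""
total, fw = audit(code)
print(f"{fw}/{total} lines touch the framework")
```

Substring matching is crude - it will flag comments and strings too - but for a first-pass ratio of framework plumbing to business logic, crude is enough.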


Cognitive load is a form of technical debt. Unlike code complexity, it doesn't show up in linters or coverage reports. It shows up when the person who understands this system is on vacation.

Engineers of AI

Read more: www.engineersofai.com

If this was useful, forward it to one engineer who should be reading it.




First published on AI Letters →

Want to Think Like an AI Architect?

Join engineers receiving weekly breakdowns of AI systems, production failures, and architectural decisions.