AI Letters #12 - How Much Code Does RAG Actually Take? We Measured.
Lines of code is a proxy for cognitive load - how much API surface you need to hold in your head before you can get something working.
You've read the benchmarks. You've seen the "getting started" examples. Every framework looks reasonable in a five-line README snippet. You don't find out what you actually signed up for until you're building something real - wiring together five objects to do one thing, reading three docs pages to understand why the chain API works the way it does, debugging a provider mismatch at 11pm the night before a demo.
The real question when evaluating an LLM framework isn't capability - they all support RAG, retrieval, and tool calling. It's: how much of this framework do I need to understand before my first pipeline runs?
I built the same RAG pipeline in SynapseKit, LangChain, and LlamaIndex. Same task, same document, same query. Then counted the lines.
Disclosure: I'm the author of SynapseKit. All benchmarks are reproducible - the Kaggle notebook is public: LLM Showdown #3 - Hello RAG: LoC.
The Task
Standard RAG pipeline. Nothing exotic - the kind of thing you'd build on day one:
- Load a text document
- Chunk it
- Embed it into a vector store
- Retrieve the top-5 relevant chunks for a query
- Generate an answer with an LLM
Same document. Same query. Minimal idiomatic implementation in each framework - no shortcuts, no padding.
The Numbers
| Framework | Import lines | Functional lines | Total |
|---|---|---|---|
| SynapseKit | 1 | 3 | 4 |
| LlamaIndex | 3 | 6 | 9 |
| LangChain | 5 | 8 | 13 |
Before reading what each number means, read all three pipelines. The difference in cognitive load is immediate.
The Code, Side by Side
SynapseKit - 4 lines total
────────────────────────────────────────────────
from synapsekit import RAG
rag = RAG(model="llama-3.1-8b-instant", api_key=KEY, provider="groq")
rag.add(DOCUMENT)
answer = rag.ask_sync(QUERY)
────────────────────────────────────────────────
LlamaIndex - 9 lines total
────────────────────────────────────────────────
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
Settings.llm = OpenAI(model="gpt-4o-mini", api_key=KEY)
Settings.embed_model = OpenAIEmbedding(api_key=KEY)
index = VectorStoreIndex.from_documents([Document(text=DOCUMENT)])
engine = index.as_query_engine(similarity_top_k=5)
answer = engine.query(QUERY)
────────────────────────────────────────────────
LangChain - 13 lines total
────────────────────────────────────────────────
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA
docs = [Document(page_content=DOCUMENT)]
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)
vs = FAISS.from_documents(chunks, OpenAIEmbeddings(api_key=KEY))
chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini", api_key=KEY),
    retriever=vs.as_retriever(search_kwargs={"k": 5})
)
answer = chain.invoke({"query": QUERY})
────────────────────────────────────────────────
What Each Number Actually Means
SynapseKit at 4 lines is a facade. RAG is a single object that internalises the entire pipeline - chunking strategy, embedding model, vector store, retrieval, generation. You configure it with one constructor and call two methods. The cognitive model you need: one class, two methods.
The cost: those internals aren't visible unless you dig into the source. If the defaults don't match your requirements, you either go deeper into the API or you switch frameworks.
LlamaIndex at 9 lines makes the pipeline explicit while keeping it clean. Settings is a global config object - you set LLM and embeddings once, and every subsequent LlamaIndex operation inherits them. Then you index your documents and create a query engine. The cognitive model: global config + index + engine. Three concepts.
The pattern is opinionated but learnable. The Settings global is controversial - it makes multi-model workflows awkward - but for single-model RAG it's ergonomic.
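When you do outgrow the global, you can usually sidestep it without leaving LlamaIndex. A minimal sketch, assuming the embedding model is already configured as in the snippet above, and that your installed LlamaIndex version accepts an `llm` argument on `as_query_engine` (recent releases do - verify against your version):
────────────────────────────────────────────────
from llama_index.core import VectorStoreIndex, Document
from llama_index.llms.openai import OpenAI

index = VectorStoreIndex.from_documents([Document(text=DOCUMENT)])

# Each engine gets its own LLM explicitly, instead of reading Settings.llm.
cheap_engine = index.as_query_engine(
    llm=OpenAI(model="gpt-4o-mini", api_key=KEY), similarity_top_k=5
)
strong_engine = index.as_query_engine(
    llm=OpenAI(model="gpt-4o", api_key=KEY), similarity_top_k=5
)
────────────────────────────────────────────────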
LangChain at 13 lines exposes every step as a first-class object. You construct a splitter, an embeddings model, a vector store, and a chain - then wire them together explicitly. The cognitive model: five separate components, each with its own constructor, each with its own configuration surface.
This is maximum control. You can swap any component independently. You can insert custom logic between any two steps. You pay for that flexibility in lines you have to write and concepts you have to understand before anything runs.
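Here is what "insert custom logic between any two steps" looks like in practice. This sketch reuses the `vs` vector store from the snippet above and composes the pipeline with LangChain's runnable (LCEL) syntax; the prompt text and the `format_docs` helper are illustrative, not part of the benchmark code:
────────────────────────────────────────────────
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

def format_docs(docs):
    # Custom step between retrieval and generation: join chunks however you like.
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
chain = (
    {"context": vs.as_retriever(search_kwargs={"k": 5}) | format_docs,
     "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini", api_key=KEY)
    | StrOutputParser()
)
answer = chain.invoke(QUERY)
────────────────────────────────────────────────
More lines again - but every step is a named object you can test or replace in isolation.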
The Hidden Test: Switching Providers
LoC at the happy path is one number. The real test is the diff when requirements change. Here's what happens when you swap OpenAI for Groq - a common move in cost-optimised pipelines:
Provider switch: OpenAI → Groq
────────────────────────────────────────────────────────
SynapseKit: 1 line changed
Before: RAG(model="gpt-4o-mini", provider="openai")
After: RAG(model="llama-3.1-8b-instant", provider="groq")
LangChain: 3 lines changed
Before: ChatOpenAI(model="gpt-4o-mini", api_key=KEY)
After: # pip install langchain-groq
from langchain_groq import ChatGroq
ChatGroq(model="llama-3.1-8b-instant", api_key=KEY)
LlamaIndex: 3 lines changed
Before: Settings.llm = OpenAI(model="gpt-4o-mini", api_key=KEY)
After: # pip install llama-index-llms-groq
from llama_index.llms.groq import Groq
Settings.llm = Groq(model="llama-3.1-8b-instant", api_key=KEY)
────────────────────────────────────────────────────────
SynapseKit changes one constructor line. LangChain and LlamaIndex both require a new package install and a new import. This isn't a knock on their design - per-provider packages mean you get the latest provider SDK features. But it also means every provider switch has a non-trivial friction cost: a pip install, a new import, a new class name to remember.
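One way to contain that friction is to keep the provider decision behind a single factory, so every switch is a one-file diff no matter which framework you're on. A sketch for the LangChain case - the `make_llm` name and config shape are my own, not part of either library:
────────────────────────────────────────────────
from langchain_openai import ChatOpenAI
from langchain_groq import ChatGroq  # pip install langchain-groq

def make_llm(provider: str, model: str, api_key: str):
    # All provider-specific imports and class names live here, nowhere else.
    if provider == "openai":
        return ChatOpenAI(model=model, api_key=api_key)
    if provider == "groq":
        return ChatGroq(model=model, api_key=api_key)
    raise ValueError(f"Unknown provider: {provider}")

llm = make_llm("groq", "llama-3.1-8b-instant", KEY)
────────────────────────────────────────────────
Newer LangChain releases also ship an `init_chat_model` helper that does roughly this from a provider string - worth checking before rolling your own.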
The Abstraction Pyramid
Abstraction level vs control surface
HIGH ABSTRACTION
┌─────────────────────────────────────┐
│ SynapseKit │
│ RAG() │
│ 1 class, 2 methods │
│ Defaults hide everything │
│ Provider = 1 string param │
└─────────────────────────────────────┘
↕ 5 lines
┌─────────────────────────────────────┐
│ LlamaIndex │
│ Settings + Index + Engine │
│ 3 concepts, global config │
│ Provider = new pip + new class │
└─────────────────────────────────────┘
↕ 4 lines
┌─────────────────────────────────────┐
│ LangChain │
│ Splitter + Embeddings + Store │
│ + Chain + Retriever │
│ 5 components, explicit wiring │
│ Provider = new pip + new class │
└─────────────────────────────────────┘
LOW ABSTRACTION / HIGH CONTROL
The pyramid isn't a ranking. It's a choice. Where you sit on it should match your use case.
What This Means for Engineers
- LoC at hello-world predicts LoC at production. The gap between frameworks doesn't close as pipelines get more complex - it widens. LangChain's explicit-component model compounds: more features means more objects to wire. SynapseKit's facade model compounds differently: more features may hit the defaults ceiling earlier.
- The right framework depends on which direction you'll deviate from the defaults. If you need custom chunking, custom retrieval logic, or non-standard pipeline ordering - LangChain's component model is correct (a one-line splitter swap is sketched just after this list). If you're building a standard pipeline and want to move fast - SynapseKit or LlamaIndex are better bets.
- The provider-per-package pattern has a real operational cost. Both LangChain and LlamaIndex have moved to per-provider packages (`langchain-openai`, `llama-index-llms-groq`). This is the right call for maintainability. The cost: your dependency tree grows with every provider you support.
- LlamaIndex's `Settings` global is a footgun in multi-model pipelines. If you're building a system where different agents use different LLMs, global config creates subtle coupling bugs. LangChain's explicit wiring is safer here. SynapseKit's per-instance config is cleanest.
- The real question isn't which framework is "best" - it's which mental model your team can maintain at 2am during an incident. Fewer concepts means fewer places for confusion to hide. This matters more than benchmarks when something breaks in production.
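To make the second bullet concrete: because LangChain's splitter is a first-class object, changing the chunking strategy is a one-line substitution that touches nothing else in the 13-line pipeline. A sketch swapping in the token-based splitter from `langchain-text-splitters` (chunk sizes are illustrative; it needs `tiktoken` installed):
────────────────────────────────────────────────
from langchain_text_splitters import TokenTextSplitter

# Swap the character splitter for token-based chunking; every other line stays the same.
chunks = TokenTextSplitter(
    chunk_size=256, chunk_overlap=32
).split_documents(docs)
────────────────────────────────────────────────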
The Thing Most People Miss
The abstraction tradeoff isn't just about now - it's about six months from now when a new engineer joins the team and has to debug a retrieval regression.
With LangChain, the pipeline is its own documentation. Every component is named, every connection is explicit. A new engineer can read the code and understand what's happening without knowing LangChain. With SynapseKit, they need to read the RAG class internals to understand why results changed after a version bump.
Neither is wrong. But "readable to someone unfamiliar with this framework" is a real production criterion. Explicit is often more maintainable than concise - which is exactly why Python chose it as a core principle.
Three Things Worth Doing This Week
- Count the LoC for your current RAG pipeline - just the core read/chunk/embed/retrieve/generate path, no scaffolding. If it's over 30 lines, identify which lines are framework boilerplate vs actual business logic. That ratio tells you how much of your codebase is fighting the framework.
- Run the provider switch test on your stack. Change your LLM provider in your existing pipeline and count the number of files and lines you had to touch. If the answer is more than 2, you have a coupling problem worth addressing before it becomes an incident.
- Prototype the same pipeline in a framework you haven't used. Not to switch - to understand where your current framework's abstraction level has been making decisions for you that you didn't know were being made.
Cognitive load is a form of technical debt. Unlike code complexity, it doesn't show up in linters or coverage reports. It shows up when the person who understands this system is on vacation.
Engineers of AI
Read more: www.engineersofai.com
If this was useful, forward it to one engineer who should be reading it.