AI15 min readUpdated 2026-06-29

Build a RAG Pipeline in n8n (End-to-End)

Production RAG in n8n: ingestion, chunking, embeddings, vector store, retrieval, and evaluation. Copy-paste patterns.

Key takeaways

  • Chunk by structure (headings) not by character count when possible.
  • Use 1024-dim embeddings (text-embedding-3-small) for the cost/quality sweet spot.
  • Always store source URL and updated_at with every chunk.
  • Re-embed on document change, not on a fixed schedule.

Retrieval-Augmented Generation lets an LLM answer questions over your private data. n8n makes the whole pipeline visual — ingestion, chunking, embedding, retrieval, and answer generation are all native nodes. This guide is the canonical end-to-end pattern.

Pipeline overview

Two workflows. Ingestion: source → fetch → chunk → embed → upsert to vector store. Query: question → embed → retrieve top-K → rerank → LLM with context → answer with citations.

Ingestion workflow

Trigger: webhook from your CMS on publish, or schedule a nightly crawl. Chunk: Recursive Character Text Splitter or, better, a Code node that splits by markdown headings so chunks respect semantic boundaries. Embed: OpenAI Embeddings node with text-embedding-3-small. Upsert: Vector Store node with metadata { source, title, updated_at, chunk_index }.

Query workflow

Trigger: Chat Trigger or webhook. Embed the question. Retrieve top 20. Rerank to top 5 with Cohere Rerank or an LLM-based reranker. Pass to LLM with system prompt forcing inline citations. Return answer + sources.

Evaluation

Build a Postgres table of 50 question/expected-source pairs. Nightly workflow: for each row, ask the system, check that the cited source matches. Track answer relevance with an LLM-as-judge. This is the single most under-used technique in production RAG.

Common pitfalls

Chunking on character count splits sentences and tanks retrieval. Forgetting metadata makes you unable to filter by recency or source. Skipping reranking caps your quality. Not evaluating means you can't tell when you broke things.

Frequently asked questions

Which vector store should I use with n8n?
Supabase pgvector if you already have Postgres; Pinecone for managed scale; Qdrant for self-hosted with hybrid search.
Do I need a reranker?
For any RAG over more than ~100 documents, yes. The quality jump usually outweighs the latency cost.
HomePathTemplatesBlogMy