
RAG Movie Plots: Designing a Modular RAG System

Retrieval-Augmented Generation (RAG) is often introduced as a linear workflow:

  1. Load documents
  2. Split text
  3. Generate embeddings
  4. Store vectors
  5. Retrieve context
  6. Ask an LLM

At a conceptual level, this description is correct. 

At an architectural level, it hides important structural decisions.

The RAG Movie Plots project was built to explore these design decisions in practice, using a modular structure to make each step explicit and testable.

Why Build This Project?

Previous posts examined specific structural behaviors in isolation. Those analyses were useful, but limited in scope.

A complete RAG system connects multiple layers:

  • Data normalization
  • Text segmentation
  • Vector representation
  • Similarity search
  • Prompt construction
  • Language model invocation

Once these components are connected, early decisions propagate downstream.

This project exists to make those dependencies explicit and to show how the pipeline behaves as a system.

Project Scope & Current Implementation

RAG Movie Plots is a modular RAG system built on top of the Wikipedia Movie Plots dataset.

At its current stage, the project implements:

  • An offline ingestion workflow that produces structured artifacts
  • A persisted vector store for semantic retrieval
  • A configurable retriever with optional similarity filtering
  • A versioned prompt template designed for grounded generation
  • A runtime orchestration layer that connects retrieval and generation
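
To make the idea of a versioned, grounded prompt template concrete, here is a minimal sketch. The class name, the version label, and the wording of the template are assumptions made for illustration; they are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptTemplate:
    """Hypothetical container pairing a template string with a version label."""
    version: str
    template: str

    def render(self, context: str, question: str) -> str:
        # Fill the two placeholders; retrieved text is passed in as-is.
        return self.template.format(context=context, question=question)


# Illustrative wording only: the instruction restricts the model to the context.
GROUNDED_V1 = PromptTemplate(
    version="v1",
    template=(
        "Answer the question using only the context below.\n"
        "If the context does not contain the answer, say you do not know.\n\n"
        "Context:\n{context}\n\n"
        "Question: {question}\n"
        "Answer:"
    ),
)

prompt = GROUNDED_V1.render(
    context="A thief plans a heist on a train.",
    question="Where does the heist take place?",
)
print(prompt)
```

Keeping the version next to the template text means a generated answer can be traced back to the exact prompt wording that produced it.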

Scope limitations

The system intentionally does not yet include:

  • An automated evaluation pipeline
  • A production deployment configuration
  • Benchmark or performance validation

Those components will be explored in future iterations. For now, the focus is on making the structure clear and the architectural decisions traceable.

Why the Wikipedia Movie Plots Dataset?

The dataset was chosen for structural reasons.

Exploratory analysis shows that:

  • Most plots are long in terms of character count
  • The majority of entries appear as a single paragraph
  • Line breaks are rare and inconsistently used
  • Many plots contain very long uninterrupted lines

These characteristics make structural decisions — such as chunking strategy and boundary selection — observable and testable.
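
The measurements behind these observations can be approximated with a few lines of Python. The sketch below is an assumption about how such an analysis might look, not the project's actual notebook code; the function and field names are hypothetical, and paragraphs are approximated as blocks separated by blank lines.

```python
from dataclasses import dataclass


@dataclass
class PlotStats:
    char_count: int
    paragraph_count: int
    line_break_count: int
    longest_line: int


def plot_stats(plot: str) -> PlotStats:
    """Compute simple structural measurements for one plot text."""
    # Assumption: a paragraph is a non-empty block separated by a blank line.
    paragraphs = [p for p in plot.split("\n\n") if p.strip()]
    lines = plot.split("\n")
    return PlotStats(
        char_count=len(plot),
        paragraph_count=len(paragraphs),
        line_break_count=plot.count("\n"),
        longest_line=max(len(line) for line in lines),
    )


# Made-up sample text, not a record from the dataset.
sample = "A detective arrives in town. He investigates a theft.\nThe case is closed."
stats = plot_stats(sample)
print(stats)
```

Running a function like this over every plot and looking at the distributions is one way to check whether a chunking boundary (paragraph, line, sentence) actually exists often enough in the data to be useful.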

High-Level Architecture

The system is intentionally divided into two independent phases:

  • Offline Ingestion
  • Online Retrieval & Generation

This separation ensures that preprocessing and storage decisions can be studied independently from runtime query behavior.

Figure 1 — High-level architecture of the RAG Movie Plots project, showing the separation between offline ingestion and online querying.

Phase 1 — Offline Ingestion

The ingestion phase transforms raw tabular data into a searchable semantic representation.

At a high level, it includes:

  • Cleaning and normalization of raw records
  • Conversion into structured JSONL documents
  • Data-driven chunking of narrative text
  • Embedding generation
  • Persistence into a local vector database

Artifacts such as docs.jsonl and chunks.jsonl are explicitly produced and stored.

No generation occurs at this stage. This phase prepares the knowledge base.
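
The artifact-based flow of this phase can be sketched as follows. This is a simplified illustration under stated assumptions: the `Title` and `Plot` column names, the `id` field, the output field names, and the sentence-packing chunker are all hypothetical choices, and the embedding and vector store steps are left out. Only the artifact names docs.jsonl and chunks.jsonl come from the project.

```python
import json
import re
import tempfile
from pathlib import Path


def clean_record(row: dict) -> dict:
    """Collapse whitespace and keep only the fields used downstream."""
    plot = re.sub(r"\s+", " ", row["Plot"]).strip()
    return {"doc_id": row["id"], "title": row["Title"].strip(), "text": plot}


def chunk_text(text: str, max_chars: int = 80) -> list[str]:
    """Greedily pack whole sentences into chunks, aiming for max_chars.

    A single sentence longer than max_chars is kept as one oversized chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def write_jsonl(path: Path, records: list[dict]) -> None:
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def ingest(rows: list[dict], out_dir: Path, max_chars: int = 80) -> tuple[Path, Path]:
    """Write docs.jsonl, then derive chunks.jsonl from that artifact."""
    docs_path = out_dir / "docs.jsonl"
    write_jsonl(docs_path, [clean_record(r) for r in rows])

    # The chunking step reads the stored artifact, not in-memory state.
    chunks = []
    for doc in read_jsonl(docs_path):
        for i, piece in enumerate(chunk_text(doc["text"], max_chars)):
            chunks.append({
                "chunk_id": f"{doc['doc_id']}-{i}",
                "doc_id": doc["doc_id"],
                "title": doc["title"],
                "text": piece,
            })
    chunks_path = out_dir / "chunks.jsonl"
    write_jsonl(chunks_path, chunks)
    return docs_path, chunks_path


# Made-up input row for demonstration.
rows = [{"id": 1, "Title": " Example Film ",
         "Plot": "A thief plans a heist.\r\nThe plan fails.  She escapes by train."}]

with tempfile.TemporaryDirectory() as tmp:
    docs_path, chunks_path = ingest(rows, Path(tmp), max_chars=40)
    docs = read_jsonl(docs_path)
    chunks = read_jsonl(chunks_path)

for chunk in chunks:
    print(chunk["chunk_id"], "|", chunk["text"])
```

Because each step writes a file that the next step reads, any stage can be re-run or inspected on its own, for example by changing `max_chars` and comparing the resulting chunks.jsonl files.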

Phase 2 — Online Retrieval & Generation

The online phase operates exclusively on the persisted vector store.

Given a user question, the system:

  • Encodes the query
  • Retrieves semantically similar chunks
  • Assembles contextual input
  • Injects context into a structured prompt
  • Generates an answer constrained by that context

This phase is about making retrieval behavior visible and prompt constraints explicit.

It does not yet optimize retrieval or rerank results.

Instead, it establishes a clean baseline — something we can inspect, question, and experiment with in a controlled way.

Modular Structure

The project is organized into five high-level modules:

  1. ETL – Data Cleaning & JSONL Generation
  2. Chunking – Text Segmentation
  3. Embedding & Vector Persistence
  4. Retrieval – Semantic Search & Filtering
  5. Generation – Prompt & Answer Synthesis

Each module is isolated by design. They communicate through artifacts rather than shared internal state.

That separation isn’t accidental — it’s the core architectural principle behind the project.

Future posts in this series will examine each module independently.

Design Principles

The system follows a few guiding ideas:

  • Separate ingestion from querying
  • Preserve intermediate artifacts
  • Treat chunking as a design decision, not a default
  • Keep retrieval explicit and configurable
  • Constrain generation through structured prompting

The objective is traceability across layers. Each module will be analyzed before being generalized.

What This Series Will Cover

This post introduces the architecture. Upcoming posts will explore:

  • Dataset structure and cleaning decisions
  • Chunking design informed by exploratory analysis
  • Embedding and persistence strategy
  • Retrieval configuration and behavior
  • Prompt constraints and generation mechanics

Each layer will be examined on its own before drawing system-level conclusions.

Repository

The full implementation of the RAG Movie Plots project is available on GitHub: RAG Movie Plots — Modular RAG Architecture.

The repository includes:

  • Ingestion pipelines
  • Chunking strategies
  • Vector persistence
  • Retrieval configuration
  • Prompt templates
  • Query notebooks

Final Note

RAG systems are often described as pipelines.

This project treats them as layered systems whose structural decisions accumulate over time.

Understanding those layers is more valuable than tuning them blindly.

RAG Movie Plots is an attempt to make those decisions explicit — and therefore understandable. It is an environment for investigation.
