RAG System¶
Import: from selectools.rag import RAGAgent, DocumentLoader, VectorStore, TextSplitter
Stability: stable
Since: v0.14.0
from selectools import Agent, AgentConfig, tool
from selectools.providers.stubs import LocalProvider
from selectools.rag import DocumentLoader, TextSplitter
# Load and chunk documents
docs = DocumentLoader.from_text(
"Selectools supports OpenAI, Anthropic, Gemini, and Ollama providers. "
"It provides RAG, tool calling, guardrails, and multi-agent orchestration.",
metadata={"source": "overview.txt"},
)
splitter = TextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
print(f"Loaded {len(chunks)} chunks from {len(docs)} documents")
# In production, embed chunks into a VectorStore and use RAGAgent:
# store = VectorStore.create("memory", embedder=embedder)
# store.add_documents(chunks)
# agent = RAGAgent.from_documents(docs, provider=provider, vector_store=store)
graph LR
D[Documents] --> C[Chunker]
C --> E[Embedder]
E --> V[Vector Store]
Q[Query] --> H[Hybrid Search]
V --> H
H --> RR[Reranker]
RR --> A[Agent]
See Also
- Embeddings -- OpenAI, Anthropic, Gemini, Cohere embedding providers
- Vector Stores -- 7 backends: Memory, SQLite, Chroma, Pinecone, FAISS, Qdrant, pgvector
- Advanced Chunking -- semantic and contextual chunking
- Hybrid Search -- BM25 + vector fusion with reranking
Directory: src/selectools/rag/ Files: __init__.py, vector_store.py, loaders.py, chunking.py, tools.py
Table of Contents¶
- Overview
- RAG Pipeline
- Document Loading
- Text Chunking
- Vector Storage
- RAG Tools
- RAGAgent High-Level API
- Cost Tracking
- Best Practices
- Complete Example
- Troubleshooting
Overview¶
The RAG (Retrieval-Augmented Generation) system enables agents to answer questions about your documents by:
- Loading documents from various sources
- Chunking them into manageable pieces
- Generating vector embeddings
- Storing in a vector database
- Retrieving relevant chunks during queries
- Providing context to the LLM
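The steps above reduce to a simple loop: embed chunks once, embed the query, rank by similarity, and hand the best match to the LLM. Here is a toy sketch in plain Python, independent of the selectools API; the letter-frequency "embedding" is a stand-in for a real embedding model:

```python
import math

def embed(text: str) -> list[float]:
    # Toy "embedding": letter-frequency vector. A real pipeline would
    # call an embedding model instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Index: embed each chunk once and keep the vector next to the text
chunks = ["install with pip", "configure the API key", "run the agent"]
index = [(embed(c), c) for c in chunks]

# Query: embed the question and rank chunks by similarity
query = embed("how do I install?")
ranked = sorted(index, key=lambda entry: cosine(query, entry[0]), reverse=True)
top_chunk = ranked[0][1]  # most relevant chunk, handed to the LLM as context
```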
Key Components¶
RAG Pipeline¶
Complete Flow Diagram¶
graph TD
A["Stage 1: Ingestion\nDocumentLoader\nfrom_file / from_directory / from_pdf"] --> B["Stage 2: Chunking\nTextSplitter / RecursiveTextSplitter\nchunk_size, chunk_overlap"]
B --> C["Stage 3: Embedding\nEmbeddingProvider\nOpenAI / Anthropic / Gemini"]
C --> D["Stage 4: Storage\nVectorStore\nMemory / SQLite / Chroma"]
Q["User Question"] --> E["Stage 5: Query & Retrieval\nembed_query() + VectorStore.search()\ncosine similarity, top_k"]
D --> E
E --> F["Stage 6: Generation\nRAGTool formats results with sources\nLLM generates answer with citations"]
Document Loading¶
DocumentLoader Class¶
from selectools.rag import DocumentLoader
# From text
docs = DocumentLoader.from_text("Hello world", metadata={"source": "memory"})
# From file
docs = DocumentLoader.from_file("document.txt")
# From directory
docs = DocumentLoader.from_directory(
directory="./docs",
glob_pattern="**/*.md",
recursive=True
)
# From PDF
docs = DocumentLoader.from_pdf("manual.pdf")
Loading from CSV (v0.21.0)¶
from selectools.rag import DocumentLoader
# One document per row; text_column selects the content field
docs = DocumentLoader.from_csv(
"data.csv",
text_column="content",
metadata_columns=["author", "category"],
delimiter=",",
)
# When text_column is None, all columns are concatenated as "key: value" pairs
Loading from JSON (v0.21.0)¶
# Handles JSON arrays (one Document per item) or a single object
docs = DocumentLoader.from_json(
"articles.json",
text_field="body", # Key whose value becomes the text
metadata_fields=["title", "author"], # Keys for metadata (None = all)
)
Loading from HTML (v0.21.0)¶
# Full page
docs = DocumentLoader.from_html("page.html")
# With CSS selector (requires beautifulsoup4)
docs = DocumentLoader.from_html("page.html", selector="article")
Loading from URL (v0.21.0)¶
# Fetch a web page and extract text content
docs = DocumentLoader.from_url(
"https://example.com/article",
selector="main", # Optional CSS selector (requires beautifulsoup4)
timeout=30.0,
)
Document Structure¶
@dataclass
class Document:
text: str # Document content
metadata: Dict[str, Any] # Source, page, etc.
embedding: Optional[List[float]] = None # Pre-computed embedding
Metadata¶
Automatically added:
- source: File path
- filename: File name only
- page: Page number (PDFs)
- total_pages: Total pages (PDFs)
Text Chunking¶
Why Chunk?¶
Large documents must be split because:
- Embedding models have token limits
- Retrieving entire documents is inefficient
- Smaller chunks improve retrieval precision
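Fixed-size splitting with overlap amounts to a sliding window. A minimal sketch (the shipped TextSplitter also honors a separator, which this omits):

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Slide a window of chunk_size characters; each step advances by
    # chunk_size - chunk_overlap so neighbouring chunks share context.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("abcdefghijklmnop", chunk_size=10, chunk_overlap=3)
# chunks[0][-3:] == chunks[1][:3]: the 3-character overlap
```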
TextSplitter¶
from selectools.rag import TextSplitter
splitter = TextSplitter(
chunk_size=1000, # Max characters per chunk
chunk_overlap=200, # Overlap for context continuity
separator="\n\n" # Prefer splitting on paragraphs
)
chunks = splitter.split_text(long_text)
chunked_docs = splitter.split_documents(documents)
RecursiveTextSplitter¶
More intelligent splitting that respects natural boundaries:
from selectools.rag import RecursiveTextSplitter
splitter = RecursiveTextSplitter(
chunk_size=1000,
chunk_overlap=200,
separators=["\n\n", "\n", ". ", " ", ""] # Try in order
)
# Tries to split on:
# 1. Double newlines (paragraphs) - preferred
# 2. Single newlines (lines)
# 3. Sentences (". ")
# 4. Words (" ")
# 5. Characters - last resort
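The fallback idea can be sketched in plain Python: split at the coarsest separator whose pieces fit, recursing to finer separators only where a piece is still too large, then merge small pieces back up to chunk_size. This is illustrative only; the shipped RecursiveTextSplitter also applies chunk_overlap, which this omits:

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", ". ", " ", "")):
    if len(text) <= chunk_size:
        return [text]
    sep, *rest = separators
    if sep == "":
        # Last resort: hard character split
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Split on the current separator; recurse into oversized pieces
    pieces = []
    for part in text.split(sep):
        if len(part) <= chunk_size:
            pieces.append(part)
        else:
            pieces.extend(recursive_split(part, chunk_size, tuple(rest)))
    # Greedily merge small pieces back together up to chunk_size
    merged, current = [], ""
    for piece in pieces:
        candidate = (current + sep + piece) if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                merged.append(current)
            current = piece
    if current:
        merged.append(current)
    return merged

parts = recursive_split("aaa bbb ccc ddd", chunk_size=7)
# Splits fall on word boundaries, never mid-word
```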
Chunk Metadata¶
{
"source": "docs/guide.md",
"filename": "guide.md",
"chunk": 0, # Chunk index
"total_chunks": 5 # Total chunks from this doc
}
Advanced Chunking¶
For semantic (topic-boundary) splitting and LLM-context enrichment, see Advanced Chunking.
Vector Storage¶
VectorStore Factory¶
from selectools.rag import VectorStore
from selectools.embeddings import OpenAIEmbeddingProvider
embedder = OpenAIEmbeddingProvider()
# In-memory (fast, not persistent)
store = VectorStore.create("memory", embedder=embedder)
# SQLite (persistent, local)
store = VectorStore.create("sqlite", embedder=embedder, db_path="docs.db")
# Chroma (advanced features)
store = VectorStore.create("chroma", embedder=embedder, persist_directory="./chroma")
# Pinecone (cloud-hosted, scalable)
store = VectorStore.create("pinecone", embedder=embedder, index_name="my-index")
FAISS (v0.21.0)¶
Stability: beta
Fast local similarity search using Facebook AI Similarity Search. Uses IndexFlatIP with L2-normalized vectors for exact cosine similarity. Thread-safe with persistence support.
from selectools.embeddings import OpenAIEmbeddingProvider
from selectools.rag import Document
from selectools.rag.stores import FAISSVectorStore
embedder = OpenAIEmbeddingProvider()
store = FAISSVectorStore(embedder, dimension=1536)
# Add documents
docs = [Document(text="Hello world", metadata={"source": "test"})]
ids = store.add_documents(docs)
# Search
query_emb = embedder.embed_query("hi")
results = store.search(query_emb, top_k=3)
# Persist to disk and reload
store.save("/tmp/my_index")
loaded = FAISSVectorStore.load("/tmp/my_index", embedder)
Qdrant (v0.21.0)¶
Stability: beta
Production vector search with gRPC transport, automatic collection management, and advanced metadata filtering. Supports both self-hosted and Qdrant Cloud.
from selectools.embeddings import OpenAIEmbeddingProvider
from selectools.rag import Document
from selectools.rag.stores import QdrantVectorStore
embedder = OpenAIEmbeddingProvider()
store = QdrantVectorStore(
embedder,
collection_name="my_docs",
url="http://localhost:6333", # Qdrant server URL
api_key="...", # Optional: for Qdrant Cloud
prefer_grpc=True, # Default: use gRPC transport
)
# Collection is auto-created on first add_documents() call
docs = [Document(text="Hello world", metadata={"source": "test"})]
ids = store.add_documents(docs)
# Search with metadata filtering
query_emb = embedder.embed_query("hi")
results = store.search(query_emb, top_k=3, filter={"source": "test"})
pgvector (v0.21.0)¶
Stability: beta
PostgreSQL-native vector search using the pgvector extension with HNSW indexing for fast approximate nearest-neighbour queries. Automatic table and index creation, JSONB metadata, and parameterized queries throughout.
Requires a PostgreSQL server with the vector extension installed.
from selectools.embeddings import OpenAIEmbeddingProvider
from selectools.rag import Document
from selectools.rag.stores.pgvector import PgVectorStore
embedder = OpenAIEmbeddingProvider()
store = PgVectorStore(
embedder=embedder,
connection_string="postgresql://user:pass@localhost:5432/mydb",
table_name="selectools_documents", # Optional custom table name
)
# Table and HNSW index are created automatically on first use
docs = [Document(text="Hello world", metadata={"source": "test"})]
ids = store.add_documents(docs)
# Search with JSONB metadata filtering
query_emb = embedder.embed_query("hi")
results = store.search(query_emb, top_k=3, filter={"source": "test"})
Vector Store Comparison¶
| Backend | Install | Persistent | Scalable | Metadata Filter | Best For |
|---|---|---|---|---|---|
| Memory | built-in | No | No | Yes | Prototyping, tests |
| SQLite | built-in | Yes | No | Yes | Local apps |
| Chroma | chromadb | Yes | No | Yes | Local + advanced |
| Pinecone | pinecone-client | Yes | Yes | Yes | Cloud scale |
| FAISS | faiss-cpu | Yes (save/load) | No | Yes | Fast local search |
| Qdrant | qdrant-client | Yes | Yes | Yes (advanced) | Production self-hosted/cloud |
| pgvector | psycopg2-binary | Yes | Yes | Yes (JSONB) | PostgreSQL-native apps |
Interface¶
class VectorStore(ABC):
@abstractmethod
def add_documents(
self,
documents: List[Document],
embeddings: Optional[List[List[float]]] = None
) -> List[str]:
"""Add documents, return IDs."""
pass
@abstractmethod
def search(
self,
query_embedding: List[float],
top_k: int = 5,
filter: Optional[Dict[str, Any]] = None
) -> List[SearchResult]:
"""Search for similar documents."""
pass
@abstractmethod
def delete(self, ids: List[str]) -> None:
"""Delete documents by ID."""
pass
@abstractmethod
def clear(self) -> None:
"""Clear all documents."""
pass
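A minimal in-memory implementation of this interface might look like the following. The Document and SearchResult classes here are local stand-ins for the real selectools.rag types, and unlike the real backends this sketch expects pre-computed embeddings rather than calling an embedder:

```python
import math
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Document:  # local stand-in for selectools.rag's Document
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    embedding: Optional[List[float]] = None

@dataclass
class SearchResult:  # local stand-in for the real SearchResult
    document: Document
    score: float

class ToyMemoryStore:
    """Dict-backed store with brute-force cosine-similarity search."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def add_documents(self, documents: List[Document]) -> List[str]:
        ids = []
        for doc in documents:
            doc_id = str(uuid.uuid4())
            self._docs[doc_id] = doc
            ids.append(doc_id)
        return ids

    def search(self, query_embedding: List[float], top_k: int = 5,
               filter: Optional[Dict[str, Any]] = None) -> List[SearchResult]:
        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        # Apply the metadata filter, then rank all survivors by similarity
        candidates = [
            d for d in self._docs.values()
            if not filter or all(d.metadata.get(k) == v for k, v in filter.items())
        ]
        scored = [SearchResult(d, cosine(query_embedding, d.embedding))
                  for d in candidates]
        return sorted(scored, key=lambda r: r.score, reverse=True)[:top_k]

    def delete(self, ids: List[str]) -> None:
        for doc_id in ids:
            self._docs.pop(doc_id, None)

    def clear(self) -> None:
        self._docs.clear()
```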
Usage¶
# Add documents
ids = store.add_documents(chunked_docs)
# Embeddings are generated automatically
# Search
query_embedding = embedder.embed_query("What are the features?")
results = store.search(query_embedding, top_k=3)
for result in results:
print(f"Score: {result.score}")
print(f"Text: {result.document.text}")
print(f"Source: {result.document.metadata['source']}")
RAG Tools¶
RAGTool¶
Pre-built tool for knowledge base search:
from selectools.rag import RAGTool
rag_tool = RAGTool(
vector_store=store,
top_k=3, # Retrieve top 3 chunks
score_threshold=0.5, # Minimum similarity
include_scores=True # Show relevance scores
)
# Use with agent
from selectools import Agent, Message, Role
agent = Agent(
tools=[rag_tool.search_knowledge_base],
provider=provider
)
response = agent.run([
Message(role=Role.USER, content="What are the installation steps?")
])
Tool Output Format¶
[Source 1: README.md, Relevance: 0.89]
Installation is simple:
1. pip install selectools
2. Set OPENAI_API_KEY
3. Create an agent
[Source 2: docs/quickstart.md (page 1), Relevance: 0.82]
Quick start guide:
First, install the package...
[Source 3: docs/setup.md, Relevance: 0.75]
Setup instructions for production...
The LLM uses this context to generate an accurate answer.
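A sketch of how such context might be assembled from search results. The exact RAGTool formatting may differ; the dict field names below are illustrative:

```python
def format_results(results: list[dict]) -> str:
    # results: one dict per hit, e.g. {"source": ..., "score": ..., "text": ...};
    # "page" is optional and only present for paginated sources like PDFs.
    blocks = []
    for i, r in enumerate(results, start=1):
        page = f" (page {r['page']})" if "page" in r else ""
        header = f"[Source {i}: {r['source']}{page}, Relevance: {r['score']:.2f}]"
        blocks.append(f"{header}\n{r['text']}")
    return "\n\n".join(blocks)

context = format_results([
    {"source": "README.md", "score": 0.89, "text": "Installation is simple..."},
    {"source": "docs/quickstart.md", "score": 0.82, "page": 1,
     "text": "Quick start guide..."},
])
```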
RAGAgent High-Level API¶
Three Convenient Methods¶
from selectools.rag import RAGAgent
# 1. From documents
docs = DocumentLoader.from_file("doc.txt")
agent = RAGAgent.from_documents(
documents=docs,
provider=OpenAIProvider(),
vector_store=store
)
# 2. From directory (most common)
agent = RAGAgent.from_directory(
directory="./docs",
provider=OpenAIProvider(),
vector_store=store,
glob_pattern="**/*.md",
chunk_size=1000,
top_k=3
)
# 3. From specific files
agent = RAGAgent.from_files(
file_paths=["doc1.txt", "doc2.pdf"],
provider=OpenAIProvider(),
vector_store=store
)
Behind the Scenes¶
RAGAgent automatically:
- Loads documents
- Chunks them
- Generates embeddings
- Stores in vector database
- Creates RAGTool
- Returns configured Agent
Usage¶
# Ask questions
response = agent.run("What are the main features?")
print(response.content)
# Check costs (includes embeddings)
print(agent.get_usage_summary())
# Continue conversation
response = agent.run("Tell me more about feature X")
Cost Tracking¶
RAG Costs¶
RAG operations incur two types of costs:
- Embedding Costs: Generating vectors from text
- LLM Costs: Generating responses
Tracked Automatically¶
agent = RAGAgent.from_directory("./docs", provider, store)
response = agent.run("What are the features?")
print(agent.usage)
Output¶
============================================================
📊 Usage Summary
============================================================
Total Tokens: 5,432
- Prompt: 3,210
- Completion: 1,200
- Embeddings: 1,022
Total Cost: $0.002150
- LLM: $0.002000
- Embeddings: $0.000150
============================================================
Cost Breakdown¶
# Embedding cost (one-time, during indexing)
embedding_cost = (num_chunks * avg_chunk_tokens / 1M) * embedding_model_cost
# Per-query cost
query_cost = (
(query_tokens / 1M) * embedding_model_cost + # Query embedding
(prompt_tokens / 1M) * llm_prompt_cost + # LLM prompt
(completion_tokens / 1M) * llm_completion_cost # LLM completion
)
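Plugging illustrative numbers into these formulas gives a concrete sense of scale (the prices below are assumptions; verify against your provider's current pricing):

```python
# Illustrative prices, USD per 1M tokens:
EMBED_COST = 0.02        # e.g. a small embedding model
PROMPT_COST = 0.15       # e.g. a small chat model, prompt side
COMPLETION_COST = 0.60   # completion side

# One-time indexing: 200 chunks averaging 400 tokens each
indexing_cost = (200 * 400 / 1_000_000) * EMBED_COST

# Per query: 20-token question, 2,500-token prompt, 300-token answer
query_cost = (
    (20 / 1_000_000) * EMBED_COST
    + (2_500 / 1_000_000) * PROMPT_COST
    + (300 / 1_000_000) * COMPLETION_COST
)
# Indexing costs about $0.0016 and each query about $0.00056:
# the query embedding is negligible; the LLM prompt dominates.
```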
Best Practices¶
1. Choose Appropriate Chunk Size¶
# Short, focused documents
chunk_size=500
# Standard documents
chunk_size=1000
# Technical documentation
chunk_size=1500
2. Use Overlap for Context¶
# Recommended overlap: 10-20% of chunk_size
splitter = TextSplitter(
chunk_size=1000,
chunk_overlap=200 # 20%
)
3. Set Reasonable top_k¶
top_k=3-5 works well for most collections; larger values pull in marginally relevant chunks that dilute the prompt and raise token costs.
4. Use Score Thresholds¶
rag_tool = RAGTool(
vector_store=store,
top_k=3,
score_threshold=0.7 # Filter low-relevance results
)
5. Choose Right Vector Store¶
from selectools.rag import VectorStore
from selectools.rag.stores import FAISSVectorStore, QdrantVectorStore
from selectools.rag.stores.pgvector import PgVectorStore
# Prototyping
store = VectorStore.create("memory", embedder=embedder)
# Production (local, fast)
store = FAISSVectorStore(embedder, dimension=1536)
# Production (local, SQL)
store = VectorStore.create("sqlite", embedder=embedder, db_path="prod.db")
# Production (PostgreSQL-native)
store = PgVectorStore(embedder=embedder, connection_string="postgresql://...")
# Production (managed, scalable)
store = QdrantVectorStore(embedder, url="http://qdrant:6333")
store = VectorStore.create("pinecone", embedder=embedder, index_name="prod")
6. Use Free Embeddings¶
from selectools.embeddings import GeminiEmbeddingProvider
# Gemini embeddings are currently free (verify current pricing)
embedder = GeminiEmbeddingProvider()
store = VectorStore.create("sqlite", embedder=embedder)
Complete Example¶
from selectools import OpenAIProvider, Message, Role
from selectools.embeddings import OpenAIEmbeddingProvider
from selectools.rag import RAGAgent, VectorStore
from selectools.models import OpenAI
# 1. Set up embedding provider
embedder = OpenAIEmbeddingProvider(
model=OpenAI.Embeddings.TEXT_EMBEDDING_3_SMALL.id
)
# 2. Create vector store
store = VectorStore.create("sqlite", embedder=embedder, db_path="knowledge.db")
# 3. Create RAG agent from documents
agent = RAGAgent.from_directory(
directory="./docs",
glob_pattern="**/*.md",
provider=OpenAIProvider(default_model=OpenAI.GPT_4O_MINI.id),
vector_store=store,
chunk_size=1000,
chunk_overlap=200,
top_k=3,
score_threshold=0.5
)
# 4. Ask questions
questions = [
"What are the installation steps?",
"How do I create an agent?",
"What providers are supported?"
]
for question in questions:
print(f"\nQ: {question}")
response = agent.run([Message(role=Role.USER, content=question)])
print(f"A: {response.content}\n")
# 5. Check costs
print("=" * 60)
print(agent.get_usage_summary())
Troubleshooting¶
No Results Found¶
# Issue: score_threshold too high
rag_tool = RAGTool(score_threshold=0.9) # Too strict
# Fix: Lower threshold
rag_tool = RAGTool(score_threshold=0.5)
Irrelevant Results¶
# Issue: chunk_size too large
splitter = TextSplitter(chunk_size=5000) # Too big
# Fix: Smaller chunks
splitter = TextSplitter(chunk_size=1000)
High Costs¶
# Issue: Expensive embedding model
from selectools.embeddings import GeminiEmbeddingProvider, OpenAIEmbeddingProvider
embedder = OpenAIEmbeddingProvider(model="text-embedding-3-large")
# Fix: Use a cheaper or free model
embedder = GeminiEmbeddingProvider()  # FREE
Related Examples¶
| # | Script | Description |
|---|---|---|
| 14 | 14_rag_basic.py | Basic RAG pipeline with document loading |
| 15 | 15_semantic_search.py | Semantic search over embedded documents |
| 16 | 16_rag_advanced.py | Advanced RAG with chunking and score thresholds |
| 18 | 18_hybrid_search.py | BM25 + vector hybrid search with reranking |
| 19 | 19_advanced_chunking.py | Semantic and contextual chunking strategies |
Further Reading¶
- Advanced Chunking - SemanticChunker and ContextualChunker
- Embeddings Module - Embedding providers
- Vector Stores Module - Storage implementations
- Usage Module - Cost tracking
Next Steps: Understand embedding providers in the Embeddings Module.