Selectools Development Roadmap¶

Status Legend

✅ Implemented - Merged and available in latest release

🔵 In Progress - Actively being worked on

🟡 Planned - Scheduled for implementation

⏸️ Deferred - Postponed to later release

❌ Cancelled - No longer planned

v0.16.0: Memory & Persistence¶

Focus: Durable conversation state, cross-session knowledge, and advanced memory strategies.

Persistent Conversation Sessions¶

Problem: ConversationMemory is in-memory only. Process restarts lose all history. Chat applications need sessions that survive restarts.

What it does: SessionStore protocol with pluggable backends. Sessions auto-persist after each turn with TTL-based expiry.

API:

from selectools.memory import JsonFileSessionStore

store = JsonFileSessionStore(directory="./sessions")
agent = Agent(
    tools=[...], provider=provider,
    config=AgentConfig(session_store=store, session_id="user-123"),
)
result = agent.ask("What was my last question?")  # auto-persisted

Scope:

SessionStore protocol: save(), load(), list(), delete()
Three backends: JSON file, SQLite, Redis
Auto-save after each turn
TTL-based expiry
Tool-pair-preserving trim on load

Touches: New sessions.py, AgentConfig, agent/core.py.

Summarize-on-Trim¶

Problem: Old messages are silently dropped when history exceeds limits. Important early context is lost.

What it does: Before trimming, summarize the messages being removed and inject the summary as a system-level context message.

API:

memory = ConversationMemory(
    max_messages=30,
    summarize_on_trim=True,
    summarize_provider=provider,
)

Scope:

LLM-generated 2-3 sentence summary of trimmed messages
Summary injected as system message at conversation start
Configurable summary model (use a cheap model like Haiku)

Touches: memory.py, provider integration.

Entity Memory¶

Problem: The agent can't track entities (people, orgs, projects) mentioned across turns. Each turn starts with no entity context.

What it does: Automatically extract named entities from conversation, maintain an entity registry, and inject relevant entity context into prompts.

API:

from selectools.memory import EntityMemory

memory = EntityMemory(provider=provider)
agent = Agent(tools=[...], provider=provider, memory=memory)

agent.ask("I'm working with Alice from Acme Corp on Project Alpha")
agent.ask("What project am I working on?")
# Agent knows: Alice (person, Acme Corp), Acme Corp (org), Project Alpha (project)

Scope:

LLM-based entity extraction after each turn
Entity types: person, organization, project, location, date, custom
Entity registry: name → type, attributes, last mentioned
System prompt injection of relevant entities
Configurable: extraction model, max entities, relevance window

Touches: New entity_memory.py, PromptBuilder integration.

Knowledge Graph Memory¶

Problem: Entity memory tracks individual entities but not relationships between them. "Alice manages Project Alpha" is lost.

What it does: Build a graph of (subject, relation, object) triples from conversations. Query the graph to inject relevant relationship context into prompts.

API:

from selectools.memory import KnowledgeGraphMemory

memory = KnowledgeGraphMemory(provider=provider, storage="sqlite")
agent = Agent(tools=[...], provider=provider, memory=memory)

agent.ask("Alice manages Project Alpha and reports to Bob")
# Graph: (Alice, manages, Project Alpha), (Alice, reports_to, Bob)

agent.ask("Who manages Project Alpha?")
# Relevant triples injected: (Alice, manages, Project Alpha)

Scope:

LLM-based triple extraction
Storage: in-memory dict (default), SQLite (persistent)
Query: retrieve triples relevant to current query via keyword + embedding match
System prompt injection of relevant triples
Graph operations: add, query, merge, prune

Touches: New knowledge_graph.py, storage backend, PromptBuilder.

Cross-Session Knowledge Memory¶

Problem: Even with persistent sessions, each session is isolated. There's no way for an agent to "remember" facts across conversations (e.g., user preferences, prior decisions).

What it does: A file-based or DB-backed knowledge memory with two layers: a daily log (append-only entries from the current day) and a long-term store (curated facts that persist indefinitely). A built-in remember tool lets the agent save facts explicitly. Relevant memories are auto-injected into the system prompt.

API:

from selectools.memory import KnowledgeMemory

knowledge = KnowledgeMemory(
    directory="./workspace",
    recent_days=2,           # inject last 2 days into system prompt
    max_context_chars=5000,  # cap memory injection size
)

agent = Agent(
    tools=[...],
    provider=provider,
    config=AgentConfig(knowledge_memory=knowledge),
)

Scope:

Daily log files (memory/YYYY-MM-DD.md) + persistent MEMORY.md
Built-in remember tool: agent can save categorized facts
System prompt auto-injection of recent and long-term memories
Configurable retention and context window

Touches: New knowledge.py module, PromptBuilder integration, built-in tool in toolbox/.

Feature	Priority	Impact	Effort
Persistent Conversation Sessions	✅ Done	High	Medium
Summarize-on-Trim	✅ Done	Medium	Small
Cross-Session Knowledge Memory	✅ Done	Medium	Medium
Entity Memory	✅ Done	High	Medium
Knowledge Graph Memory	✅ Done	High	Large

Implementation Order¶

v0.13.0  ✅ Structured Output + Safety Foundation (Complete)
         Tool-pair trimming → Structured output → Execution traces → Reasoning
         → Fallback providers → Batch → Tool policy → Human-in-the-loop

v0.14.0  ✅ AgentObserver Protocol + Production Hardening (Complete)
         AgentObserver (15 events) → LoggingObserver → OTel export
         → Model registry (145 models) → max_completion_tokens fix → 11 bug fixes

v0.14.1  ✅ Streaming & Provider Fixes (Complete)
         13 streaming bug fixes → 141 new tests → Unit tests for 6 untested modules

v0.15.0  ✅ Enterprise Reliability (Complete)
         Guardrails engine → Audit logging → Tool output screening → Coherence checking

v0.16.0  ✅ Memory & Persistence (Complete)
         Sessions → Summarize-on-trim → Knowledge memory → Entity memory → KG memory

v0.16.5  ✅ Design Patterns & Code Quality
         StepType/ModelType enums → Agent mixin split → Provider base class → Terminal actions
         → Async observers → Gemini 3.x thought signatures → Hooks deprecation → ADRs

v0.17.0  🟡 Multi-Agent Orchestration
         AgentGraph → GraphState → Checkpointing → Parallel nodes → SupervisorAgent → MCP

v0.18.0  🟡 Connector Expansion
         CSV/JSON/HTML/URL/Markdown loaders → FAISS/Qdrant/pgvector stores
         → Code/search tools → SaaS loaders → GitHub/DB toolbox

v0.19.0  🟡 Ecosystem Parity
         Observe (trace store, evaluators, export) → Serve (FastAPI, Flask, playground)
         → Templates → YAML config

v0.20.0  🟡 Polish & Community
         HTML trace viewer → Niche integrations → Tool marketplace

v0.16.5: Design Patterns & Code Quality ✅¶

Focus: Structural refactoring to prevent the class of bugs found in v0.16.0–v0.16.4 and prepare a clean foundation for v0.17.0 multi-agent orchestration.

Highlights¶

Agent decomposed into 4 mixins — core.py 3128→1448 lines (-54%)
StepType + ModelType converted to str, Enum for type safety
Terminal actions — @tool(terminal=True) and stop_condition callback
AsyncAgentObserver — async lifecycle hooks with blocking flag
Gemini 3.x thought signatures — ToolCall.thought_signature field
Hooks deprecated — wrapped via _HooksAdapter, single observer pipeline
OpenAI/Ollama share base class — Template Method pattern (-75% duplication)
163 new tests (total: 1640), 53 architecture fitness tests
6 ADRs in docs/decisions/

See CHANGELOG for full details.

v0.16.6: Gemini thought_signature crash fix ✅¶

Fixed UnicodeDecodeError when Gemini 3.x returns non-UTF-8 binary thought_signature. Replaced UTF-8 encode/decode with base64 across all 5 affected locations. 2 new regression tests (total: 1642).

v0.16.7: Cleanup ✅¶

Removed unused CLI module (cli.py + console script entry point). Completed README example table (28-38). Fixed stale counts across CONTRIBUTING.md, ARCHITECTURE.md, QUICKSTART.md, AGENT.md. Total: 1620 tests.

v0.17.0: Multi-Agent Orchestration¶

Focus: DAG-based multi-agent workflows that are simpler and more Pythonic than LangGraph.

Design Philosophy¶

LangGraph requires learning StateGraph, MessageAnnotation, Pregel channels, and a custom checkpointing API before building anything. Selectools takes the opposite approach: agents are the primitive, composition is plain Python. An AgentGraph should feel like writing normal Python with async/await, not configuring a data pipeline.

Core principles:

Agents are nodes, not functions — each node is a full Agent instance with its own tools, provider, and config, reusing all existing infrastructure (traces, observers, guardrails, policies)
Edges are just Python functions — no special ConditionalEdge class; a routing function takes the result and returns the next node name via plain if/elif/else
State is a typed dataclass — no Pydantic models for state; just a @dataclass that gets passed between nodes
Checkpointing is serialization — the state is JSON-serializable; checkpoint stores implement a 3-method protocol
HITL reuses existing patterns — the existing ToolPolicy + confirm_action pattern already handles human-in-the-loop

Module Structure¶

src/selectools/orchestration/
    __init__.py           # Public exports: AgentGraph, GraphNode, GraphState, GraphResult
    graph.py              # AgentGraph: the DAG-based orchestration engine
    node.py               # GraphNode: wraps Agent with input/output transforms
    state.py              # GraphState: typed state container with merge semantics
    checkpoint.py         # CheckpointStore protocol + InMemory, File, SQLite backends
    supervisor.py         # SupervisorAgent: meta-agent for task decomposition
    mcp.py                # MCP client/server integration

Core Abstractions¶

GraphState¶

@dataclass
class GraphState:
    messages: List[Message]                    # Accumulated messages across nodes
    data: Dict[str, Any]                       # Arbitrary key-value store for inter-node communication
    current_node: str                          # Name of the currently executing node
    history: List[Tuple[str, AgentResult]]     # Ordered list of (node_name, result) pairs
    metadata: Dict[str, Any]                   # User-attached metadata (carried through checkpoints)

Intentionally flat and JSON-serializable. No Pydantic, no custom descriptors, no annotation magic.

GraphNode¶

@dataclass
class GraphNode:
    name: str
    agent: Union[Agent, Callable[[GraphState], GraphState]]
    input_transform: Optional[Callable[[GraphState], List[Message]]] = None   # state → messages for Agent.run()
    output_transform: Optional[Callable[[AgentResult, GraphState], GraphState]] = None  # merge result into state
    max_iterations: int = 1   # How many times this node can run in a cycle

If input_transform/output_transform are not provided, sensible defaults are used (last message becomes user message; result appends to state messages).

AgentGraph¶

class AgentGraph:
    """DAG-based multi-agent orchestration engine."""

    # Constants
    END: ClassVar[str] = "__end__"

    # Node management
    def add_node(self, name: str, agent: Union[Agent, Callable], **kwargs) -> None: ...
    def add_edge(self, from_node: str, to_node: str) -> None: ...
    def add_conditional_edge(self, from_node: str, router_fn: Callable[[GraphState], str]) -> None: ...
    def add_parallel_nodes(self, name: str, node_names: List[str], merge_fn: Optional[Callable] = None) -> None: ...
    def set_entry(self, node_name: str) -> None: ...

    # Execution
    def run(self, prompt_or_state: Union[str, GraphState], checkpoint_store: Optional[CheckpointStore] = None) -> GraphResult: ...
    def arun(self, prompt_or_state: ..., checkpoint_store: ...) -> GraphResult: ...
    async def astream(self, prompt_or_state: ...) -> AsyncGenerator[GraphEvent, None]: ...

    # Validation
    def validate(self) -> List[str]: ...  # Returns list of warnings/errors

Usage example:

from selectools.orchestration import AgentGraph

graph = AgentGraph()
graph.add_node("planner", planner_agent)
graph.add_node("researcher", researcher_agent)
graph.add_node("writer", writer_agent)

graph.add_edge("planner", "researcher")
graph.add_conditional_edge("researcher", lambda state: "writer" if state.data.get("ready") else "researcher")
graph.add_edge("writer", AgentGraph.END)

graph.set_entry("planner")
result = graph.run("Write a blog post about AI agents")

GraphResult¶

@dataclass
class GraphResult:
    content: str                                       # Final output text
    state: GraphState                                  # Final state
    node_results: Dict[str, List[AgentResult]]         # Per-node results
    trace: AgentTrace                                  # Composite trace (linked via parent_run_id)
    total_usage: UsageStats                            # Aggregated cost/tokens across all nodes

How It Beats LangGraph¶

LangGraph	Selectools AgentGraph	Why better
Custom `StateGraph` with `Annotated[list, add_messages]`	Plain `GraphState` dataclass	No custom type system to learn
`conditionalEdges` with special return constants	Plain Python function returning a string	Debuggable, testable, IDE-friendly
Pregel channels for state management	`Dict[str, Any]` with merge functions	Standard Python data structures
Separate `compile()` step before execution	Validate + run in one step	No compilation phase, faster iteration
`MemorySaver` / `SqliteSaver` / `PostgresSaver`	`CheckpointStore` protocol (3 methods)	Trivial to implement custom stores
Node functions receive raw state	Nodes are full `Agent` instances	Inherit all Agent features: tools, traces, observers, guardrails
Complex interrupt/resume for human-in-the-loop	Reuse existing `confirm_action` on `AgentConfig`	Zero new concepts for HITL
Sub-graphs require `CompiledGraph` nesting	AgentGraph can be a node in another AgentGraph	Natural composition via duck typing

Checkpointing¶

class CheckpointStore(Protocol):
    def save(self, graph_id: str, state: GraphState, step: int) -> str: ...    # Returns checkpoint_id
    def load(self, checkpoint_id: str) -> Tuple[GraphState, int]: ...          # Returns (state, step)
    def list(self, graph_id: str) -> List[str]: ...                            # List checkpoint_ids

Built-in implementations:

InMemoryCheckpointStore — dict-based, for development
FileCheckpointStore — JSON files, for single-process production
SQLiteCheckpointStore — for multi-process production

Enables: resume after crash, HITL pause/resume, time travel debugging.

Parallel Execution¶

graph.add_parallel_nodes("research_step", ["researcher_a", "researcher_b", "researcher_c"])
graph.add_edge("research_step", "synthesizer")

Uses asyncio.gather() (async) or ThreadPoolExecutor (sync) — same pattern already in agent/core.py for parallel tool execution. Configurable merge_fn(List[GraphState]) -> GraphState (default: concatenate messages, shallow-merge data dicts).

Sub-Graphs and Composition¶

An AgentGraph satisfies the Agent-like interface, so it can be a node in another graph:

research_subgraph = AgentGraph()
# ... define nodes ...

main_graph = AgentGraph()
main_graph.add_node("research", research_subgraph)   # Sub-graph as a node
main_graph.add_node("writer", writer_agent)
main_graph.add_edge("research", "writer")

Sub-graph traces are linked via parent_run_id (already supported in trace.py).

SupervisorAgent¶

Higher-level abstraction for common multi-agent patterns:

from selectools.orchestration import SupervisorAgent

supervisor = SupervisorAgent(
    agents={"researcher": researcher, "writer": writer, "reviewer": reviewer},
    provider=OpenAIProvider(),
    strategy="plan_and_execute",   # or "round_robin", "dynamic"
)
result = supervisor.run("Write a comprehensive blog post about AI safety")

The supervisor uses the provider LLM to decompose tasks and route to specialist agents. Internally builds and executes an AgentGraph.

MCP Integration¶

MCP Client — discover and call tools from MCP-compliant servers:

from selectools.orchestration.mcp import MCPClient, mcp_tools

client = MCPClient(server_url="http://localhost:8080")
tools = mcp_tools(client)   # Returns List[Tool] that proxy to MCP server

agent = Agent(tools=tools + local_tools, provider=provider)

MCP Server — expose @tool functions as MCP-compliant endpoints:

from selectools.orchestration.mcp import MCPServer

server = MCPServer(tools=[search_tool, calculator_tool])
server.serve(host="0.0.0.0", port=8080)

Integration with Existing Systems¶

Observers: Graph run_id becomes each node's parent_run_id, creating a hierarchical trace tree
Guardrails: Per-node (each Agent's own guardrails) + graph-level (before first node, after last node)
Caching: Per-node via AgentConfig(cache=...)
Cost tracking: Aggregated across all nodes in GraphResult.total_usage
New StepType values: graph_node_start, graph_node_end, graph_routing, graph_checkpoint

Feature	Status	Impact	Effort
AgentGraph + GraphState	🟡 High	High	Large
Checkpointing	🟡 High	High	Medium
Parallel Nodes	🟡 Medium	High	Medium
SupervisorAgent	🟡 Medium	High	Medium
MCP Client	🟡 Medium	Medium	Medium
MCP Server	🟡 Low	Medium	Medium

v0.18.0: Connector Expansion¶

Focus: Close the integration gap with LangChain by adding high-demand document loaders, vector stores, and toolbox modules.

Current Inventory¶

Category	Count	Items
Document Loaders	4	text, file, directory, PDF
Vector Stores	4	Memory, SQLite, Chroma, Pinecone
Embedding Providers	4	OpenAI, Anthropic/Voyage, Gemini, Cohere
Toolbox	24 tools	file, web, data, datetime, text
Rerankers	2	Cohere, Jina

New Document Loaders¶

Add to src/selectools/rag/loaders.py as new static methods on DocumentLoader. Refactor to loaders/ subpackage with __init__.py re-exporting everything to support SaaS loaders as separate files.

Loader	Method	Dependencies	Complexity	Why it matters
CSV	`from_csv(path, content_columns, metadata_columns)`	stdlib `csv`	Small	Most common structured data format
JSON/JSONL	`from_json(path, text_field)` / `from_jsonl(...)`	stdlib `json`	Small	Standard for API responses, logs, datasets
HTML	`from_html(path_or_content, extract_text=True)`	`beautifulsoup4` (optional)	Small	Web scraping output, saved pages
URL	`from_url(url, timeout=30)`	`requests` + `beautifulsoup4`	Small	Direct URL-to-document (2nd most requested after PDF)
Markdown w/ Frontmatter	`from_markdown(path)`	`pyyaml` (optional)	Small	Static sites, docs, wikis
Google Drive	`from_google_drive(file_id, credentials)`	`google-api-python-client`	Medium	Most-used enterprise doc platform
Notion	`from_notion(page_id, api_key)`	`requests` (existing)	Medium	2nd most-requested SaaS loader
GitHub	`from_github(repo, path, branch, token)`	`requests` (existing)	Small	Developer docs and code
SQL Database	`from_sql(connection_string, query)`	`sqlalchemy` (optional)	Medium	Enterprise data in databases

New Vector Stores¶

New files in src/selectools/rag/stores/. Each follows the same pattern as chroma.py: inherit VectorStore, implement add_documents, search, delete, clear, lazy-import the dependency. Register in VectorStore.create() factory.

Store	File	Dependencies	Complexity	Why it matters
FAISS	`faiss.py`	`faiss-cpu`	Medium	De facto standard for local high-perf vector search (millions of vectors)
Qdrant	`qdrant.py`	`qdrant-client`	Medium	Fastest-growing vector DB, excellent filtering, cloud + self-hosted
pgvector	`pgvector.py`	`psycopg2-binary`	Medium	Use existing PostgreSQL — no new database needed
Weaviate	`weaviate.py`	`weaviate-client`	Medium	Popular cloud vector DB with GraphQL API
Redis Vector	`redis.py`	`redis` (existing)	Medium	Leverages existing Redis connection from `cache_redis.py`

New Toolbox Modules¶

New files in src/selectools/toolbox/. Follow @tool decorator pattern, register in get_all_tools() and get_tools_by_category().

Module	Tools	Dependencies	Complexity	Why it matters
`code_tools.py`	`execute_python`, `execute_shell`	stdlib `subprocess`	Medium	#1 most-used tool in agent frameworks
`search_tools.py`	`google_search`, `duckduckgo_search`	`duckduckgo_search` (optional)	Small-Medium	#2 most-used tool category
`github_tools.py`	`create_issue`, `list_issues`, `create_pr`, `get_file_contents`	`requests` (existing)	Medium	Developer workflow automation
`db_tools.py`	`query_database`, `list_tables`, `describe_table`	`sqlalchemy` (optional)	Medium	Enterprise data access

Dependency Management¶

All new dependencies are optional and lazy-imported. Add to pyproject.toml:

[project.optional-dependencies]
rag = [
    # existing deps ...
    "beautifulsoup4>=4.12.0",
    "faiss-cpu>=1.7.0",
    "qdrant-client>=1.7.0",
    "psycopg2-binary>=2.9.0",
    "weaviate-client>=4.0.0",
]

Individual stores/loaders remain installable a la carte: pip install selectools faiss-cpu works without the full [rag] group.

Feature	Status	Impact	Effort
CSV/JSON/JSONL Loaders	🟡 High	High	Small
HTML/URL Loaders	🟡 High	High	Small
FAISS Vector Store	🟡 High	High	Medium
Qdrant Vector Store	🟡 Medium	Medium	Medium
pgvector Store	🟡 Medium	High	Medium
Code Execution Tools	🟡 High	High	Medium
Search Tools	🟡 High	High	Small
SaaS Loaders	🟡 Medium	Medium	Medium
GitHub/DB Toolbox	🟡 Medium	Medium	Medium

v0.19.0: Ecosystem Parity¶

Focus: Observability, evaluation, REST API deployment, and templates — closing the gap with LangSmith, LangServe, and LangChain Hub.

Selectools Observe (Observability & Evaluation)¶

Existing head start: AgentObserver (15 events), AgentTrace (OTel export), AuditLogger (JSONL), AgentAnalytics.

New src/selectools/observe/ package:

src/selectools/observe/
    __init__.py          # Public exports
    trace_store.py       # TraceStore protocol + InMemory, SQLite, JSONL backends
    evaluators.py        # Evaluator protocol + built-in evaluators
    eval_runner.py       # EvalRunner for dataset evaluation
    export.py            # Export formatters (HTML, CSV, Datadog, Langfuse, OTel)
    datasets.py          # EvalCase, EvalDataset for test datasets

Trace Store¶

class TraceStore(Protocol):
    def save(self, trace: AgentTrace) -> str: ...
    def load(self, run_id: str) -> AgentTrace: ...
    def query(self, filters: TraceFilter) -> List[AgentTrace]: ...
    def list(self, limit: int, offset: int) -> List[AgentTraceSummary]: ...

Built-in: InMemoryTraceStore, SQLiteTraceStore, JSONLTraceStore.

Auto-collection via TraceCollectorObserver (implements AgentObserver, auto-saves traces on on_run_end).

Evaluators¶

class Evaluator(Protocol):
    def evaluate(self, input: str, output: str, reference: Optional[str] = None) -> EvalResult: ...

@dataclass
class EvalResult:
    score: float         # 0.0 to 1.0
    passed: bool
    reasoning: str
    evaluator: str

Built-in evaluators:

CorrectnessEvaluator — LLM-as-judge: is the output correct?
RelevanceEvaluator — LLM-as-judge: is the output relevant to the input?
FaithfulnessEvaluator — RAG-specific: is the output grounded in retrieved documents?
ToolUseEvaluator — did the agent use the right tools?
LatencyEvaluator — did the agent respond within the time budget?
CostEvaluator — did the agent stay within the cost budget?

LLM-based evaluators use the existing Provider protocol — any configured provider works.

Evaluation Runner¶

class EvalRunner:
    def run(self, agent: Agent, dataset: List[EvalCase], evaluators: List[Evaluator]) -> EvalReport: ...

Builds on agent.batch() for parallel evaluation.

Export Formats¶

export_to_html(traces) — self-contained HTML report with trace timeline (zero deps, open in browser)
export_to_csv(traces) — spreadsheet analysis
export_to_datadog(traces) — Datadog APM format
export_to_langfuse(traces) — open-source LangSmith alternative
export_to_otel(traces) — already exists via AgentTrace.to_otel_spans()

Key differentiator vs LangSmith: Zero SaaS dependency. Self-contained HTML trace viewer. Full evaluation without leaving your codebase.

Selectools Serve (REST API Deployment)¶

New src/selectools/serve/ package:

src/selectools/serve/
    __init__.py          # Public exports: AgentRouter, AgentBlueprint, playground
    fastapi.py           # FastAPI router
    flask.py             # Flask blueprint
    playground.py        # Self-contained chat UI + server
    models.py            # Pydantic request/response models

FastAPI Router¶

from selectools.serve import AgentRouter

router = AgentRouter(agent=my_agent, prefix="/agent")
# Creates:
#   POST /agent/invoke     — single prompt → AgentResult as JSON
#   POST /agent/batch      — multiple prompts → List[AgentResult]
#   POST /agent/stream     — single prompt → SSE stream
#   GET  /agent/schema     — OpenAPI schema for tools
#   GET  /agent/health     — health check

app = FastAPI()
app.include_router(router)

Flask Blueprint¶

from selectools.serve import AgentBlueprint

blueprint = AgentBlueprint(agent=my_agent, prefix="/agent")
app = Flask(__name__)
app.register_blueprint(blueprint)

Playground¶

from selectools.serve import playground

playground(agent=my_agent, port=8000)   # Chat UI at http://localhost:8000

Self-contained HTML page served by minimal HTTP server. Zero-dependency chat interface.

Key differentiator vs LangServe: Works with FastAPI AND Flask. Built-in playground with zero config.

Templates & Configuration¶

New src/selectools/templates/ package:

Template	Pre-configured with
`customer_support.py`	Support tools, system prompt, guardrails, topic restriction
`data_analyst.py`	Code execution, data tools, CSV/JSON structured output
`research_assistant.py`	Search tools, web tools, RAG pipeline
`code_reviewer.py`	File tools, GitHub tools, structured output
`rag_chatbot.py`	RAG pipeline, memory, knowledge base config

YAML Agent Configuration:

# agent.yaml
provider: openai
model: gpt-4o
tools:
  - selectools.toolbox.file_tools.read_file
  - selectools.toolbox.web_tools.http_get
  - ./my_custom_tool.py
system_prompt: "You are a helpful assistant..."
guardrails:
  input:
    - type: topic
      deny: [politics, religion]

from selectools.templates import from_yaml
agent = from_yaml("agent.yaml")

Feature	Status	Impact	Effort
Trace Store	🟡 High	High	Medium
Evaluators + EvalRunner	🟡 High	High	Medium
HTML Trace Export	🟡 Medium	High	Medium
FastAPI AgentRouter	🟡 High	High	Medium
Flask AgentBlueprint	🟡 Medium	Medium	Small
Playground	🟡 Medium	High	Medium
Agent Templates	🟡 Medium	Medium	Small
YAML Config	🟡 Low	Medium	Small

v0.20.0: Polish & Community¶

Focus: Niche integrations, community sharing, and developer experience polish.

Niche Document Loaders¶

Loader	Dependencies
`from_slack(channel_id, token)`	`requests`
`from_confluence(page_id, base_url, token)`	`requests`
`from_jira(project_key, token)`	`requests`
`from_discord(channel_id, token)`	`requests`
`from_email(imap_server, credentials)`	stdlib `imaplib`
`from_docx(path)`	`python-docx`
`from_excel(path, sheet)`	`openpyxl`
`from_xml(path, text_xpath)`	stdlib `xml.etree`

Niche Vector Stores¶

Store	Dependencies
`MilvusVectorStore`	`pymilvus`
`OpenSearchVectorStore`	`opensearch-py`
`LanceVectorStore`	`lancedb`

Niche Toolbox Modules¶

Module	Tools
`email_tools.py`	`send_email`, `read_inbox`, `search_emails`
`calendar_tools.py`	`create_event`, `list_events`, `find_free_slots`
`browser_tools.py`	`navigate`, `click`, `extract_text`, `screenshot` (Playwright)
`financial_tools.py`	`stock_price`, `exchange_rate`, `market_summary`

Community Features¶

Tool Marketplace/Registry: Publish and discover community @tool functions
Visual Agent Builder: Web UI for designing agent configurations (generates YAML)
Enhanced HTML Trace Viewer: Interactive timeline with filter, search, cost breakdown

Feature	Status	Impact	Effort
Niche Loaders (8)	🟡 Low	Medium	Medium
Niche Stores (3)	🟡 Low	Low	Medium
Niche Toolbox (4 modules)	🟡 Low	Medium	Medium
Tool Marketplace	🟡 Low	High	Large
Visual Agent Builder	🟡 Low	Medium	Large

Backlog (Unscheduled)¶

Feature	Notes
Tool Composition	`@compose` decorator for chaining tools
Universal Vision Support	Unified vision API across providers
AWS Bedrock Provider	VPC-native model access (Claude, Llama, Mistral)
Rate Limiting & Quotas	Per-tool and per-user quotas
Structured AgentConfig	Group 41 fields into nested dataclasses (RetryConfig, CoherenceConfig, etc.) with backward compat
Enhanced Testing Framework	Snapshot testing, load tests
Documentation Generation	Auto-generate docs from tool definitions
Prompt Optimization	Automatic prompt compression
CRM & Business Tools	HubSpot, Salesforce integrations

Release History¶

v0.16.1 - Consolidation & Hardening¶

✅ 6 bug fixes: arun() tool_usage stats, astream() dead code after yield, memory.from_dict() boundary fix, EntityMemory thread safety, KnowledgeMemory thread safety, SQLiteTripleStore WAL mode
✅ mypy: Resolved all 5 type errors across sessions.py and knowledge_graph.py (0 errors now)
✅ 68 new tests (total: 1487): Redis sessions, edge cases, memory boundary, memory integration, async memory, consolidation regression

v0.16.0 - Memory & Persistence¶

✅ Persistent Sessions: SessionStore protocol with 3 backends (JSON file, SQLite, Redis), TTL expiry, auto-save/load via AgentConfig
✅ Summarize-on-Trim: LLM-generated summaries of trimmed messages, injected as system context; configurable provider/model
✅ Entity Memory: LLM-based entity extraction (person, org, project, location, date, custom), LRU-pruned registry, system prompt injection
✅ Knowledge Graph Memory: Triple extraction (subject, relation, object), in-memory and SQLite storage, keyword-based querying
✅ Cross-Session Knowledge: Daily log files + persistent MEMORY.md, auto-registered remember tool, system prompt injection
✅ Memory Tools: Built-in remember tool auto-registered when knowledge_memory is configured
✅ 4 new observer events: on_session_load, on_session_save, on_memory_summarize, on_entity_extraction (total: 19)
✅ 5 new trace step types: session_load, session_save, memory_summarize, entity_extraction, kg_extraction
✅ 182 new tests (total: 1365)

v0.15.0 - Enterprise Reliability¶

✅ Guardrails Engine: GuardrailsPipeline with 5 built-in guardrails (Topic, PII, Toxicity, Format, Length) and block/rewrite/warn actions
✅ Audit Logging: JSONL AuditLogger with 4 privacy levels, daily rotation, thread-safe writes
✅ Tool Output Screening: 15 prompt injection patterns, per-tool screen_output=True or global via config
✅ Coherence Checking: LLM-based intent verification before each tool execution
✅ 83 new tests (total: 1183)

v0.14.1 - Streaming & Provider Fixes¶

✅ 13 streaming bug fixes: All providers' stream()/astream() now pass tools and yield ToolCall objects
✅ Agent core fixes: _streaming_call/_astreaming_call pass tools and don't stringify ToolCall objects
✅ Ollama _format_messages: Correct TOOL role mapping and ASSISTANT tool_calls inclusion
✅ FallbackProvider astream(): Error handling, failover, and circuit breaker support
✅ 141 new tests (total: 1100): Regression tests, recording-provider tests, unit tests for 6 previously untested modules

v0.14.0 - AgentObserver Protocol & Production Hardening¶

✅ AgentObserver Protocol: 15 lifecycle events with run_id/call_id correlation
✅ LoggingObserver: Structured JSON logs for ELK/Datadog
✅ OTel Span Export: AgentTrace.to_otel_spans() for OpenTelemetry
✅ Model Registry Update: 145 models with March 2026 pricing (GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro)
✅ OpenAI max_completion_tokens: Auto-detection for GPT-5.x, GPT-4.1, o-series models
✅ 11 bug fixes: Structured output parser bypass, policy bypass in parallel execution, memory trim observer gap, infinite recursion in batch+fallback, async policy timeout, None content handling, and more

v0.13.0 - Structured Output, Observability & Safety¶

✅ Structured Output Parsers: Pydantic / JSON Schema response_format on run() / arun() / ask() with auto-retry
✅ Execution Traces: result.trace with TraceStep timeline (llm_call, tool_selection, tool_execution, error)
✅ Reasoning Visibility: result.reasoning and result.reasoning_history extracted from LLM responses
✅ Provider Fallback Chain: FallbackProvider with circuit breaker and on_fallback callback
✅ Batch Processing: agent.batch() / agent.abatch() with max_concurrency and per-request error isolation
✅ Tool-Pair-Aware Trimming: ConversationMemory preserves tool_use/tool_result pairs during sliding window trim
✅ Tool Policy Engine: ToolPolicy with glob-based allow/review/deny rules and argument-level conditions
✅ Human-in-the-Loop Approval: confirm_action callback for review tools with approval_timeout

v0.12.x - Hybrid Search, Reranking, Advanced Chunking & Dynamic Tools¶

✅ BM25: Pure-Python Okapi BM25 keyword search; configurable k1/b; stop word removal; zero dependencies
✅ HybridSearcher: Vector + BM25 fusion via RRF or weighted linear combination
✅ HybridSearchTool: Agent-ready @tool with source attribution and score thresholds
✅ FusionMethod: RRF (rank-based) and WEIGHTED (normalised score) strategies
✅ Reranker ABC: Protocol for cross-encoder reranking with rerank(query, results, top_k)
✅ CohereReranker: Cohere Rerank API v2 (rerank-v3.5 default)
✅ JinaReranker: Jina AI Rerank API (jina-reranker-v2-base-multilingual default)
✅ HybridSearcher integration: Optional reranker= param for post-fusion re-scoring
✅ SemanticChunker: Embedding-based topic-boundary splitting; cosine similarity threshold
✅ ContextualChunker: LLM-generated context prepended to each chunk (Anthropic-style contextual retrieval)
✅ ToolLoader: Discover @tool functions from modules, files, and directories; hot-reload support
✅ Agent dynamic tools: add_tool, add_tools, remove_tool, replace_tool with prompt rebuild

v0.12.0 - Response Caching¶

✅ InMemoryCache: Thread-safe LRU + TTL cache with OrderedDict; zero dependencies
✅ RedisCache: Distributed TTL cache for multi-process deployments (optional redis dep)
✅ CacheKeyBuilder: Deterministic SHA-256 keys from (model, prompt, messages, tools, temperature)
✅ Agent Integration: AgentConfig(cache=...) checks cache before every provider call

v0.11.0 - Streaming & Parallel Execution¶

✅ E2E Streaming: Native tool streaming via Agent.astream with Union[str, ToolCall] provider protocol
✅ Parallel Tool Execution: asyncio.gather for async, ThreadPoolExecutor for sync; enabled by default
✅ Full Type Safety: 0 mypy errors across 80+ source and test files

v0.10.0 - Critical Architecture¶

✅ Native Function Calling: OpenAI, Anthropic, and Gemini native tool APIs
✅ Context Propagation: contextvars.copy_context() for async tool execution
✅ Routing Mode: AgentConfig(routing_only=True) for classification without execution

v0.9.0 - Core Capabilities & Reliability¶

✅ Custom System Prompt: AgentConfig(system_prompt=...) for domain instructions
✅ Structured AgentResult: run() returns AgentResult with tool calls, args, and iterations
✅ Reusable Agent Instances: Agent.reset() clears history/memory for clean reuse

v0.8.0 - Embeddings & RAG¶

✅ Full RAG Stack: VectorStore (Memory/SQLite/Chroma), Embeddings (OpenAI/Gemini), Document Loaders
✅ RAG Tools: RAGTool and SemanticSearchTool for knowledge base queries

v0.6.0 - High-Impact Features¶

✅ Observability Hooks: on_agent_start, on_tool_end lifecycle events
✅ Streaming Tools: Generators yield results progressively

v0.5.0 - Production Readiness¶

✅ Cost Tracking: Token counting and USD estimation
✅ Better Errors: PyTorch-style error messages with suggestions