Selectools Development Roadmap¶
Status Legend
- ✅ Implemented - Merged and available in latest release
- 🔵 In Progress - Actively being worked on
- 🟡 Planned - Scheduled for implementation
- ⏸️ Deferred - Postponed to later release
- ❌ Cancelled - No longer planned
v0.16.0: Memory & Persistence¶
Focus: Durable conversation state, cross-session knowledge, and advanced memory strategies.
Persistent Conversation Sessions¶
Problem: ConversationMemory is in-memory only. Process restarts lose all history. Chat applications need sessions that survive restarts.
What it does: SessionStore protocol with pluggable backends. Sessions auto-persist after each turn with TTL-based expiry.
API:
from selectools.memory import JsonFileSessionStore
store = JsonFileSessionStore(directory="./sessions")
agent = Agent(
tools=[...], provider=provider,
config=AgentConfig(session_store=store, session_id="user-123"),
)
result = agent.ask("What was my last question?") # auto-persisted
Scope:
- `SessionStore` protocol: `save()`, `load()`, `list()`, `delete()`
- Three backends: JSON file, SQLite, Redis
- Auto-save after each turn
- TTL-based expiry
- Tool-pair-preserving trim on load
Touches: New sessions.py, AgentConfig, agent/core.py.
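To make the protocol concrete, here is a minimal sketch of what a JSON-file backend could look like. This is a hypothetical stand-in, not the shipped implementation: it stores each session as one JSON file and treats expired sessions as missing.

```python
import json
import time
from pathlib import Path
from typing import Optional


class JsonFileSessionStore:
    """Sketch of a SessionStore backend: one JSON file per session."""

    def __init__(self, directory: str, ttl_seconds: Optional[float] = None):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.ttl_seconds = ttl_seconds

    def _path(self, session_id: str) -> Path:
        return self.directory / f"{session_id}.json"

    def save(self, session_id: str, messages: list) -> None:
        payload = {"saved_at": time.time(), "messages": messages}
        self._path(session_id).write_text(json.dumps(payload))

    def load(self, session_id: str) -> Optional[list]:
        path = self._path(session_id)
        if not path.exists():
            return None
        payload = json.loads(path.read_text())
        # TTL-based expiry: stale sessions are deleted and reported as missing
        if self.ttl_seconds and time.time() - payload["saved_at"] > self.ttl_seconds:
            path.unlink()
            return None
        return payload["messages"]

    def list(self) -> list:
        return [p.stem for p in self.directory.glob("*.json")]

    def delete(self, session_id: str) -> None:
        self._path(session_id).unlink(missing_ok=True)
```

The real backend also needs the tool-pair-preserving trim on load; that logic is omitted here.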
Summarize-on-Trim¶
Problem: Old messages are silently dropped when history exceeds limits. Important early context is lost.
What it does: Before trimming, summarize the messages being removed and inject the summary as a system-level context message.
API:
memory = ConversationMemory(
max_messages=30,
summarize_on_trim=True,
summarize_provider=provider,
)
Scope:
- LLM-generated 2-3 sentence summary of trimmed messages
- Summary injected as system message at conversation start
- Configurable summary model (use a cheap model like Haiku)
Touches: memory.py, provider integration.
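A minimal sketch of the trim path, assuming the feature behaves as described (the `summarize` callable stands in for the configured provider and would normally be an LLM call):

```python
from typing import Callable, List


def trim_with_summary(
    messages: List[dict],
    max_messages: int,
    summarize: Callable[[List[dict]], str],
) -> List[dict]:
    """Drop the oldest messages over the limit, but keep a summary
    of them as a system-level context message."""
    if len(messages) <= max_messages:
        return messages
    overflow = len(messages) - max_messages
    dropped, kept = messages[:overflow], messages[overflow:]
    summary = summarize(dropped)  # e.g. 2-3 sentences from a cheap model
    header = {"role": "system", "content": f"Earlier conversation summary: {summary}"}
    return [header] + kept
```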
Entity Memory¶
Problem: The agent can't track entities (people, orgs, projects) mentioned across turns. Each turn starts with no entity context.
What it does: Automatically extract named entities from conversation, maintain an entity registry, and inject relevant entity context into prompts.
API:
from selectools.memory import EntityMemory
memory = EntityMemory(provider=provider)
agent = Agent(tools=[...], provider=provider, memory=memory)
agent.ask("I'm working with Alice from Acme Corp on Project Alpha")
agent.ask("What project am I working on?")
# Agent knows: Alice (person, Acme Corp), Acme Corp (org), Project Alpha (project)
Scope:
- LLM-based entity extraction after each turn
- Entity types: person, organization, project, location, date, custom
- Entity registry: name → type, attributes, last mentioned
- System prompt injection of relevant entities
- Configurable: extraction model, max entities, relevance window
Touches: New entity_memory.py, PromptBuilder integration.
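The registry itself can stay simple. The following sketch shows one plausible shape for the name → (type, attributes, last mentioned) store with LRU-style pruning; class and method names here are assumptions, not the final API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EntityRecord:
    type: str
    attributes: Dict[str, str] = field(default_factory=dict)
    last_mentioned_turn: int = 0


class EntityRegistry:
    """Sketch of an entity registry with LRU-style pruning."""

    def __init__(self, max_entities: int = 100):
        self.max_entities = max_entities
        self._entities: Dict[str, EntityRecord] = {}

    def update(self, name: str, type_: str, turn: int, **attributes: str) -> None:
        record = self._entities.get(name) or EntityRecord(type=type_)
        record.type = type_
        record.attributes.update(attributes)
        record.last_mentioned_turn = turn
        self._entities[name] = record
        # Prune the least recently mentioned entity when over capacity
        if len(self._entities) > self.max_entities:
            oldest = min(self._entities, key=lambda n: self._entities[n].last_mentioned_turn)
            del self._entities[oldest]

    def relevant(self, since_turn: int) -> Dict[str, EntityRecord]:
        """Entities inside the relevance window, for prompt injection."""
        return {n: r for n, r in self._entities.items() if r.last_mentioned_turn >= since_turn}
```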
Knowledge Graph Memory¶
Problem: Entity memory tracks individual entities but not relationships between them. "Alice manages Project Alpha" is lost.
What it does: Build a graph of (subject, relation, object) triples from conversations. Query the graph to inject relevant relationship context into prompts.
API:
from selectools.memory import KnowledgeGraphMemory
memory = KnowledgeGraphMemory(provider=provider, storage="sqlite")
agent = Agent(tools=[...], provider=provider, memory=memory)
agent.ask("Alice manages Project Alpha and reports to Bob")
# Graph: (Alice, manages, Project Alpha), (Alice, reports_to, Bob)
agent.ask("Who manages Project Alpha?")
# Relevant triples injected: (Alice, manages, Project Alpha)
Scope:
- LLM-based triple extraction
- Storage: in-memory dict (default), SQLite (persistent)
- Query: retrieve triples relevant to current query via keyword + embedding match
- System prompt injection of relevant triples
- Graph operations: add, query, merge, prune
Touches: New knowledge_graph.py, storage backend, PromptBuilder.
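The in-memory default with keyword-based querying could look roughly like this sketch (embedding match is left out; names are assumptions):

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


class InMemoryTripleStore:
    """Sketch of the default in-memory triple store with keyword querying."""

    def __init__(self):
        self._triples: Set[Triple] = set()

    def add(self, subject: str, relation: str, obj: str) -> None:
        self._triples.add((subject, relation, obj))

    def query(self, text: str) -> List[Triple]:
        # Keyword match: keep triples whose subject/relation/object
        # overlaps with words in the query text.
        words = {w.lower().strip("?.,") for w in text.split()}
        hits = [
            t for t in self._triples
            if any(part.lower() in words or words & set(part.lower().split())
                   for part in t)
        ]
        return sorted(hits)
```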
Cross-Session Knowledge Memory¶
Problem: Even with persistent sessions, each session is isolated. There's no way for an agent to "remember" facts across conversations (e.g., user preferences, prior decisions).
What it does: A file-based or DB-backed knowledge memory with two layers: a daily log (append-only entries from the current day) and a long-term store (curated facts that persist indefinitely). A built-in remember tool lets the agent save facts explicitly. Relevant memories are auto-injected into the system prompt.
API:
from selectools.memory import KnowledgeMemory
knowledge = KnowledgeMemory(
directory="./workspace",
recent_days=2, # inject last 2 days into system prompt
max_context_chars=5000, # cap memory injection size
)
agent = Agent(
tools=[...],
provider=provider,
config=AgentConfig(knowledge_memory=knowledge),
)
Scope:
- Daily log files (`memory/YYYY-MM-DD.md`) + persistent `MEMORY.md`
- Built-in `remember` tool: agent can save categorized facts
- System prompt auto-injection of recent and long-term memories
- Configurable retention and context window
Touches: New knowledge.py module, PromptBuilder integration, built-in tool in toolbox/.
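A sketch of the two-layer design, using the file layout from the scope above (class and method names are hypothetical):

```python
import datetime
from pathlib import Path


class KnowledgeMemorySketch:
    """Two layers: append-only daily logs plus a persistent MEMORY.md."""

    def __init__(self, directory: str, recent_days: int = 2, max_context_chars: int = 5000):
        self.root = Path(directory)
        (self.root / "memory").mkdir(parents=True, exist_ok=True)
        self.recent_days = recent_days
        self.max_context_chars = max_context_chars

    def remember(self, fact: str, long_term: bool = False) -> None:
        if long_term:
            target = self.root / "MEMORY.md"  # curated, persists indefinitely
        else:
            today = datetime.date.today().isoformat()
            target = self.root / "memory" / f"{today}.md"  # daily log
        with target.open("a") as f:
            f.write(f"- {fact}\n")

    def context(self) -> str:
        """Build the system-prompt injection, capped at max_context_chars."""
        parts = []
        long_term = self.root / "MEMORY.md"
        if long_term.exists():
            parts.append(long_term.read_text())
        for offset in range(self.recent_days):
            day = (datetime.date.today() - datetime.timedelta(days=offset)).isoformat()
            log = self.root / "memory" / f"{day}.md"
            if log.exists():
                parts.append(log.read_text())
        return "\n".join(parts)[: self.max_context_chars]
```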
| Feature | Priority | Impact | Effort |
|---|---|---|---|
| Persistent Conversation Sessions | ✅ Done | High | Medium |
| Summarize-on-Trim | ✅ Done | Medium | Small |
| Cross-Session Knowledge Memory | ✅ Done | Medium | Medium |
| Entity Memory | ✅ Done | High | Medium |
| Knowledge Graph Memory | ✅ Done | High | Large |
Implementation Order¶
v0.13.0 ✅ Structured Output + Safety Foundation (Complete)
Tool-pair trimming → Structured output → Execution traces → Reasoning
→ Fallback providers → Batch → Tool policy → Human-in-the-loop
v0.14.0 ✅ AgentObserver Protocol + Production Hardening (Complete)
AgentObserver (15 events) → LoggingObserver → OTel export
→ Model registry (145 models) → max_completion_tokens fix → 11 bug fixes
v0.14.1 ✅ Streaming & Provider Fixes (Complete)
13 streaming bug fixes → 141 new tests → Unit tests for 6 untested modules
v0.15.0 ✅ Enterprise Reliability (Complete)
Guardrails engine → Audit logging → Tool output screening → Coherence checking
v0.16.0 ✅ Memory & Persistence (Complete)
Sessions → Summarize-on-trim → Knowledge memory → Entity memory → KG memory
v0.16.5 ✅ Design Patterns & Code Quality
StepType/ModelType enums → Agent mixin split → Provider base class → Terminal actions
→ Async observers → Gemini 3.x thought signatures → Hooks deprecation → ADRs
v0.17.0 🟡 Multi-Agent Orchestration
AgentGraph → GraphState → Checkpointing → Parallel nodes → SupervisorAgent → MCP
v0.18.0 🟡 Connector Expansion
CSV/JSON/HTML/URL/Markdown loaders → FAISS/Qdrant/pgvector stores
→ Code/search tools → SaaS loaders → GitHub/DB toolbox
v0.19.0 🟡 Ecosystem Parity
Observe (trace store, evaluators, export) → Serve (FastAPI, Flask, playground)
→ Templates → YAML config
v0.20.0 🟡 Polish & Community
HTML trace viewer → Niche integrations → Tool marketplace
v0.16.5: Design Patterns & Code Quality ✅¶
Focus: Structural refactoring to prevent the class of bugs found in v0.16.0–v0.16.4 and prepare a clean foundation for v0.17.0 multi-agent orchestration.
Highlights¶
- Agent decomposed into 4 mixins — `core.py` 3128→1448 lines (-54%)
- `StepType` + `ModelType` converted to `str, Enum` for type safety
- Terminal actions — `@tool(terminal=True)` and `stop_condition` callback
- `AsyncAgentObserver` — async lifecycle hooks with `blocking` flag
- Gemini 3.x thought signatures — `ToolCall.thought_signature` field
- Hooks deprecated — wrapped via `_HooksAdapter`, single observer pipeline
- OpenAI/Ollama share base class — Template Method pattern (-75% duplication)
- 163 new tests (total: 1640), 53 architecture fitness tests
- 6 ADRs in `docs/decisions/`
See CHANGELOG for full details.
v0.16.6: Gemini thought_signature crash fix ✅¶
Fixed UnicodeDecodeError when Gemini 3.x returns non-UTF-8 binary thought_signature. Replaced UTF-8 encode/decode with base64 across all 5 affected locations. 2 new regression tests (total: 1642).
v0.16.7: Cleanup ✅¶
Removed unused CLI module (cli.py + console script entry point). Completed README example table (28-38). Fixed stale counts across CONTRIBUTING.md, ARCHITECTURE.md, QUICKSTART.md, AGENT.md. Total: 1620 tests.
v0.17.0: Multi-Agent Orchestration¶
Focus: DAG-based multi-agent workflows that are simpler and more Pythonic than LangGraph.
Design Philosophy¶
LangGraph requires learning StateGraph, MessageAnnotation, Pregel channels, and a custom checkpointing API before building anything. Selectools takes the opposite approach: agents are the primitive, composition is plain Python. An AgentGraph should feel like writing normal Python with async/await, not configuring a data pipeline.
Core principles:
- Agents are nodes, not functions — each node is a full `Agent` instance with its own tools, provider, and config, reusing all existing infrastructure (traces, observers, guardrails, policies)
- Edges are just Python functions — no special `ConditionalEdge` class; a routing function takes the result and returns the next node name via plain `if/elif/else`
- State is a typed dataclass — no Pydantic models for state; just a `@dataclass` that gets passed between nodes
- Checkpointing is serialization — the state is JSON-serializable; checkpoint stores implement a 3-method protocol
- HITL reuses existing patterns — the existing `ToolPolicy` + `confirm_action` pattern already handles human-in-the-loop
Module Structure¶
src/selectools/orchestration/
__init__.py # Public exports: AgentGraph, GraphNode, GraphState, GraphResult
graph.py # AgentGraph: the DAG-based orchestration engine
node.py # GraphNode: wraps Agent with input/output transforms
state.py # GraphState: typed state container with merge semantics
checkpoint.py # CheckpointStore protocol + InMemory, File, SQLite backends
supervisor.py # SupervisorAgent: meta-agent for task decomposition
mcp.py # MCP client/server integration
Core Abstractions¶
GraphState¶
@dataclass
class GraphState:
messages: List[Message] # Accumulated messages across nodes
data: Dict[str, Any] # Arbitrary key-value store for inter-node communication
current_node: str # Name of the currently executing node
history: List[Tuple[str, AgentResult]] # Ordered list of (node_name, result) pairs
metadata: Dict[str, Any] # User-attached metadata (carried through checkpoints)
Intentionally flat and JSON-serializable. No Pydantic, no custom descriptors, no annotation magic.
GraphNode¶
@dataclass
class GraphNode:
name: str
agent: Union[Agent, Callable[[GraphState], GraphState]]
input_transform: Optional[Callable[[GraphState], List[Message]]] = None # state → messages for Agent.run()
output_transform: Optional[Callable[[AgentResult, GraphState], GraphState]] = None # merge result into state
max_iterations: int = 1 # How many times this node can run in a cycle
If input_transform/output_transform are not provided, sensible defaults are used (last message becomes user message; result appends to state messages).
AgentGraph¶
class AgentGraph:
"""DAG-based multi-agent orchestration engine."""
# Constants
END: ClassVar[str] = "__end__"
# Node management
def add_node(self, name: str, agent: Union[Agent, Callable], **kwargs) -> None: ...
def add_edge(self, from_node: str, to_node: str) -> None: ...
def add_conditional_edge(self, from_node: str, router_fn: Callable[[GraphState], str]) -> None: ...
def add_parallel_nodes(self, name: str, node_names: List[str], merge_fn: Optional[Callable] = None) -> None: ...
def set_entry(self, node_name: str) -> None: ...
# Execution
def run(self, prompt_or_state: Union[str, GraphState], checkpoint_store: Optional[CheckpointStore] = None) -> GraphResult: ...
async def arun(self, prompt_or_state: ..., checkpoint_store: ...) -> GraphResult: ...
async def astream(self, prompt_or_state: ...) -> AsyncGenerator[GraphEvent, None]: ...
# Validation
def validate(self) -> List[str]: ... # Returns list of warnings/errors
Usage example:
from selectools.orchestration import AgentGraph
graph = AgentGraph()
graph.add_node("planner", planner_agent)
graph.add_node("researcher", researcher_agent)
graph.add_node("writer", writer_agent)
graph.add_edge("planner", "researcher")
graph.add_conditional_edge("researcher", lambda state: "writer" if state.data.get("ready") else "researcher")
graph.add_edge("writer", AgentGraph.END)
graph.set_entry("planner")
result = graph.run("Write a blog post about AI agents")
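The `validate()` step replaces LangGraph's compile phase. A sketch of the kind of checks it might run, with the graph reduced to plain sets and dicts for illustration (the real method operates on the AgentGraph's internal node/edge tables):

```python
def validate_graph(nodes: set, edges: dict, entry: str) -> list:
    """Return human-readable problems: undefined targets, unreachable nodes."""
    END = "__end__"
    problems = []
    if entry not in nodes:
        problems.append(f"entry node '{entry}' is not defined")
    for src, dst in edges.items():
        if dst != END and dst not in nodes:
            problems.append(f"edge {src} -> {dst} targets an undefined node")
    # Walk edges from the entry node to find unreachable nodes
    seen, frontier = set(), [entry]
    while frontier:
        node = frontier.pop()
        if node in seen or node == END or node not in nodes:
            continue
        seen.add(node)
        frontier.append(edges.get(node, END))
    for orphan in nodes - seen:
        problems.append(f"node '{orphan}' is unreachable from entry")
    return problems
```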
GraphResult¶
@dataclass
class GraphResult:
content: str # Final output text
state: GraphState # Final state
node_results: Dict[str, List[AgentResult]] # Per-node results
trace: AgentTrace # Composite trace (linked via parent_run_id)
total_usage: UsageStats # Aggregated cost/tokens across all nodes
How It Beats LangGraph¶
| LangGraph | Selectools AgentGraph | Why better |
|---|---|---|
| Custom StateGraph with Annotated[list, add_messages] | Plain GraphState dataclass | No custom type system to learn |
| conditionalEdges with special return constants | Plain Python function returning a string | Debuggable, testable, IDE-friendly |
| Pregel channels for state management | Dict[str, Any] with merge functions | Standard Python data structures |
| Separate compile() step before execution | Validate + run in one step | No compilation phase, faster iteration |
| MemorySaver / SqliteSaver / PostgresSaver | CheckpointStore protocol (3 methods) | Trivial to implement custom stores |
| Node functions receive raw state | Nodes are full Agent instances | Inherit all Agent features: tools, traces, observers, guardrails |
| Complex interrupt/resume for human-in-the-loop | Reuse existing confirm_action on AgentConfig | Zero new concepts for HITL |
| Sub-graphs require CompiledGraph nesting | AgentGraph can be a node in another AgentGraph | Natural composition via duck typing |
Checkpointing¶
class CheckpointStore(Protocol):
def save(self, graph_id: str, state: GraphState, step: int) -> str: ... # Returns checkpoint_id
def load(self, checkpoint_id: str) -> Tuple[GraphState, int]: ... # Returns (state, step)
def list(self, graph_id: str) -> List[str]: ... # List checkpoint_ids
Built-in implementations:
- `InMemoryCheckpointStore` — dict-based, for development
- `FileCheckpointStore` — JSON files, for single-process production
- `SQLiteCheckpointStore` — for multi-process production
Enables: resume after crash, HITL pause/resume, time travel debugging.
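Because the protocol is only three methods, a custom store fits in a screenful. A sketch of the dict-based development backend (state is treated as any JSON-serializable object here):

```python
import itertools
from typing import Dict, List, Tuple


class InMemoryCheckpointStore:
    """Sketch of the 3-method CheckpointStore protocol, backed by a dict."""

    def __init__(self):
        self._data: Dict[str, Tuple[str, object, int]] = {}
        self._ids = itertools.count()

    def save(self, graph_id: str, state: object, step: int) -> str:
        checkpoint_id = f"{graph_id}-{next(self._ids)}"
        self._data[checkpoint_id] = (graph_id, state, step)
        return checkpoint_id

    def load(self, checkpoint_id: str) -> Tuple[object, int]:
        _, state, step = self._data[checkpoint_id]
        return state, step

    def list(self, graph_id: str) -> List[str]:
        return [cid for cid, (gid, _, _) in self._data.items() if gid == graph_id]
```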
Parallel Execution¶
graph.add_parallel_nodes("research_step", ["researcher_a", "researcher_b", "researcher_c"])
graph.add_edge("research_step", "synthesizer")
Uses asyncio.gather() (async) or ThreadPoolExecutor (sync) — same pattern already in agent/core.py for parallel tool execution. Configurable merge_fn(List[GraphState]) -> GraphState (default: concatenate messages, shallow-merge data dicts).
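The default merge described above can be sketched with plain dicts standing in for GraphState (later branches win on data-key conflicts, which is the usual shallow-merge behavior):

```python
def default_merge(states):
    """Concatenate messages across branches; shallow-merge data dicts."""
    messages, data = [], {}
    for state in states:
        messages.extend(state["messages"])
        data.update(state["data"])  # later branches overwrite earlier keys
    return {"messages": messages, "data": data}
```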
Sub-Graphs and Composition¶
An AgentGraph satisfies the Agent-like interface, so it can be a node in another graph:
research_subgraph = AgentGraph()
# ... define nodes ...
main_graph = AgentGraph()
main_graph.add_node("research", research_subgraph) # Sub-graph as a node
main_graph.add_node("writer", writer_agent)
main_graph.add_edge("research", "writer")
Sub-graph traces are linked via parent_run_id (already supported in trace.py).
SupervisorAgent¶
Higher-level abstraction for common multi-agent patterns:
from selectools.orchestration import SupervisorAgent
supervisor = SupervisorAgent(
agents={"researcher": researcher, "writer": writer, "reviewer": reviewer},
provider=OpenAIProvider(),
strategy="plan_and_execute", # or "round_robin", "dynamic"
)
result = supervisor.run("Write a comprehensive blog post about AI safety")
The supervisor uses the provider LLM to decompose tasks and route to specialist agents. Internally builds and executes an AgentGraph.
MCP Integration¶
MCP Client — discover and call tools from MCP-compliant servers:
from selectools.orchestration.mcp import MCPClient, mcp_tools
client = MCPClient(server_url="http://localhost:8080")
tools = mcp_tools(client) # Returns List[Tool] that proxy to MCP server
agent = Agent(tools=tools + local_tools, provider=provider)
MCP Server — expose @tool functions as MCP-compliant endpoints:
from selectools.orchestration.mcp import MCPServer
server = MCPServer(tools=[search_tool, calculator_tool])
server.serve(host="0.0.0.0", port=8080)
Integration with Existing Systems¶
- Observers: Graph `run_id` becomes each node's `parent_run_id`, creating a hierarchical trace tree
- Guardrails: Per-node (each Agent's own guardrails) + graph-level (before first node, after last node)
- Caching: Per-node via `AgentConfig(cache=...)`
- Cost tracking: Aggregated across all nodes in `GraphResult.total_usage`
- New StepType values: `graph_node_start`, `graph_node_end`, `graph_routing`, `graph_checkpoint`
| Feature | Priority | Impact | Effort |
|---|---|---|---|
| AgentGraph + GraphState | 🟡 High | High | Large |
| Checkpointing | 🟡 High | High | Medium |
| Parallel Nodes | 🟡 Medium | High | Medium |
| SupervisorAgent | 🟡 Medium | High | Medium |
| MCP Client | 🟡 Medium | Medium | Medium |
| MCP Server | 🟡 Low | Medium | Medium |
v0.18.0: Connector Expansion¶
Focus: Close the integration gap with LangChain by adding high-demand document loaders, vector stores, and toolbox modules.
Current Inventory¶
| Category | Count | Items |
|---|---|---|
| Document Loaders | 4 | text, file, directory, PDF |
| Vector Stores | 4 | Memory, SQLite, Chroma, Pinecone |
| Embedding Providers | 4 | OpenAI, Anthropic/Voyage, Gemini, Cohere |
| Toolbox | 24 tools | file, web, data, datetime, text |
| Rerankers | 2 | Cohere, Jina |
New Document Loaders¶
Add to src/selectools/rag/loaders.py as new static methods on DocumentLoader. Refactor into a loaders/ subpackage whose __init__.py re-exports everything, so SaaS loaders can live in separate files.
| Loader | Method | Dependencies | Complexity | Why it matters |
|---|---|---|---|---|
| CSV | from_csv(path, content_columns, metadata_columns) | stdlib csv | Small | Most common structured data format |
| JSON/JSONL | from_json(path, text_field) / from_jsonl(...) | stdlib json | Small | Standard for API responses, logs, datasets |
| HTML | from_html(path_or_content, extract_text=True) | beautifulsoup4 (optional) | Small | Web scraping output, saved pages |
| URL | from_url(url, timeout=30) | requests + beautifulsoup4 | Small | Direct URL-to-document (2nd most requested after PDF) |
| Markdown w/ Frontmatter | from_markdown(path) | pyyaml (optional) | Small | Static sites, docs, wikis |
| Google Drive | from_google_drive(file_id, credentials) | google-api-python-client | Medium | Most-used enterprise doc platform |
| Notion | from_notion(page_id, api_key) | requests (existing) | Medium | 2nd most-requested SaaS loader |
| GitHub | from_github(repo, path, branch, token) | requests (existing) | Small | Developer docs and code |
| SQL Database | from_sql(connection_string, query) | sqlalchemy (optional) | Medium | Enterprise data in databases |
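To illustrate the proposed `from_csv` shape, here is a sketch using only stdlib `csv`. The document dict shape and parameter names follow the table above but are still subject to change:

```python
import csv
from typing import Dict, List, Optional


def from_csv(
    path: str,
    content_columns: List[str],
    metadata_columns: Optional[List[str]] = None,
) -> List[Dict]:
    """One document per row: content from selected columns, rest as metadata."""
    documents = []
    with open(path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            content = "\n".join(f"{c}: {row[c]}" for c in content_columns)
            metadata = {c: row[c] for c in (metadata_columns or [])}
            metadata["row"] = i  # provenance for citations
            documents.append({"content": content, "metadata": metadata})
    return documents
```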
New Vector Stores¶
New files in src/selectools/rag/stores/. Each follows the same pattern as chroma.py: inherit VectorStore, implement add_documents, search, delete, clear, lazy-import the dependency. Register in VectorStore.create() factory.
| Store | File | Dependencies | Complexity | Why it matters |
|---|---|---|---|---|
| FAISS | faiss.py | faiss-cpu | Medium | De facto standard for local high-perf vector search (millions of vectors) |
| Qdrant | qdrant.py | qdrant-client | Medium | Fastest-growing vector DB, excellent filtering, cloud + self-hosted |
| pgvector | pgvector.py | psycopg2-binary | Medium | Use existing PostgreSQL — no new database needed |
| Weaviate | weaviate.py | weaviate-client | Medium | Popular cloud vector DB with GraphQL API |
| Redis Vector | redis.py | redis (existing) | Medium | Leverages existing Redis connection from cache_redis.py |
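The four-method surface each backend implements can be sketched with a brute-force cosine store; the real backends delegate `search` to their engine, but the interface is the same:

```python
import math
from typing import Dict, List, Tuple


class TinyVectorStore:
    """Sketch of the add_documents / search / delete / clear surface."""

    def __init__(self):
        self._docs: Dict[str, Tuple[List[float], str]] = {}

    def add_documents(self, ids, embeddings, texts):
        for doc_id, vec, text in zip(ids, embeddings, texts):
            self._docs[doc_id] = (vec, text)

    def search(self, query_vec, top_k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
            return dot / norm if norm else 0.0

        scored = [(cosine(query_vec, vec), doc_id, text)
                  for doc_id, (vec, text) in self._docs.items()]
        return sorted(scored, reverse=True)[:top_k]

    def delete(self, ids):
        for doc_id in ids:
            self._docs.pop(doc_id, None)

    def clear(self):
        self._docs.clear()
```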
New Toolbox Modules¶
New files in src/selectools/toolbox/. Follow @tool decorator pattern, register in get_all_tools() and get_tools_by_category().
| Module | Tools | Dependencies | Complexity | Why it matters |
|---|---|---|---|---|
code_tools.py | execute_python, execute_shell | stdlib subprocess | Medium | #1 most-used tool in agent frameworks |
search_tools.py | google_search, duckduckgo_search | duckduckgo_search (optional) | Small-Medium | #2 most-used tool category |
github_tools.py | create_issue, list_issues, create_pr, get_file_contents | requests (existing) | Medium | Developer workflow automation |
db_tools.py | query_database, list_tables, describe_table | sqlalchemy (optional) | Medium | Enterprise data access |
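As a flavor of what a db_tools helper might look like, here is a `list_tables` sketch backed by stdlib `sqlite3` (the proposal uses sqlalchemy via lazy import; the final signature and return shape are not settled):

```python
import sqlite3
from typing import List


def list_tables(connection_string: str) -> List[str]:
    """Return table names from a SQLite database, sorted alphabetically."""
    with sqlite3.connect(connection_string) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```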
Dependency Management¶
All new dependencies are optional and lazy-imported. Add to pyproject.toml:
[project.optional-dependencies]
rag = [
# existing deps ...
"beautifulsoup4>=4.12.0",
"faiss-cpu>=1.7.0",
"qdrant-client>=1.7.0",
"psycopg2-binary>=2.9.0",
"weaviate-client>=4.0.0",
]
Individual stores/loaders remain installable a la carte: pip install selectools faiss-cpu works without the full [rag] group.
| Feature | Priority | Impact | Effort |
|---|---|---|---|
| CSV/JSON/JSONL Loaders | 🟡 High | High | Small |
| HTML/URL Loaders | 🟡 High | High | Small |
| FAISS Vector Store | 🟡 High | High | Medium |
| Qdrant Vector Store | 🟡 Medium | Medium | Medium |
| pgvector Store | 🟡 Medium | High | Medium |
| Code Execution Tools | 🟡 High | High | Medium |
| Search Tools | 🟡 High | High | Small |
| SaaS Loaders | 🟡 Medium | Medium | Medium |
| GitHub/DB Toolbox | 🟡 Medium | Medium | Medium |
v0.19.0: Ecosystem Parity¶
Focus: Observability, evaluation, REST API deployment, and templates — closing the gap with LangSmith, LangServe, and LangChain Hub.
Selectools Observe (Observability & Evaluation)¶
Existing head start: AgentObserver (15 events), AgentTrace (OTel export), AuditLogger (JSONL), AgentAnalytics.
New src/selectools/observe/ package:
src/selectools/observe/
__init__.py # Public exports
trace_store.py # TraceStore protocol + InMemory, SQLite, JSONL backends
evaluators.py # Evaluator protocol + built-in evaluators
eval_runner.py # EvalRunner for dataset evaluation
export.py # Export formatters (HTML, CSV, Datadog, Langfuse, OTel)
datasets.py # EvalCase, EvalDataset for test datasets
Trace Store¶
class TraceStore(Protocol):
def save(self, trace: AgentTrace) -> str: ...
def load(self, run_id: str) -> AgentTrace: ...
def query(self, filters: TraceFilter) -> List[AgentTrace]: ...
def list(self, limit: int, offset: int) -> List[AgentTraceSummary]: ...
Built-in: InMemoryTraceStore, SQLiteTraceStore, JSONLTraceStore.
Auto-collection via TraceCollectorObserver (implements AgentObserver, auto-saves traces on on_run_end).
Evaluators¶
class Evaluator(Protocol):
def evaluate(self, input: str, output: str, reference: Optional[str] = None) -> EvalResult: ...
@dataclass
class EvalResult:
score: float # 0.0 to 1.0
passed: bool
reasoning: str
evaluator: str
Built-in evaluators:
- `CorrectnessEvaluator` — LLM-as-judge: is the output correct?
- `RelevanceEvaluator` — LLM-as-judge: is the output relevant to the input?
- `FaithfulnessEvaluator` — RAG-specific: is the output grounded in retrieved documents?
- `ToolUseEvaluator` — did the agent use the right tools?
- `LatencyEvaluator` — did the agent respond within the time budget?
- `CostEvaluator` — did the agent stay within the cost budget?
LLM-based evaluators use the existing Provider protocol — any configured provider works.
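A sketch of an LLM-as-judge evaluator against the protocol above. The `judge` callable stands in for the Provider protocol and is assumed to return a 0-1 score as text:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalResult:
    score: float        # 0.0 to 1.0
    passed: bool
    reasoning: str
    evaluator: str


class CorrectnessEvaluator:
    """Sketch: ask a judge model to score correctness, threshold into pass/fail."""

    def __init__(self, judge: Callable[[str], str], threshold: float = 0.7):
        self.judge = judge
        self.threshold = threshold

    def evaluate(self, input: str, output: str, reference: Optional[str] = None) -> EvalResult:
        prompt = (
            f"Question: {input}\nAnswer: {output}\n"
            + (f"Reference: {reference}\n" if reference else "")
            + "Rate correctness from 0.0 to 1.0. Reply with just the number."
        )
        score = float(self.judge(prompt))
        return EvalResult(
            score=score,
            passed=score >= self.threshold,
            reasoning=f"judge scored {score:.2f}",
            evaluator="correctness",
        )
```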
Evaluation Runner¶
class EvalRunner:
def run(self, agent: Agent, dataset: List[EvalCase], evaluators: List[Evaluator]) -> EvalReport: ...
Builds on agent.batch() for parallel evaluation.
Export Formats¶
- `export_to_html(traces)` — self-contained HTML report with trace timeline (zero deps, open in browser)
- `export_to_csv(traces)` — spreadsheet analysis
- `export_to_datadog(traces)` — Datadog APM format
- `export_to_langfuse(traces)` — open-source LangSmith alternative
- `export_to_otel(traces)` — already exists via `AgentTrace.to_otel_spans()`
Key differentiator vs LangSmith: Zero SaaS dependency. Self-contained HTML trace viewer. Full evaluation without leaving your codebase.
Selectools Serve (REST API Deployment)¶
New src/selectools/serve/ package:
src/selectools/serve/
__init__.py # Public exports: AgentRouter, AgentBlueprint, playground
fastapi.py # FastAPI router
flask.py # Flask blueprint
playground.py # Self-contained chat UI + server
models.py # Pydantic request/response models
FastAPI Router¶
from selectools.serve import AgentRouter
router = AgentRouter(agent=my_agent, prefix="/agent")
# Creates:
# POST /agent/invoke — single prompt → AgentResult as JSON
# POST /agent/batch — multiple prompts → List[AgentResult]
# POST /agent/stream — single prompt → SSE stream
# GET /agent/schema — OpenAPI schema for tools
# GET /agent/health — health check
app = FastAPI()
app.include_router(router)
Flask Blueprint¶
from selectools.serve import AgentBlueprint
blueprint = AgentBlueprint(agent=my_agent, prefix="/agent")
app = Flask(__name__)
app.register_blueprint(blueprint)
Playground¶
from selectools.serve import playground
playground(agent=my_agent, port=8000) # Chat UI at http://localhost:8000
Self-contained HTML page served by minimal HTTP server. Zero-dependency chat interface.
Key differentiator vs LangServe: Works with FastAPI AND Flask. Built-in playground with zero config.
Templates & Configuration¶
New src/selectools/templates/ package:
| Template | Pre-configured with |
|---|---|
| customer_support.py | Support tools, system prompt, guardrails, topic restriction |
| data_analyst.py | Code execution, data tools, CSV/JSON structured output |
| research_assistant.py | Search tools, web tools, RAG pipeline |
| code_reviewer.py | File tools, GitHub tools, structured output |
| rag_chatbot.py | RAG pipeline, memory, knowledge base config |
YAML Agent Configuration:
# agent.yaml
provider: openai
model: gpt-4o
tools:
- selectools.toolbox.file_tools.read_file
- selectools.toolbox.web_tools.http_get
- ./my_custom_tool.py
system_prompt: "You are a helpful assistant..."
guardrails:
input:
- type: topic
deny: [politics, religion]
| Feature | Priority | Impact | Effort |
|---|---|---|---|
| Trace Store | 🟡 High | High | Medium |
| Evaluators + EvalRunner | 🟡 High | High | Medium |
| HTML Trace Export | 🟡 Medium | High | Medium |
| FastAPI AgentRouter | 🟡 High | High | Medium |
| Flask AgentBlueprint | 🟡 Medium | Medium | Small |
| Playground | 🟡 Medium | High | Medium |
| Agent Templates | 🟡 Medium | Medium | Small |
| YAML Config | 🟡 Low | Medium | Small |
v0.20.0: Polish & Community¶
Focus: Niche integrations, community sharing, and developer experience polish.
Niche Document Loaders¶
| Loader | Dependencies |
|---|---|
| from_slack(channel_id, token) | requests |
| from_confluence(page_id, base_url, token) | requests |
| from_jira(project_key, token) | requests |
| from_discord(channel_id, token) | requests |
| from_email(imap_server, credentials) | stdlib imaplib |
| from_docx(path) | python-docx |
| from_excel(path, sheet) | openpyxl |
| from_xml(path, text_xpath) | stdlib xml.etree |
Niche Vector Stores¶
| Store | Dependencies |
|---|---|
| MilvusVectorStore | pymilvus |
| OpenSearchVectorStore | opensearch-py |
| LanceVectorStore | lancedb |
Niche Toolbox Modules¶
| Module | Tools |
|---|---|
| email_tools.py | send_email, read_inbox, search_emails |
| calendar_tools.py | create_event, list_events, find_free_slots |
| browser_tools.py | navigate, click, extract_text, screenshot (Playwright) |
| financial_tools.py | stock_price, exchange_rate, market_summary |
Community Features¶
- Tool Marketplace/Registry: Publish and discover community `@tool` functions
- Visual Agent Builder: Web UI for designing agent configurations (generates YAML)
- Enhanced HTML Trace Viewer: Interactive timeline with filter, search, cost breakdown
| Feature | Priority | Impact | Effort |
|---|---|---|---|
| Niche Loaders (8) | 🟡 Low | Medium | Medium |
| Niche Stores (3) | 🟡 Low | Low | Medium |
| Niche Toolbox (4 modules) | 🟡 Low | Medium | Medium |
| Tool Marketplace | 🟡 Low | High | Large |
| Visual Agent Builder | 🟡 Low | Medium | Large |
Backlog (Unscheduled)¶
| Feature | Notes |
|---|---|
| Tool Composition | @compose decorator for chaining tools |
| Universal Vision Support | Unified vision API across providers |
| AWS Bedrock Provider | VPC-native model access (Claude, Llama, Mistral) |
| Rate Limiting & Quotas | Per-tool and per-user quotas |
| Structured AgentConfig | Group 41 fields into nested dataclasses (RetryConfig, CoherenceConfig, etc.) with backward compat |
| Enhanced Testing Framework | Snapshot testing, load tests |
| Documentation Generation | Auto-generate docs from tool definitions |
| Prompt Optimization | Automatic prompt compression |
| CRM & Business Tools | HubSpot, Salesforce integrations |
Release History¶
v0.16.1 - Consolidation & Hardening¶
- ✅ 6 bug fixes: `arun()` tool_usage stats, `astream()` dead code after yield, `memory.from_dict()` boundary fix, `EntityMemory` thread safety, `KnowledgeMemory` thread safety, `SQLiteTripleStore` WAL mode
- ✅ mypy: Resolved all 5 type errors across `sessions.py` and `knowledge_graph.py` (0 errors now)
- ✅ 68 new tests (total: 1487): Redis sessions, edge cases, memory boundary, memory integration, async memory, consolidation regression
v0.16.0 - Memory & Persistence¶
- ✅ Persistent Sessions: `SessionStore` protocol with 3 backends (JSON file, SQLite, Redis), TTL expiry, auto-save/load via `AgentConfig`
- ✅ Summarize-on-Trim: LLM-generated summaries of trimmed messages, injected as system context; configurable provider/model
- ✅ Entity Memory: LLM-based entity extraction (person, org, project, location, date, custom), LRU-pruned registry, system prompt injection
- ✅ Knowledge Graph Memory: Triple extraction (subject, relation, object), in-memory and SQLite storage, keyword-based querying
- ✅ Cross-Session Knowledge: Daily log files + persistent `MEMORY.md`, auto-registered `remember` tool, system prompt injection
- ✅ Memory Tools: Built-in `remember` tool auto-registered when `knowledge_memory` is configured
- ✅ 4 new observer events: `on_session_load`, `on_session_save`, `on_memory_summarize`, `on_entity_extraction` (total: 19)
- ✅ 5 new trace step types: `session_load`, `session_save`, `memory_summarize`, `entity_extraction`, `kg_extraction`
- ✅ 182 new tests (total: 1365)
v0.15.0 - Enterprise Reliability¶
- ✅ Guardrails Engine: `GuardrailsPipeline` with 5 built-in guardrails (Topic, PII, Toxicity, Format, Length) and block/rewrite/warn actions
- ✅ Audit Logging: JSONL `AuditLogger` with 4 privacy levels, daily rotation, thread-safe writes
- ✅ Tool Output Screening: 15 prompt injection patterns, per-tool `screen_output=True` or global via config
- ✅ Coherence Checking: LLM-based intent verification before each tool execution
- ✅ 83 new tests (total: 1183)
v0.14.1 - Streaming & Provider Fixes¶
- ✅ 13 streaming bug fixes: All providers' `stream()`/`astream()` now pass `tools` and yield `ToolCall` objects
- ✅ Agent core fixes: `_streaming_call`/`_astreaming_call` pass tools and don't stringify `ToolCall` objects
- ✅ Ollama `_format_messages`: Correct `TOOL` role mapping and `ASSISTANT` tool_calls inclusion
- ✅ FallbackProvider `astream()`: Error handling, failover, and circuit breaker support
- ✅ 141 new tests (total: 1100): Regression tests, recording-provider tests, unit tests for 6 previously untested modules
v0.14.0 - AgentObserver Protocol & Production Hardening¶
- ✅ AgentObserver Protocol: 15 lifecycle events with `run_id`/`call_id` correlation
- ✅ LoggingObserver: Structured JSON logs for ELK/Datadog
- ✅ OTel Span Export: `AgentTrace.to_otel_spans()` for OpenTelemetry
- ✅ Model Registry Update: 145 models with March 2026 pricing (GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro)
- ✅ OpenAI `max_completion_tokens`: Auto-detection for GPT-5.x, GPT-4.1, o-series models
- ✅ 11 bug fixes: Structured output parser bypass, policy bypass in parallel execution, memory trim observer gap, infinite recursion in batch+fallback, async policy timeout, None content handling, and more
v0.13.0 - Structured Output, Observability & Safety¶
- ✅ Structured Output Parsers: Pydantic / JSON Schema `response_format` on `run()`/`arun()`/`ask()` with auto-retry
- ✅ Execution Traces: `result.trace` with `TraceStep` timeline (`llm_call`, `tool_selection`, `tool_execution`, `error`)
- ✅ Reasoning Visibility: `result.reasoning` and `result.reasoning_history` extracted from LLM responses
- ✅ Provider Fallback Chain: `FallbackProvider` with circuit breaker and `on_fallback` callback
- ✅ Batch Processing: `agent.batch()`/`agent.abatch()` with `max_concurrency` and per-request error isolation
- ✅ Tool-Pair-Aware Trimming: `ConversationMemory` preserves tool_use/tool_result pairs during sliding window trim
- ✅ Tool Policy Engine: `ToolPolicy` with glob-based allow/review/deny rules and argument-level conditions
- ✅ Human-in-the-Loop Approval: `confirm_action` callback for `review` tools with `approval_timeout`
v0.12.x - Hybrid Search, Reranking, Advanced Chunking & Dynamic Tools¶
- ✅ BM25: Pure-Python Okapi BM25 keyword search; configurable k1/b; stop word removal; zero dependencies
- ✅ HybridSearcher: Vector + BM25 fusion via RRF or weighted linear combination
- ✅ HybridSearchTool: Agent-ready `@tool` with source attribution and score thresholds
- ✅ FusionMethod: `RRF` (rank-based) and `WEIGHTED` (normalised score) strategies
- ✅ Reranker ABC: Protocol for cross-encoder reranking with `rerank(query, results, top_k)`
- ✅ CohereReranker: Cohere Rerank API v2 (`rerank-v3.5` default)
- ✅ JinaReranker: Jina AI Rerank API (`jina-reranker-v2-base-multilingual` default)
- ✅ HybridSearcher integration: Optional `reranker=` param for post-fusion re-scoring
- ✅ SemanticChunker: Embedding-based topic-boundary splitting; cosine similarity threshold
- ✅ ContextualChunker: LLM-generated context prepended to each chunk (Anthropic-style contextual retrieval)
- ✅ ToolLoader: Discover `@tool` functions from modules, files, and directories; hot-reload support
- ✅ Agent dynamic tools: `add_tool`, `add_tools`, `remove_tool`, `replace_tool` with prompt rebuild
v0.12.0 - Response Caching¶
- ✅ InMemoryCache: Thread-safe LRU + TTL cache with `OrderedDict`; zero dependencies
- ✅ RedisCache: Distributed TTL cache for multi-process deployments (optional `redis` dep)
- ✅ CacheKeyBuilder: Deterministic SHA-256 keys from (model, prompt, messages, tools, temperature)
- ✅ Agent Integration: `AgentConfig(cache=...)` checks cache before every provider call
v0.11.0 - Streaming & Parallel Execution¶
- ✅ E2E Streaming: Native tool streaming via `Agent.astream` with `Union[str, ToolCall]` provider protocol
- ✅ Parallel Tool Execution: `asyncio.gather` for async, `ThreadPoolExecutor` for sync; enabled by default
- ✅ Full Type Safety: 0 mypy errors across 80+ source and test files
v0.10.0 - Critical Architecture¶
- ✅ Native Function Calling: OpenAI, Anthropic, and Gemini native tool APIs
- ✅ Context Propagation: `contextvars.copy_context()` for async tool execution
- ✅ Routing Mode: `AgentConfig(routing_only=True)` for classification without execution
v0.9.0 - Core Capabilities & Reliability¶
- ✅ Custom System Prompt: `AgentConfig(system_prompt=...)` for domain instructions
- ✅ Structured AgentResult: `run()` returns `AgentResult` with tool calls, args, and iterations
- ✅ Reusable Agent Instances: `Agent.reset()` clears history/memory for clean reuse
v0.8.0 - Embeddings & RAG¶
- ✅ Full RAG Stack: VectorStore (Memory/SQLite/Chroma), Embeddings (OpenAI/Gemini), Document Loaders
- ✅ RAG Tools: `RAGTool` and `SemanticSearchTool` for knowledge base queries
v0.6.0 - High-Impact Features¶
- ✅ Observability Hooks: `on_agent_start`, `on_tool_end` lifecycle events
- ✅ Streaming Tools: Generators yield results progressively
v0.5.0 - Production Readiness¶
- ✅ Cost Tracking: Token counting and USD estimation
- ✅ Better Errors: PyTorch-style error messages with suggestions