Agent-as-API (Serve Enhancement)
Stack: Python 3.9+, src-layout, pytest, starlette (optional dep), stdlib HTTP fallback
Date: 2026-04-13
Status: Draft
Problem
Selectools ships a working selectools serve with /invoke, /stream, /health, and /schema endpoints — plus a visual builder and playground. But the serve module lacks three things every production deployment needs and every competitor (Agno's AgentOS, PraisonAI's agent-as-API) already provides:
- Per-user session isolation — all requests share a single agent instance and conversation history. User A's messages leak into User B's context.
- API-key authentication for the invoke endpoint — current auth is cookie-based builder auth only. There's no way for programmatic API clients to authenticate with a bearer token or API key.
- Configurable streaming — SSE works but there's no heartbeat keepalive, no configurable chunk batching, and no per-event typing (tool calls vs. text chunks vs. final result aren't typed in the SSE stream).
Solution
Enhance the existing serve/ module (not rewrite) with three targeted additions:
1. Session Manager
A pluggable SessionManager that maps (user_id, session_id) → isolated agent state (conversation history, entity memory, knowledge memory). Default: in-memory with TTL-based eviction. Optional: file-backed or Redis-backed (via protocol, no hard dep).
2. API-Key Auth Middleware
A middleware layer (works with both stdlib and Starlette) that validates Authorization: Bearer <key> headers against a configurable key store. Extracts user_id from the key for session routing. Falls back to existing cookie auth for browser-based playground/builder.
3. Typed SSE Events
Enrich the existing SSE stream with typed event categories and a heartbeat:
event: chunk
data: {"content": "Hello", "type": "text"}
event: tool_call
data: {"tool": "search", "args": {"q": "weather"}, "call_id": "tc_1"}
event: tool_result
data: {"tool": "search", "call_id": "tc_1", "result": "Sunny, 72°F"}
event: done
data: {"content": "The weather is sunny.", "iterations": 2, "usage": {...}}
: heartbeat
Acceptance Criteria
Session Isolation
- New `SessionManager` protocol in `serve/sessions.py` with `get(user_id, session_id) -> SessionState`, `put(user_id, session_id, state) -> None`, and `delete(user_id, session_id) -> None`
- `SessionState` holds: `ConversationMemory`, `Optional[EntityMemory]`, `Optional[KnowledgeMemory]`, `created_at`, `last_accessed`, `metadata: Dict`
- `InMemorySessionManager` implementation with configurable `max_sessions` (default 1000) and `ttl_seconds` (default 3600)
- Session ID from request: `X-Session-ID` header or `session_id` field in JSON body; auto-generated UUID if absent
- User ID extracted from auth (API key → user_id mapping) or `X-User-ID` header for trusted internal networks
- Each `/invoke` and `/stream` request clones the agent's base config, injects the session's memory, runs, then saves state back
- Sessions are isolated: User A's history never appears in User B's context
- Stale sessions evicted on `get()` check (lazy eviction), plus periodic sweep every 60s in Starlette mode
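The session criteria above can be sketched roughly as follows. This is a minimal illustration, not the shipped implementation: `SessionState` here holds a plain `history` list as a stand-in for `ConversationMemory`, the `SessionKey` alias and `_make_room` helper are hypothetical names, and evicting the least recently accessed session when `max_sessions` is reached is an assumed policy that the spec does not state.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol, Tuple

SessionKey = Tuple[str, str]  # (user_id, session_id)


@dataclass
class SessionState:
    # A plain message list stands in for ConversationMemory (assumption).
    history: List[Dict[str, str]] = field(default_factory=list)
    created_at: float = field(default_factory=time.time)
    last_accessed: float = field(default_factory=time.time)
    metadata: Dict[str, Any] = field(default_factory=dict)


class SessionManager(Protocol):
    def get(self, user_id: str, session_id: str) -> SessionState: ...
    def put(self, user_id: str, session_id: str, state: SessionState) -> None: ...
    def delete(self, user_id: str, session_id: str) -> None: ...


class InMemorySessionManager:
    def __init__(self, max_sessions: int = 1000, ttl_seconds: float = 3600.0) -> None:
        self._max_sessions = max_sessions
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._sessions: Dict[SessionKey, SessionState] = {}

    def _make_room(self, key: SessionKey) -> None:
        # Caller holds the lock. If a new key would exceed the cap, drop the
        # least recently accessed session (assumed eviction policy).
        if key in self._sessions or len(self._sessions) < self._max_sessions:
            return
        if self._sessions:
            oldest = min(self._sessions, key=lambda k: self._sessions[k].last_accessed)
            del self._sessions[oldest]

    def get(self, user_id: str, session_id: str) -> SessionState:
        key = (user_id, session_id)
        now = time.time()
        with self._lock:
            state = self._sessions.get(key)
            if state is None or now - state.last_accessed > self._ttl:
                # Lazy eviction: a missing or expired session starts fresh.
                self._make_room(key)
                state = SessionState()
                self._sessions[key] = state
            state.last_accessed = now
            return state

    def put(self, user_id: str, session_id: str, state: SessionState) -> None:
        key = (user_id, session_id)
        with self._lock:
            self._make_room(key)
            state.last_accessed = time.time()
            self._sessions[key] = state

    def delete(self, user_id: str, session_id: str) -> None:
        with self._lock:
            self._sessions.pop((user_id, session_id), None)

    def __len__(self) -> int:
        # Could back the active_sessions count on /health.
        with self._lock:
            return len(self._sessions)
```

Keying the store on the `(user_id, session_id)` pair means two users who happen to send the same `X-Session-ID` still get separate state.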
API-Key Authentication
- `serve/auth.py` with `ApiKeyAuth` class: validates `Authorization: Bearer <key>` headers
- Key store: `Dict[str, KeyInfo]` where `KeyInfo` has `user_id`, `role`, `rate_limit`, `created_at`
- Keys loadable from: `~/.selectools/api_keys.json` file, `SELECTOOLS_API_KEYS` env var (JSON), or passed programmatically
- New CLI flag: `--api-key <key>` for single-key quick-start (maps to user_id `"default"`)
- Existing cookie auth continues to work for playground/builder; API key auth is additive
- Unauthenticated requests to protected endpoints return `401` with `{"error": "unauthorized", "message": "..."}`
- Invalid key returns `403` with `{"error": "forbidden", "message": "..."}`
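One way the key store could be populated is sketched below. Several details are assumptions made for illustration: the JSON layout (raw key mapped to an object of `KeyInfo` fields), the `load_keys` helper name, and the `"viewer"` default role are all hypothetical, since the spec does not fix them.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class KeyInfo:
    user_id: str
    role: str = "viewer"  # hypothetical default; the spec does not name one
    rate_limit: Optional[int] = None  # carried for future use, not enforced
    created_at: float = field(default_factory=time.time)


def load_keys(environ: Mapping[str, str], cli_key: Optional[str] = None) -> Dict[str, KeyInfo]:
    """Build a key store from the SELECTOOLS_API_KEYS env var and --api-key."""
    keys: Dict[str, KeyInfo] = {}
    raw = environ.get("SELECTOOLS_API_KEYS")
    if raw:
        # Assumed layout: {"<key>": {"user_id": "...", "role": "..."}}
        for key, info in json.loads(raw).items():
            keys[key] = KeyInfo(**info)
    if cli_key:
        # Single-key quick-start maps to user_id "default".
        keys[cli_key] = KeyInfo(user_id="default")
    return keys
```

Passing the environment in as a mapping (for example `os.environ`) keeps the loader easy to test without touching process state.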
Typed SSE Streaming
- SSE events use `event:` field to categorize: `chunk`, `tool_call`, `tool_result`, `error`, `done`
- `chunk` events include `{"content": str, "type": "text"|"reasoning"}`
- `tool_call` events include `{"tool": str, "args": dict, "call_id": str}`
- `tool_result` events include `{"tool": str, "call_id": str, "result": str}`
- `done` event includes `{"content": str, "iterations": int, "usage": dict}`
- Heartbeat comment (`: heartbeat\n\n`) sent every 15s (configurable) to keep connection alive
- Backward compatible: clients that ignore `event:` field and only read `data:` still work
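A minimal sketch of how the typed events above might be serialized on the server side. The `format_sse` function and the `EVENT_TYPES` and `HEARTBEAT` constants are hypothetical names, not part of the existing serve module.

```python
import json
from typing import Any, Dict

EVENT_TYPES = {"chunk", "tool_call", "tool_result", "error", "done"}

# A line starting with ":" is an SSE comment; clients skip it, but it keeps
# idle connections from being closed by proxies.
HEARTBEAT = ": heartbeat\n\n"


def format_sse(event: str, payload: Dict[str, Any]) -> str:
    """Serialize one typed event as an SSE frame."""
    if event not in EVENT_TYPES:
        raise ValueError(f"unknown SSE event type: {event!r}")
    # json.dumps without indent never emits a raw newline, so the payload
    # always fits on a single data: line.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"
```

For example, `format_sse("chunk", {"content": "Hello", "type": "text"})` produces a two-line frame terminated by a blank line, matching the wire format described in this spec.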
Integration
- `AgentRouter` constructor gains optional `session_manager` and `auth` params
- `AgentServer` constructor gains optional `session_manager`, `auth`, and `api_keys` params
- `selectools serve` CLI gains `--api-key` flag
- Starlette app (`create_builder_app`) gains session + auth support when available
- `/health` endpoint enhanced: includes `active_sessions` count and `uptime_seconds`
General
- No new required dependencies: `starlette`/`uvicorn` remain optional
- Stdlib HTTP server supports all features (sessions, API key auth, typed SSE)
- ≥90% test coverage on new code
- Stability marker: `@beta` on all new public classes
- Existing serve tests continue to pass with no modifications
Non-Goals
- WebSocket support: SSE is sufficient for server→client streaming. WebSocket adds bidirectional complexity not needed for agent responses.
- JWT/OAuth2 implementation: API key auth is the scope. Users needing JWT can wrap the server in a reverse proxy (nginx, Caddy) or implement a custom `AuthProvider`.
- Redis/database session backend: define the protocol, ship in-memory only. Redis adapter is a future add-on.
- Rate limiting: the `KeyInfo` dataclass will carry a `rate_limit` field for future use, but enforcement is not in v0.22.0 scope.
- Multi-agent routing: one agent per server instance. Multi-agent orchestration happens at the `AgentGraph` level, not the HTTP level.
- Graceful shutdown / drain: stdlib HTTP server doesn't support it. Starlette/uvicorn handles this natively.
- OpenTelemetry integration: trace context propagation is a separate concern.
Technical Approach
New files
| File | Purpose |
|---|---|
| `serve/sessions.py` | SessionManager protocol, SessionState dataclass, InMemorySessionManager |
| `serve/auth.py` | ApiKeyAuth class, KeyInfo dataclass, key loading helpers |
Modified files
| File | Change |
|---|---|
| `serve/app.py` | AgentRouter: accept session_manager + auth; handle_invoke/handle_stream: extract user/session, load state, run, save state; typed SSE events in handle_stream; heartbeat thread |
| `serve/app.py` | AgentServer: pass session_manager + auth to router; auth check in handler |
| `serve/cli.py` | --api-key flag; load api_keys.json; pass to server constructors |
| `serve/models.py` | InvokeRequest gains optional session_id field; StreamEvent dataclass for typed SSE |
| `serve/_starlette_app.py` | Wire session + auth into ASGI routes; async session eviction background task |
| `serve/__init__.py` | Export SessionManager, InMemorySessionManager, ApiKeyAuth |
| `__init__.py` | Top-level exports with @beta |
Session lifecycle
Request arrives
→ Auth middleware extracts user_id (from API key or cookie)
→ Session ID from X-Session-ID header or body, or auto-generate
→ session_manager.get(user_id, session_id) → SessionState
→ Clone agent's base ConversationMemory → inject session's history
→ Run agent (run or astream)
→ Extract updated history from agent → session_manager.put(...)
→ Return response
Key design: the agent itself is NOT cloned. Only its memory state is swapped per request. The agent's tools, config, provider, and system prompt are shared because they are stateless. The mutable per-user state is the session's ConversationMemory, plus EntityMemory and KnowledgeMemory when those are in use.
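The swap-run-save cycle could look roughly like the sketch below. `ToyAgent` is a hypothetical stand-in for the real agent class, a plain dict replaces the session manager, and the module-level lock is an assumption: swapping memory on one shared agent is only safe if concurrent requests cannot interleave between the swap and the save.

```python
import threading
from typing import Dict, List, Tuple


class ToyAgent:
    """Hypothetical stand-in for the shared agent; only `memory` is mutable."""

    def __init__(self) -> None:
        self.memory: List[str] = []

    def run(self, prompt: str) -> str:
        self.memory.append(prompt)
        return f"seen {len(self.memory)} message(s)"


# Assumed guard: serializes swap -> run -> save on the single shared agent.
_run_lock = threading.Lock()


def handle_invoke(
    agent: ToyAgent,
    sessions: Dict[Tuple[str, str], List[str]],
    user_id: str,
    session_id: str,
    prompt: str,
) -> str:
    key = (user_id, session_id)
    with _run_lock:
        base_memory = agent.memory
        # Inject a copy of this session's history into the shared agent.
        agent.memory = list(sessions.get(key, []))
        try:
            reply = agent.run(prompt)
            # Save the updated history back under this session's key.
            sessions[key] = agent.memory
        finally:
            # Restore the base memory so no session state lingers on the agent.
            agent.memory = base_memory
    return reply
```

Holding one lock for the whole run serializes requests, which is the simplest way to keep the sketch correct. A real implementation would more likely pass the session memory into the run call or use a per-request context so that requests can proceed in parallel.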
Auth flow
Authorization: Bearer sk_sel_abc123
→ auth.validate("sk_sel_abc123") → KeyInfo(user_id="alice", role="editor")
→ user_id piped to session manager
→ role checked against RBAC (existing _has_permission)
No Authorization header + builder_session cookie present
→ Existing cookie auth path (unchanged)
No auth at all + auth is configured
→ 401 Unauthorized
No auth at all + auth is NOT configured
→ Allowed (backward compatible, no auth = open access)
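The four branches above can be expressed as one decision function. This is a sketch under stated assumptions: `resolve_auth` and `AuthResult` are hypothetical names, the key store is simplified to a key → user_id mapping, header lookup assumes the exact `Authorization` casing, and the cookie branch only reports that the existing cookie path should take over.

```python
from typing import Mapping, NamedTuple, Optional


class AuthResult(NamedTuple):
    status: int  # 200 = proceed, 401 = unauthorized, 403 = forbidden
    user_id: Optional[str]
    path: str  # "open", "api_key", "cookie", or "rejected"


def resolve_auth(
    headers: Mapping[str, str],
    cookies: Mapping[str, str],
    key_store: Optional[Mapping[str, str]],
) -> AuthResult:
    # No auth configured: open access, matching current behavior.
    if key_store is None:
        return AuthResult(200, None, "open")

    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        user_id = key_store.get(token.strip())
        if user_id is None:
            return AuthResult(403, None, "rejected")  # key present but invalid
        return AuthResult(200, user_id, "api_key")

    # No bearer token: defer to the existing cookie auth if a cookie is set.
    if "builder_session" in cookies:
        return AuthResult(200, None, "cookie")

    return AuthResult(401, None, "rejected")  # auth configured, none supplied
```

Role checks against the existing RBAC model would run after this step, once a `user_id` and role are known.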
Typed SSE format
Enhance AgentRouter.handle_stream() to yield typed events. The current implementation yields data: {"type": "chunk", ...}\n\n. The new format adds the SSE event: field:
event: chunk\ndata: {"content": "Hel", "type": "text"}\n\n
event: chunk\ndata: {"content": "lo!", "type": "text"}\n\n
event: tool_call\ndata: {"tool": "search", "args": {...}, "call_id": "tc_1"}\n\n
event: tool_result\ndata: {"tool": "search", "call_id": "tc_1", "result": "..."}\n\n
event: done\ndata: {"content": "...", "iterations": 2}\n\n
Clients using EventSource API automatically get event routing. Clients using raw fetch() that only parse data: lines continue to work — the event: line is a separate SSE field they'll simply ignore.
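To illustrate the compatibility claim, here is a small client-side parser sketch. `parse_sse` is a hypothetical helper that handles only the `event:` and `data:` fields plus comment lines, not the full SSE grammar (no `id:` or `retry:` handling).

```python
import json
from typing import Any, Dict, Iterator, List, Tuple


def parse_sse(stream: str) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_type, payload) for each frame that carries data."""
    for block in stream.split("\n\n"):
        event = "message"  # SSE default when no event: field is present
        data_lines: List[str] = []
        for line in block.split("\n"):
            if line.startswith(":"):
                continue  # comment line, e.g. the heartbeat
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if name == "event":
                event = value
            elif name == "data":
                data_lines.append(value)
        if data_lines:
            yield event, json.loads("\n".join(data_lines))
```

A typed client can dispatch on the first element of each tuple, while a legacy client that keeps only the payloads (`[p for _, p in parse_sse(raw)]`) sees the same `data:` content and skips the heartbeat without special handling.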
Dependencies
- `ConversationMemory`: for session state cloning (uses `.branch()` and `.add_many()`)
- `EntityMemory` and `KnowledgeMemory`: optional per-session state
- `AgentRouter` and `AgentServer`: existing classes being extended
- Existing RBAC model (`ROLES`, `_has_permission`): reused for API key roles
- `serve/models.py`: existing request/response dataclasses
Risks
| Risk | Mitigation |
|---|---|
| Memory pressure from many concurrent sessions | InMemorySessionManager has max_sessions cap + TTL eviction. Document memory-per-session guidance. |
| Thread safety of session manager in stdlib server | InMemorySessionManager uses threading.Lock around the session dict. Each request gets its own memory clone. |
| Agent state beyond ConversationMemory (e.g., entity_memory is mutated in-place) | SessionState holds separate EntityMemory and KnowledgeMemory instances per session. Clone on first use. |
| Breaking existing API clients | All new params are optional with None defaults. No auth configured = open access (current behavior). SSE event: field is additive. |
| API key security — keys stored in plaintext | Keys hashed with SHA-256 in the key store. Raw key only needed on first load. Document that api_keys.json should have 600 permissions. |
| Heartbeat thread management | Use daemon thread (stdlib) or asyncio.create_task (Starlette). Thread dies with server. |
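The SHA-256 mitigation in the table could be sketched as below. `HashedKeyStore` and `hash_key` are hypothetical names, and the store maps digests straight to a user_id for brevity rather than to a full `KeyInfo`.

```python
import hashlib
import hmac
from typing import Dict, Optional


def hash_key(raw_key: str) -> str:
    """Hex SHA-256 digest of an API key; only this digest is kept."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


class HashedKeyStore:
    def __init__(self) -> None:
        self._user_by_digest: Dict[str, str] = {}

    def add(self, raw_key: str, user_id: str) -> None:
        # The raw key is needed only here, at load time, and is not stored.
        self._user_by_digest[hash_key(raw_key)] = user_id

    def validate(self, presented_key: str) -> Optional[str]:
        digest = hash_key(presented_key)
        for stored, user_id in self._user_by_digest.items():
            # compare_digest avoids leaking match position through timing.
            if hmac.compare_digest(stored, digest):
                return user_id
        return None
```

A plain unsalted hash is reasonable here only if keys are long random strings; it would not be adequate for low-entropy secrets such as user-chosen passwords.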