Multimodal Messages¶
Import: from selectools import ContentPart, image_message, Message Stability: beta Added in: v0.21.0
Message.content now accepts a list of ContentPart objects in addition to a plain string. This unlocks vision and other multimodal inputs across every provider that supports them: GPT-4o, Claude 3.5/3.7, Gemini, and Ollama vision models.
from selectools import Agent, OpenAIProvider, image_message
agent = Agent(provider=OpenAIProvider(model="gpt-4o"))
# Helper for the common "image + prompt" case
result = agent.run([
image_message("https://example.com/diagram.png", "What does this diagram show?")
])
print(result.content)
See Also
ContentPart Anatomy¶
from selectools import ContentPart, Message, Role
msg = Message(
role=Role.USER,
content=[
ContentPart(type="text", text="Compare these two screenshots."),
ContentPart(type="image_url", image_url="https://example.com/before.png"),
ContentPart(type="image_url", image_url="https://example.com/after.png"),
],
)
| Field | Used when |
|---|---|
type | One of "text", "image_url", "image_base64", "audio" |
text | Set when type == "text" |
image_url | Public URL for an image (most providers) |
image_base64 | Inline base64 payload for an image |
media_type | MIME type, e.g. "image/png" or "audio/wav" |
Helper: image_message¶
For the common "single image + prompt" case, use the image_message helper:
from selectools import image_message
# From a URL
msg = image_message("https://example.com/photo.jpg", "Describe what you see.")
# From a local file path (auto-encoded as base64)
msg = image_message("./screenshots/error.png", "What's the error in this UI?")
The helper detects whether the input is a URL or a local path and chooses the right ContentPart.type (image_url vs image_base64).
URL reachability
When you pass an http:// / https:// URL, the provider's backend fetches the image, not selectools. OpenAI, Anthropic Claude, and Google Gemini each download the URL server-side. Some hosts block bot User-Agents (Wikimedia Commons, many corporate CDNs) and will return 400 / 403 errors. If you hit "Unable to download the file" or "Cannot fetch content from the provided URL", download the image locally and pass a file path instead — that triggers the base64 path which is host-independent.
Provider Compatibility¶
| Provider | Format used internally |
|---|---|
| OpenAI | [{"type": "text", ...}, {"type": "image_url", "image_url": {"url": ...}}] |
| Anthropic | [{"type": "text", ...}, {"type": "image", "source": {"type": "base64", ...}}] |
| Gemini | types.Part objects with inline_data |
| Ollama | images parameter (list of base64 strings) |
You don't need to format any of this yourself — selectools handles the conversion in each provider's _format_messages().
Backward Compatibility¶
Message(role=..., content="plain text") continues to work everywhere. The list[ContentPart] path is opt-in and existing code is unaffected.
API Reference¶
| Symbol | Description |
|---|---|
ContentPart | Dataclass for a single part of a multimodal message |
Message.content | Now str \| list[ContentPart] |
image_message(image, prompt) | Convenience constructor for image + text |
text_content(message) | Extract concatenated text from a (possibly multimodal) Message |
Related Examples¶
| # | Script | Description |
|---|---|---|
| 81 | 81_multimodal_messages.py | Image input with image_message and raw ContentPart |