Octopus
Screenshot Tool
The browser_screenshot tool captures a screenshot of the current page or a specific element (PNG by default, optionally JPEG) and returns it as a base64-encoded string, suitable for sending to a vision LLM or embedding in logs.
Tool Schema
{
  "name": "browser_screenshot",
  "description": "Take a screenshot of the current page or a specific element. Returns the image as a base64-encoded string (PNG by default). Use for visual verification, debugging, or sending to a vision-capable LLM.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "selector": {
        "type": "string",
        "description": "CSS selector of a specific element to screenshot. Omit to capture the page."
      },
      "full_page": {
        "type": "boolean",
        "default": false,
        "description": "If true, captures the entire page including content below the fold. Ignored when selector is set."
      },
      "quality": {
        "type": "integer",
        "minimum": 1,
        "maximum": 100,
        "default": 80,
        "description": "JPEG quality. Ignored when format is png (PNG is always lossless)."
      },
      "format": {
        "type": "string",
        "enum": ["png", "jpeg"],
        "default": "png",
        "description": "Image format. PNG is lossless; JPEG is smaller."
      }
    }
  }
}
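To make the defaults and constraints in the schema concrete, the following is a minimal sketch of parsing the tool arguments into a typed value. The ScreenshotArgs record and its Parse method are hypothetical names introduced for illustration; only the property names, defaults, and ranges come from the schema above.

```csharp
using System;
using System.Text.Json;

// Hypothetical typed view of the tool arguments; defaults mirror the schema.
public sealed record ScreenshotArgs(string? Selector, bool FullPage, int Quality, string Format)
{
    public static ScreenshotArgs Parse(JsonElement input)
    {
        // Missing properties fall back to the schema defaults.
        var selector = input.TryGetProperty("selector", out var s) ? s.GetString() : null;
        var fullPage = input.TryGetProperty("full_page", out var fp) && fp.GetBoolean();
        var quality = input.TryGetProperty("quality", out var q) ? q.GetInt32() : 80;
        var format = input.TryGetProperty("format", out var f) ? f.GetString() ?? "png" : "png";

        // Enforce the schema's minimum/maximum and enum constraints.
        if (quality is < 1 or > 100)
            throw new ArgumentOutOfRangeException(nameof(input), "quality must be between 1 and 100.");
        if (format is not ("png" or "jpeg"))
            throw new ArgumentException("format must be \"png\" or \"jpeg\".", nameof(input));

        return new ScreenshotArgs(selector, fullPage, quality, format);
    }
}
```

Rejecting out-of-range values up front is one possible design; a handler could equally clamp them or defer to the browser library's own validation.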
Response Format
{
  "format": "png",
  "width_px": 1280,
  "height_px": 720,
  "size_bytes": 142580,
  "base64": "iVBORw0KGgoAAAANSUhEUgAABQAAAA...(truncated)..."
}
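As a minimal sketch of consuming this response, assuming the JSON shape shown above, a caller can decode the base64 payload back into raw image bytes and check it against size_bytes. The ScreenshotResponse class and its Decode method are hypothetical helpers, not part of the tool.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper for illustration; only the field names come from the response format.
public static class ScreenshotResponse
{
    // Parses the tool's JSON response and returns the decoded image bytes.
    public static byte[] Decode(string responseJson, out string format)
    {
        using var doc = JsonDocument.Parse(responseJson);
        var root = doc.RootElement;

        format = root.GetProperty("format").GetString() ?? "png";
        var bytes = Convert.FromBase64String(root.GetProperty("base64").GetString() ?? "");

        // size_bytes describes the decoded image, not the length of the base64 text.
        if (root.TryGetProperty("size_bytes", out var size) && size.GetInt32() != bytes.Length)
            throw new InvalidOperationException(
                $"size_bytes is {size.GetInt32()} but the payload decodes to {bytes.Length} bytes.");

        return bytes;
    }
}
```

For example, calling `var bytes = ScreenshotResponse.Decode(json, out var format);` and then `File.WriteAllBytes($"screenshot.{format}", bytes);` would write the image to disk for inspection.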
Handler Implementation
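// Requires: System, System.Text.Json, System.Threading, System.Threading.Tasks
public async Task<string> HandleScreenshotAsync(
    JsonElement input, IBrowserSession session, CancellationToken ct)
{
    var selector = input.TryGetProperty("selector", out var s) ? s.GetString() : null;
    var fullPage = input.TryGetProperty("full_page", out var fp) && fp.GetBoolean();
    var quality = input.TryGetProperty("quality", out var q) ? q.GetInt32() : 80;

    // Normalize the format so the response never reports a value the image does not match.
    var isJpeg = input.TryGetProperty("format", out var f) && f.GetString() == "jpeg";
    var format = isJpeg ? "jpeg" : "png";
    var type = isJpeg ? ScreenshotType.Jpeg : ScreenshotType.Png;

    // Quality applies to JPEG only, so it stays unset for PNG.
    // Assumes the screenshot options expose a Quality property, as Playwright for .NET does.
    var jpegQuality = isJpeg ? quality : (int?)null;

    var page = await session.GetPageAsync(ct);
    byte[] imageBytes;

    if (selector is not null)
    {
        // Element screenshot: full_page does not apply here.
        var element = await page.QuerySelectorAsync(selector)
            ?? throw new InvalidOperationException($"Selector '{selector}' not found.");
        imageBytes = await element.ScreenshotAsync(new() { Type = type, Quality = jpegQuality });
    }
    else
    {
        imageBytes = await page.ScreenshotAsync(new()
        {
            FullPage = fullPage,
            Type = type,
            Quality = jpegQuality
        });
    }

    // size_bytes is the raw image size; width_px and height_px are not populated here.
    return JsonSerializer.Serialize(new
    {
        format,
        size_bytes = imageBytes.Length,
        base64 = Convert.ToBase64String(imageBytes)
    });
}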
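The handler above returns format, size_bytes, and base64, but not the width_px and height_px fields shown in the response format. One possible way to fill them in for PNG output, sketched here under the assumption that the bytes are a standard PNG file, is to read the dimensions from the IHDR header, which stores width and height as big-endian 32-bit integers at byte offsets 16 and 20. PngInfo and TryReadSize are hypothetical names; JPEG output would need a different approach.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: reads pixel dimensions from a PNG's IHDR chunk.
public static class PngInfo
{
    // Every PNG file starts with this fixed 8-byte signature.
    private static ReadOnlySpan<byte> Signature =>
        new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    public static bool TryReadSize(ReadOnlySpan<byte> png, out int width, out int height)
    {
        width = height = 0;

        // Need the signature (8), chunk length (4), chunk type (4), width (4), and height (4).
        if (png.Length < 24 || !png.Slice(0, 8).SequenceEqual(Signature))
            return false;

        width = (int)BinaryPrimitives.ReadUInt32BigEndian(png.Slice(16, 4));
        height = (int)BinaryPrimitives.ReadUInt32BigEndian(png.Slice(20, 4));
        return true;
    }
}
```

In the handler, `PngInfo.TryReadSize(imageBytes, out var w, out var h)` could then supply width_px and height_px when the format is png. The width bytes visible in the truncated sample response (00 00 05 00) correspond to 1280, consistent with the width_px shown there.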
Using Screenshots with Vision LLMs
Pass the base64 screenshot to a vision-capable LLM (GPT-4o, Claude 3) for visual analysis:
// In an MCP tool handler: take a screenshot and pass it to the vision LLM
var screenshotResult = await HandleScreenshotAsync(input, session, ct);
var screenshot = JsonSerializer.Deserialize<JsonElement>(screenshotResult);
var format = screenshot.GetProperty("format").GetString()!;
var base64 = screenshot.GetProperty("base64").GetString()!;

// Build an LLM message with the image; the MIME type must match the screenshot format
var messages = new List<LLMMessage>
{
    new()
    {
        Role = "user",
        Content = new List<LLMContentPart>
        {
            new() { Type = "text", Text = "What errors do you see in this screenshot?" },
            new() { Type = "image_url", ImageUrl = new() { Url = $"data:image/{format};base64,{base64}" } }
        }
    }
};

var analysis = await llmProvider.CompleteAsync(messages, null, options, ct);
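Screenshot size and cost. Full-page screenshots of content-heavy pages can be several MB. When passing to a vision LLM, use format: "jpeg" with a quality of 60–70 to reduce the base64 size significantly. Vision LLM APIs typically charge by image dimensions (for example, per image tile), so images with fewer pixels cost less; lowering JPEG quality shrinks the payload but not the pixel count. To reduce both, capture a specific element with selector or leave full_page off.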
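To estimate how large a payload will be before sending it, recall that base64 encodes every 3 bytes as 4 characters. The sketch below computes that length and builds a data URI whose MIME type follows the screenshot format. ScreenshotPayload and its members are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helpers for sizing and packaging a screenshot payload.
public static class ScreenshotPayload
{
    // Base64 emits 4 characters per 3 input bytes, padding the final group.
    public static long Base64Length(long byteCount) => (byteCount + 2) / 3 * 4;

    // "png" and "jpeg" map directly onto the image/png and image/jpeg MIME types.
    public static string ToDataUri(string format, string base64) =>
        $"data:image/{format};base64,{base64}";
}
```

For the sample response above, `ScreenshotPayload.Base64Length(142580)` gives 190108 characters, roughly a one-third increase over the raw image size.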