Screenshot Tool — Web Driver Plugin

Tool Schema

{
  "name": "browser_screenshot",
  "description": "Take a screenshot of the current page or a specific element. Returns the image as a base64-encoded PNG or JPEG string. Use for visual verification, debugging, or sending to a vision-capable LLM.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "selector": {
        "type": "string",
        "description": "CSS selector to screenshot a specific element. Omit to capture the page: the visible viewport by default, or the entire page when full_page is true."
      },
      "full_page": {
        "type": "boolean",
        "default": false,
        "description": "If true, captures the entire page including content below the fold. Ignored when selector is set."
      },
      "quality": {
        "type": "integer",
        "minimum": 1,
        "maximum": 100,
        "default": 80,
        "description": "JPEG quality for compressed screenshots (PNG is always lossless)."
      },
      "format": {
        "type": "string",
        "enum": ["png", "jpeg"],
        "default": "png",
        "description": "Image format. PNG is lossless; JPEG is smaller."
      }
    }
  }
}
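
The schema declares defaults and limits, but a handler that receives raw JSON still has to apply them itself. A minimal sketch of that step follows, assuming System.Text.Json input; ParseScreenshotArgs is a hypothetical helper name and is not part of the plugin.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: reads the tool arguments and applies the defaults and
// limits declared in the schema. Returns a tuple to avoid inventing a type.
static (string? Selector, bool FullPage, int Quality, string Format) ParseScreenshotArgs(JsonElement input)
{
    var selector = input.TryGetProperty("selector", out var s) ? s.GetString() : null;
    var fullPage = input.TryGetProperty("full_page", out var fp) && fp.GetBoolean();
    var quality  = input.TryGetProperty("quality", out var q) ? q.GetInt32() : 80;
    var format   = input.TryGetProperty("format", out var f) ? f.GetString() ?? "png" : "png";

    // Enforce "minimum": 1 and "maximum": 100 from the schema.
    if (quality < 1 || quality > 100)
        throw new ArgumentOutOfRangeException(nameof(quality), quality, "quality must be between 1 and 100.");

    // Enforce the "enum": ["png", "jpeg"] constraint.
    if (format != "png" && format != "jpeg")
        throw new ArgumentException("format must be \"png\" or \"jpeg\".", nameof(format));

    return (selector, fullPage, quality, format);
}

// Usage: only "format" is supplied, so the other fields take their defaults.
using var doc = JsonDocument.Parse("{\"format\": \"jpeg\"}");
var parsed = ParseScreenshotArgs(doc.RootElement);
Console.WriteLine($"{parsed.Format} quality={parsed.Quality} fullPage={parsed.FullPage}");
```

Rejecting out-of-range values early gives the caller a clear error message instead of a failure from deep inside the browser driver.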

Response Format

{
  "format":      "png",
  "width_px":    1280,
  "height_px":   720,
  "size_bytes":  142580,
  "base64":      "iVBORw0KGgoAAAANSUhEUgAABQAAAA...(truncated)..."
}

Handler Implementation

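public async Task<string> HandleScreenshotAsync(
    JsonElement input, IBrowserSession session, CancellationToken ct)
{
    // Read the optional arguments, falling back to the schema defaults.
    var selector = input.TryGetProperty("selector", out var s) ? s.GetString() : null;
    var fullPage = input.TryGetProperty("full_page", out var fp) && fp.GetBoolean();
    var format   = input.TryGetProperty("format", out var f) ? f.GetString()! : "png";
    var quality  = input.TryGetProperty("quality", out var q) ? q.GetInt32() : 80;

    // Quality applies to JPEG only. This assumes the screenshot options expose
    // a nullable Quality property, as Playwright for .NET does.
    var isJpeg       = format == "jpeg";
    var type         = isJpeg ? ScreenshotType.Jpeg : ScreenshotType.Png;
    int? jpegQuality = isJpeg ? quality : null;

    var page = await session.GetPageAsync(ct);
    byte[] imageBytes;

    if (selector is not null)
    {
        // Element screenshot: full_page does not apply here.
        var element = await page.QuerySelectorAsync(selector)
            ?? throw new InvalidOperationException($"Selector '{selector}' not found.");
        imageBytes = await element.ScreenshotAsync(new()
        {
            Type    = type,
            Quality = jpegQuality
        });
    }
    else
    {
        // Page screenshot: the viewport, or the whole page when full_page is true.
        imageBytes = await page.ScreenshotAsync(new()
        {
            FullPage = fullPage,
            Type     = type,
            Quality  = jpegQuality
        });
    }

    var size      = imageBytes.Length;
    var base64    = Convert.ToBase64String(imageBytes);

    return JsonSerializer.Serialize(new
    {
        // Report the format actually produced; anything other than "jpeg" yields PNG.
        format     = isJpeg ? "jpeg" : "png",
        size_bytes = size,
        base64
    });
}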
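
The handler above returns format, size_bytes, and base64, while the response example also lists width_px and height_px. One way to fill those fields for PNG output without an image library is to read them from the PNG header, sketched below. ReadPngDimensions is a hypothetical helper, not part of the plugin; JPEG output would need a different parser or an imaging library.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: reads width and height from a PNG's IHDR chunk.
// Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR" tag,
// then width and height as 4-byte big-endian integers.
static (int Width, int Height) ReadPngDimensions(ReadOnlySpan<byte> png)
{
    ReadOnlySpan<byte> signature = new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    ReadOnlySpan<byte> ihdr      = new byte[] { 0x49, 0x48, 0x44, 0x52 }; // "IHDR"

    if (png.Length < 24
        || !png.Slice(0, 8).SequenceEqual(signature)
        || !png.Slice(12, 4).SequenceEqual(ihdr))
    {
        throw new ArgumentException("Not a PNG image.", nameof(png));
    }

    var width  = BinaryPrimitives.ReadInt32BigEndian(png.Slice(16, 4));
    var height = BinaryPrimitives.ReadInt32BigEndian(png.Slice(20, 4));
    return (width, height);
}

// Usage with a hand-built 24-byte header (not real screenshot data).
byte[] header =
{
    0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, // PNG signature
    0x00, 0x00, 0x00, 0x0D,                         // IHDR data length (13)
    0x49, 0x48, 0x44, 0x52,                         // "IHDR"
    0x00, 0x00, 0x05, 0x00,                         // width  = 1280
    0x00, 0x00, 0x02, 0xD0                          // height = 720
};
var (width, height) = ReadPngDimensions(header);
Console.WriteLine($"{width}x{height}"); // prints "1280x720"
```

In the handler, the two values could be added to the serialized object as width_px and height_px whenever the output format is PNG.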

Using Screenshots with Vision LLMs

Pass the base64 screenshot to a vision-capable LLM (GPT-4o, Claude 3) for visual analysis:

// In an MCP tool handler: take a screenshot and pass it to the vision LLM
var screenshotResult = await HandleScreenshotAsync(input, session, ct);
var screenshot = JsonSerializer.Deserialize<JsonElement>(screenshotResult);
var base64     = screenshot.GetProperty("base64").GetString()!;
var format     = screenshot.GetProperty("format").GetString()!;

// The MIME type in the data URL must match the actual image format
var mimeType = format == "jpeg" ? "image/jpeg" : "image/png";

// Build an LLM message with the image
var messages = new List<LLMMessage>
{
    new() { Role = "user", Content = new List<LLMContentPart>
    {
        new() { Type = "text",       Text  = "What errors do you see in this screenshot?" },
        new() { Type = "image_url",  ImageUrl = new() { Url = $"data:{mimeType};base64,{base64}" } }
    }}
};

var analysis = await llmProvider.CompleteAsync(messages, null, options, ct);

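Screenshot size and cost. Full-page screenshots of content-heavy pages can be several MB, and base64 encoding adds roughly a third on top of the raw byte count. When passing a screenshot to a vision LLM, use format: "jpeg" with a quality of 60–70 to shrink the payload significantly. Vision LLM APIs typically bill by image dimensions (tiles or pixel area) rather than by file size, so JPEG compression mainly reduces upload size and latency. To lower the per-image cost, capture fewer pixels, for example a single element via selector instead of the full page.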
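
To make the size overhead concrete, here is a small sketch that computes the base64 length for a given byte count and checks it against a payload budget. Base64Length and FitsBudget are hypothetical helper names, and the 200,000-character budget is an arbitrary example value, not a limit of any particular provider.

```csharp
using System;

// Base64 emits 4 characters for every 3 input bytes, padding the last group.
static long Base64Length(long byteCount) => 4 * ((byteCount + 2) / 3);

// Hypothetical budget check before sending the image to an LLM.
static bool FitsBudget(long byteCount, long maxBase64Chars) =>
    Base64Length(byteCount) <= maxBase64Chars;

// The 142,580-byte screenshot from the response example above:
Console.WriteLine(Base64Length(142580)); // prints 190108

// Example budget of 200,000 characters (an assumed value for illustration).
Console.WriteLine(FitsBudget(142580, 200_000));
```

If a screenshot does not fit, retrying with format set to "jpeg" and a lower quality, or targeting a smaller element, are the two options this page describes.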
