POST /v1/snapshot

Refine coarse UI geometry (raw_elements) into a ranked, compact element list (elements) for agent decision making.

Overview

This is the API used by the SDK when use_api=true. The SDK/extension collects raw_elements locally, then the server produces elements with ranking + visual cues.

Note on screenshots: this endpoint does not return screenshots. If you request a screenshot via the SDK, the extension captures it locally and the SDK merges it into the Snapshot object it returns.

Timeouts (important for large pages)

Refinement can take longer on large or heavy pages. If you see client-side timeouts while calling the Gateway, raise the timeout on the client:

- SDK default: 30s Gateway timeout (kept for backward compatibility).
- Configure in the SDK:
  - Python: SnapshotOptions.gateway_timeout_s (seconds)
  - TypeScript: SnapshotOptions.gatewayTimeoutMs (milliseconds)
- Calling the API directly: set your HTTP client timeout accordingly. This is a client-side setting and is not part of the request body schema.

Example (SDK):

from predicate import snapshot, SnapshotOptions

snap = snapshot(
    browser,
    SnapshotOptions(use_api=True, gateway_timeout_s=60),
)

Prefer the SDK: If you're integrating an agent, use the SDK Quick Start to get action execution + consistent snapshots automatically.

Request Format

curl -X POST https://api.sentienceapi.com/v1/snapshot \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "viewport": { "width": 1920, "height": 1080 },
    "raw_elements": [
      {
        "id": 0,
        "tag": "button",
        "rect": { "x": 100, "y": 200, "width": 150, "height": 40 },
        "styles": { "z_index": "10", "cursor": "pointer" },
        "attributes": { "role": "button", "aria_label": "Sign in" },
        "text": "Sign in",
        "in_viewport": true,
        "is_occluded": false
      }
    ],
    "goal": "Click the Sign in button",
    "options": {
      "limit": 50,
      "filter": { "min_area": 100, "allowed_roles": ["button", "textbox"] }
    },
    "client_metrics": {
      "ready_state": "complete",
      "quiet_ms": 1200,
      "node_count": 2400
    },
    "client_diagnostics": {
      "captcha": {
        "detected": false,
        "confidence": 0.0,
        "evidence": { "text_hits": [], "selector_hits": [], "iframe_src_hits": [], "url_hits": [] }
      }
    }
  }'
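
For callers who skip the SDK, the curl request above can be sketched in Python with an explicit client-side timeout. This is a minimal illustration using only the standard library; the helper names `build_snapshot_request` and `post_snapshot` and the 60-second default are assumptions made for this example, not part of any SDK.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.sentienceapi.com/v1/snapshot"


def build_snapshot_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for /v1/snapshot."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def post_snapshot(payload: dict, api_key: str, timeout_s: float = 60.0) -> dict:
    """Send the request. timeout_s is enforced by the HTTP client, not the server."""
    request = build_snapshot_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Same body as the curl example, trimmed to the core fields.
payload = {
    "url": "https://example.com",
    "viewport": {"width": 1920, "height": 1080},
    "raw_elements": [
        {
            "id": 0,
            "tag": "button",
            "rect": {"x": 100, "y": 200, "width": 150, "height": 40},
            "text": "Sign in",
            "in_viewport": True,
            "is_occluded": False,
        }
    ],
    "goal": "Click the Sign in button",
    "options": {"limit": 50},
}

# Placeholder key, as in the curl example; replace with a real key before sending.
request = build_snapshot_request(payload, api_key="sk_live_...")
```

Calling `post_snapshot(payload, api_key)` would send the request; it is left uncalled here so the sketch runs without network access. Note that `urllib` raises `urllib.error.HTTPError` on non-2xx responses, so a caller may want to catch that separately.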

Request Parameters

Parameter	Type	Required	Description
`url`	`string`	Yes	The URL of the webpage being analyzed
`viewport`	`object`	No	Viewport dimensions with width and height. If omitted, the server derives it from element bounds.
`raw_elements`	`array`	Yes	Array of raw element data from browser
`goal`	`string`	No	Optional goal for intent-aware ranking / ML reranking (when enabled).
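`options`	`object`	No	Filtering and limiting options (`limit`, `filter`). `limit` defaults to 50 and is capped server-side (default cap: 100).
`client_metrics`	`object`	No	Client-side page readiness metrics, as in the example above (`ready_state`, `quiet_ms`, `node_count`).
`client_diagnostics`	`object`	No	Client-side diagnostics, such as the `captcha` detection result shown in the example above.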
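
To make the `limit` rule concrete, here is a small sketch of how a client could predict the limit the server will apply. The function name `effective_limit` is hypothetical, and the cap of 100 is the documented default, which a given deployment may override.

```python
from typing import Optional

DEFAULT_LIMIT = 50  # documented default when `limit` is omitted
SERVER_CAP = 100    # documented default cap; assumed here, may differ per deployment


def effective_limit(options: Optional[dict]) -> int:
    """Return the limit a client should expect the server to apply."""
    requested = (options or {}).get("limit", DEFAULT_LIMIT)
    return min(int(requested), SERVER_CAP)
```

For example, `effective_limit({"limit": 500})` yields 100 under these assumptions, so a client asking for 500 elements should not expect more than the cap.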

Raw Element Shape (SDK / Extension)

raw_elements is emitted by the Predicate extension. The server is tolerant of extra fields; the schema below shows the common fields used for processing.

{
  "id": 12,
  "tag": "input",
  "rect": { "x": 10, "y": 120, "width": 300, "height": 36 },
  "styles": { "z_index": "0", "cursor": "text" },
  "attributes": {
    "role": "textbox",
    "type_": "email",
    "aria_label": "Email",
    "href": null,
    "nearby_text": "Work email",
    "name": "Email",
    "input_type": "email",
    "value": null,
    "value_redacted": "true",
    "checked": null,
    "disabled": null,
    "aria_checked": null,
    "aria_disabled": null,
    "aria_expanded": null
  },
  "text": "",
  "in_viewport": true,
  "is_occluded": false,
  "iframe_content": [],
  "scroll_y": 0,
  "center_x": 160,
  "center_y": 138,
  "doc_y": 138,
  "group_key": "x0-w3-h0"
}
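
The layout values in this example are consistent with taking the center of `rect` and adding `scroll_y` to get `doc_y`. The sketch below reproduces that arithmetic for illustration only; `derive_layout_fields` is a hypothetical helper, not the extension's code, and any rounding the extension applies is not covered here.

```python
def derive_layout_fields(rect: dict, scroll_y: float = 0) -> dict:
    """Compute center_x, center_y and doc_y from a rect and the scroll offset."""
    center_x = rect["x"] + rect["width"] / 2
    center_y = rect["y"] + rect["height"] / 2
    return {
        "center_x": center_x,
        "center_y": center_y,
        "doc_y": center_y + scroll_y,  # document-space Y of the element center
    }


# The rect from the raw element above, with scroll_y = 0.
fields = derive_layout_fields({"x": 10, "y": 120, "width": 300, "height": 36}, scroll_y=0)
```

With the rect above, `fields` holds a center of (160, 138) and a `doc_y` of 138, matching the example values.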

Response Format

{
  "status": "success",
  "timestamp": "2026-01-23T20:11:02.123Z",
  "url": "https://example.com",
  "viewport": { "width": 1920, "height": 1080 },
  "credits_used": 1,
  "credits_remaining": 123,
  "elements": [
    {
      "id": 0,
      "role": "button",
      "text": "Sign in",
      "importance": 85,
      "visual_cues": { "is_primary": true, "is_clickable": true },
      "bbox": { "x": 100, "y": 200, "width": 150, "height": 40 },
      "in_viewport": true,
      "is_occluded": false,
      "z_index": 10,
      "href": null,
      "nearby_text": null,
      "fused_rank_index": 0,
      "heuristic_index": 3,
      "ml_probability": 0.94,
      "ml_score": 2.1,
      "center_x": 175,
      "center_y": 220,
      "doc_y": 220,
      "group_key": "x0-w3-h0",
      "group_index": 0,
      "in_dominant_group": true,
      "name": "Sign in",
      "value": null,
      "input_type": null,
      "value_redacted": null,
      "checked": null,
      "disabled": null,
      "expanded": null,
      "aria_checked": null,
      "aria_disabled": null,
      "aria_expanded": null
    }
  ],
  "dominant_group_key": "x0-w3-h0",
  "ml_rerank": {
    "enabled": true,
    "applied": true,
    "reason": "goal_provided",
    "candidate_count": 120,
    "top_probability": 0.94,
    "min_confidence": 0.35,
    "is_high_confidence": true,
    "tags": ["onnx_v2"],
    "error": null
  },
  "diagnostics": {
    "confidence": 0.92,
    "reasons": [],
    "metrics": { "ready_state": "complete", "quiet_ms": 1200, "node_count": 2400, "interactive_count": 310, "raw_elements_count": 1200 },
    "captcha": null,
    "requires_vision": null,
    "requires_vision_reason": null
  },
  "modal_detected": false,
  "modal_grids": null
}

Response Fields

Top-Level Fields:

Field	Type	Description
`status`	`string`	Request status: "success" or "error"
`url`	`string`	The URL that was analyzed
`timestamp`	`string`	ISO timestamp of processing on the server
`viewport`	`object`	Viewport dimensions (width, height). Always present (derived if missing in request).
`credits_used`	`number`	Credits consumed by this call (currently: 1 per request).
`credits_remaining`	`number`	Remaining credits on your account at time of request.
`elements`	`array`	Ranked array of refined elements
`dominant_group_key`	`string \| null`	The most common group_key (main content group)
`ml_rerank`	`object \| null`	ML reranking metadata (best-effort).
`diagnostics`	`object \| null`	Runtime stability/debug info (confidence, reasons, metrics)
`modal_detected`	`boolean \| null`	True when a modal/overlay grid is detected (if modal detection is enabled server-side).
`modal_grids`	`array \| null`	Array of modal grid bounds/metadata (only populated when `modal_detected` is true).
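`error`	`string \| null`	Error message if status is "error"
`error_code`	`string \| null`	Machine-readable error code if status is "error" (for example, "bad_request")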
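
A client calling the endpoint directly will usually branch on `status` before touching `elements`. The following is one possible sketch; `SnapshotError` and `unwrap_snapshot_response` are hypothetical names, and the `error_code` key is taken from the error example in the Error Handling section.

```python
class SnapshotError(Exception):
    """Hypothetical exception type carrying the server's error details."""

    def __init__(self, message: str, code=None):
        super().__init__(message)
        self.code = code


def unwrap_snapshot_response(body: dict) -> list:
    """Return the ranked elements, or raise if the server reported an error."""
    if body.get("status") != "success":
        raise SnapshotError(body.get("error") or "unknown error", body.get("error_code"))
    # `elements` is documented as an array; fall back to an empty list defensively.
    return body.get("elements") or []


elements = unwrap_snapshot_response(
    {"status": "success", "elements": [{"id": 0, "role": "button", "text": "Sign in"}]}
)
```

Raising an exception keeps the happy path free of status checks; returning a tuple of (elements, error) would work equally well if exceptions do not fit the surrounding code.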

Element Fields

Each element in the elements array contains:

Core Fields:

Field	Type	Description
`id`	`number`	Unique identifier for clicking/interacting
`role`	`string`	Semantic role (button, textbox, link, heading, etc.)
`text`	`string`	Visible text content (may be empty string)
`importance`	`number`	AI importance score (0-1000, higher = more important)
`bbox`	`object`	Bounding box: x, y, width, height
`visual_cues`	`object`	Visual hints: is_primary, is_clickable, background_color_name
`in_viewport`	`boolean`	Whether element is visible in current viewport
`is_occluded`	`boolean`	Whether element is hidden by other elements
`z_index`	`number`	CSS z-index value (default: 0)

ML Reranking Fields (Optional):

Field	Type	Description
`fused_rank_index`	`number \| null`	0-based rank after sorting by `importance_fused`
`heuristic_index`	`number \| null`	0-based rank before ML reranking
`ml_probability`	`number \| null`	Confidence score from ONNX model (0.0 - 1.0)
`ml_score`	`number \| null`	Raw logit score from ONNX model (for debugging)
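
Because these fields are optional and the top-level `ml_rerank` object is best-effort, a client may want a fallback when reranking was not applied or was low-confidence. The sketch below shows one possible policy, not a recommended one: trust `fused_rank_index` only when `ml_rerank.applied` and `ml_rerank.is_high_confidence` are both true, otherwise pick the element with the highest `importance`. The function name `pick_top_candidate` is hypothetical.

```python
def pick_top_candidate(body: dict):
    """Pick one element from a /v1/snapshot response body, or None if empty."""
    elements = body.get("elements") or []
    if not elements:
        return None

    rerank = body.get("ml_rerank") or {}
    if rerank.get("applied") and rerank.get("is_high_confidence"):
        # Use the fused rank only for elements that actually carry one.
        ranked = [e for e in elements if e.get("fused_rank_index") is not None]
        if ranked:
            return min(ranked, key=lambda e: e["fused_rank_index"])

    # Fallback (an assumption of this sketch): highest heuristic importance wins.
    return max(elements, key=lambda e: e.get("importance", 0))
```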

Ordinal / Layout Fields (Optional):

Field	Type	Description
`center_x`	`number \| null`	X coordinate of element center (viewport coords)
`center_y`	`number \| null`	Y coordinate of element center (viewport coords)
`doc_y`	`number \| null`	Y coordinate in document (center_y + scroll_y)
`group_key`	`string \| null`	Geometric bucket key for ordinal grouping
`group_index`	`number \| null`	Position within group (0-indexed, sorted by doc_y)
`in_dominant_group`	`boolean \| null`	Whether element is in the dominant group (main content area)
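
These fields make ordinal requests such as "the second result" straightforward to resolve. As a rough sketch, assuming `group_index` is 0-based as described above (`nth_in_dominant_group` is a hypothetical helper):

```python
def nth_in_dominant_group(elements: list, n: int):
    """Return the n-th (0-based) element of the dominant group, or None."""
    group = [
        e for e in elements
        if e.get("in_dominant_group") and e.get("group_index") is not None
    ]
    group.sort(key=lambda e: e["group_index"])
    return group[n] if 0 <= n < len(group) else None
```

Elements with a null `group_index` or outside the dominant group are skipped, since all of these fields are optional.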

State-Aware Assertion Fields (Optional):

Field	Type	Description
`name`	`string \| null`	Accessible name/label for controls
`value`	`string \| null`	Current value for inputs (may be redacted for PII)
`input_type`	`string \| null`	Input type (text, email, password, etc.)
`value_redacted`	`boolean \| null`	Whether value was redacted for privacy
`checked`	`boolean \| null`	Normalized checked state for checkboxes/radios
`disabled`	`boolean \| null`	Normalized disabled state
`expanded`	`boolean \| null`	Normalized expanded state
`aria_checked`	`string \| null`	Raw ARIA checked string (tri-state)
`aria_disabled`	`string \| null`	Raw ARIA disabled string
`aria_expanded`	`string \| null`	Raw ARIA expanded string
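
For assertions, the normalized booleans are usually the easier fields to check, with the raw ARIA strings as a fallback. The sketch below assumes the raw `aria_checked` string follows the standard ARIA values ("true", "false", "mixed"); `is_checked` is a hypothetical helper.

```python
from typing import Optional


def is_checked(element: dict) -> Optional[bool]:
    """Return True/False when the checked state is known, else None."""
    if element.get("checked") is not None:
        return element["checked"]
    raw = element.get("aria_checked")
    if raw == "true":
        return True
    if raw == "false":
        return False
    # "mixed" (tri-state) or missing: the state is not a plain boolean.
    return None
```

Returning `None` for the tri-state case lets a caller distinguish "unchecked" from "indeterminate" instead of silently treating both as false.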

Additional Fields (Optional):

Field	Type	Description
`href`	`string \| null`	Hyperlink URL (for link elements)
`nearby_text`	`string \| null`	Nearby static text (best-effort)
`diff_status`	`string \| null`	"ADDED", "REMOVED", "MODIFIED", "MOVED" (for diff overlay)

Use Cases

When to Use This API

Use /v1/snapshot when you need fine-grained control:

- Custom automation frameworks: integrate with non-Playwright browsers
- Research and experimentation: test ranking algorithms
- Specialized pipelines: build custom visual AI workflows
- Headless environments: where SDK dependencies aren't available

When to Use the SDK

The SDK is recommended for most use cases because it:

✅ Handles snapshot collection automatically
✅ Provides action execution APIs (click, type, wait)
✅ Includes error handling and retries
✅ Maintains session consistency across actions

Get started with the SDK →

Best Practices

1. Always include viewport dimensions

Accurate coordinates depend on consistent viewport size. Use the same dimensions for snapshots and actions.

2. Filter early to reduce costs

Use options.filter to reduce token costs and improve response times:

{
  "options": {
    "limit": 20,
    "filter": {
      "min_area": 100,
      "allowed_roles": ["button", "textbox", "link"]
    }
  }
}

3. Check occlusion before interacting

Always verify is_occluded: false before attempting to click or type:

if element.is_occluded:
    print(f"Element {element.id} is hidden by another element")
else:
    # Safe to interact
    click(browser, element.id)

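4. Filter by importance score

Elements arrive pre-sorted by importance, so re-sorting is rarely needed. To narrow the list further, filter on an importance threshold:

# Keep only elements above a threshold (70 is illustrative; scores range from 0 to 1000)
important_elements = [e for e in elements if e.importance > 70]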

Integration Example

The SDK calls this endpoint for you when use_api is enabled. A typical flow looks like this:

from predicate import PredicateBrowser, snapshot

with PredicateBrowser(api_key="sk_...") as browser:
    browser.page.goto("https://example.com")

    # This calls /v1/snapshot behind the scenes
    snap = snapshot(browser)

    # snap.elements contains the ranked results
    for element in snap.elements[:5]:  # Top 5 most important
        print(f"{element.role}: {element.text} (importance: {element.importance})")

Error Handling

If the request fails, you'll receive an error response:

{
  "status": "error",
  "error": "Invalid JSON payload: ...",
  "error_code": "bad_request",
  "credits_remaining": 123
}

Common errors:

- Invalid viewport dimensions: check that width and height are positive numbers
- Missing required field: ensure url and raw_elements are provided (viewport is optional and is derived from element bounds if omitted)
- Invalid element format: verify the structure of the raw_elements array
- Rate limit exceeded: reduce request frequency or upgrade your plan
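
Several of these errors can be caught before the request is sent. The following pre-flight check is a sketch under stated assumptions: it follows the Request Parameters table (which marks `viewport` as optional), it does not attempt to validate individual element fields because their required set is not specified here, and `validate_snapshot_payload` is a hypothetical helper rather than an SDK function.

```python
from typing import List


def validate_snapshot_payload(payload: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    url = payload.get("url")
    if not isinstance(url, str) or not url:
        problems.append("url must be a non-empty string")

    raw = payload.get("raw_elements")
    if not isinstance(raw, list):
        problems.append("raw_elements must be an array")
    elif not all(isinstance(item, dict) for item in raw):
        problems.append("every raw_elements entry must be an object")

    viewport = payload.get("viewport")
    if viewport is not None:  # optional: the server derives it when omitted
        for side in ("width", "height"):
            value = viewport.get(side) if isinstance(viewport, dict) else None
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or value <= 0:
                problems.append(f"viewport.{side} must be a positive number")

    return problems
```

Collecting all problems in one pass, rather than failing on the first, tends to make payload bugs quicker to fix when several fields are wrong at once.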

Next Steps

- SDK Quick Start: get started with the full SDK
- Semantic Queries: master element finding strategies
- Action Execution: learn how to interact with elements
