Structured Extraction

Turn read() output into validated JSON records (schema-first) so agents can extract data reliably instead of scraping brittle HTML.

This page covers the extraction concept, typed extraction in Python, and failure modes.

Extraction should produce validated data, not “maybe JSON”.

Use extraction when you need structured records (items, prices, metadata) that downstream code can trust.

Table of Contents

  1. Concept
  2. Python: typed extraction
  3. Failure modes

Concept

The stable path to structured data is:

  1. Read the page into markdown/text (read())
  2. Extract a typed object from that content (extract(...))
  3. Validate the object with a schema so callers don’t need prompt heuristics
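To make step 3 concrete, here is a minimal sketch of what "validate the object with a schema" can mean. It uses only the Python standard library so it runs without dependencies. The names Item, SCHEMA, and validate_item are hypothetical and are not part of the SDK; in practice a Pydantic model, as in the next section, plays this role.

```python
import json
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    price: str


# Hypothetical schema: field name -> required Python type.
SCHEMA = {"name": str, "price": str}


def validate_item(raw: str) -> Item:
    """Parse raw JSON text and check it against SCHEMA before building an Item."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed JSON.
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    for key, expected in SCHEMA.items():
        # A missing key yields None, which fails the isinstance check.
        if not isinstance(payload.get(key), expected):
            raise ValueError(f"field {key!r} is missing or not a {expected.__name__}")
    return Item(name=payload["name"], price=payload["price"])


if __name__ == "__main__":
    print(validate_item('{"name": "Widget", "price": "$9.99"}'))
```

Callers either receive a fully populated Item or an exception, so they never have to guess whether a field is present.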

Python: typed extraction

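from pydantic import BaseModel
from predicate import read, extract

# Schema for one extracted record; both fields are required strings.
class Item(BaseModel):
    name: str
    price: str

# `browser` and `llm` are assumed to be created earlier (setup not shown here).
# Step 1: read the page content as markdown.
md = read(browser, format="markdown")["content"]

# Step 2: extract a typed object, validated against the Item schema.
result = extract(browser, llm, "Extract item name and price", schema=Item)
if result.ok:
    # On success, result.data exposes the Item fields.
    print(result.data.name, result.data.price)
else:
    print("extract failed:", result.error)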

Failure modes

Extraction can fail for deterministic reasons. When it does, result.ok is false and result.error carries the failure, as in the example above.

When extraction fails, treat it as a normal verification failure:
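One possible reading of this, as a minimal sketch: fail the step explicitly instead of passing unvalidated data downstream. ExtractResult is a hypothetical stand-in that mirrors the ok, data, and error fields used in the example above. VerificationError and require_extracted are likewise illustrative names, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExtractResult:
    """Hypothetical stand-in mirroring the ok/data/error fields of an extract result."""
    ok: bool
    data: Any = None
    error: Optional[str] = None


class VerificationError(Exception):
    """Hypothetical error type for a failed verification step."""


def require_extracted(result: ExtractResult) -> Any:
    """Return the validated data, or fail the step instead of passing bad data on."""
    if not result.ok:
        raise VerificationError(f"extract failed: {result.error}")
    return result.data


if __name__ == "__main__":
    good = ExtractResult(ok=True, data={"name": "Widget", "price": "$9.99"})
    print(require_extracted(good))

    bad = ExtractResult(ok=False, error="schema validation failed")
    try:
        require_extracted(bad)
    except VerificationError as exc:
        print(exc)
```

Raising here keeps the failure visible to whatever retry or reporting logic already handles other failed checks, so extraction needs no special-case handling.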