
Async API (Python SDK)

NEW in v0.90.17: The async API is now complete. Every SDK function has an async version, covering core utilities, supporting utilities, the agent layer, and developer tools. Async functions live in their respective modules, and async_api serves as a convenient re-export point.

Why Use Async API?

The async API enables you to build high-performance automation with Python's asyncio framework:

Concurrent operations: Run multiple browser tasks in parallel
Better performance: Non-blocking I/O for faster automation
Framework integration: Works seamlessly with FastAPI, aiohttp, asyncio
Modern Python: Leverage async/await syntax
No breaking changes: Coexists with sync API - use what fits your needs
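
As a rough illustration of the concurrency point above, the following sketch uses only the standard library. `fake_page_load` is a hypothetical stand-in for an I/O-bound browser task (it only sleeps) and is not part of the SDK.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_page_load(name: str, delay: float) -> str:
    """Hypothetical stand-in for an I/O-bound browser task."""
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    return f"{name} loaded"


async def run_concurrently() -> Tuple[List[str], float]:
    """Run three simulated tasks at once and report the elapsed time."""
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_page_load("a", 0.2),
        fake_page_load("b", 0.2),
        fake_page_load("c", 0.2),
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_concurrently())
    print(results, f"{elapsed:.2f}s")
```

Because the three waits overlap, the total time should be close to the longest single delay rather than the sum of all three.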

AsyncPredicateBrowser

The AsyncPredicateBrowser class provides async context manager support and all the features of the sync PredicateBrowser.

Basic Usage

from predicate.async_api import AsyncPredicateBrowser

# Async context manager (recommended)
async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")
        # Browser automatically closes when done

# Manual lifecycle
async def manual():
    browser = AsyncPredicateBrowser()
    await browser.start()
    await browser.goto("https://example.com")
    await browser.close()
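
With a manual lifecycle, an exception raised between start() and close() would skip the close() call. One common way to guard against that is try/finally. The sketch below shows the pattern with `FakeBrowser`, a hypothetical stand-in that only mimics the start/goto/close shape used above; it is not the SDK class.

```python
import asyncio


class FakeBrowser:
    """Hypothetical stand-in with the same start/goto/close shape."""

    def __init__(self) -> None:
        self.closed = False

    async def start(self) -> None:
        await asyncio.sleep(0)

    async def goto(self, url: str) -> None:
        if not url.startswith("https://"):
            raise ValueError(f"unsupported URL: {url}")
        await asyncio.sleep(0)

    async def close(self) -> None:
        self.closed = True


async def visit(browser: FakeBrowser, url: str) -> None:
    await browser.start()
    try:
        await browser.goto(url)
    finally:
        # Runs whether goto() succeeded or raised
        await browser.close()
```

The async context manager form shown first gives the same guarantee with less code, which is presumably why it is the recommended option.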

With API Key

from predicate.async_api import AsyncPredicateBrowser

async def main():
    async with AsyncPredicateBrowser(api_key="sk_...") as browser:
        await browser.goto("https://example.com")
        # Use Pro/Enterprise features

Custom Viewport

from predicate.async_api import AsyncPredicateBrowser

async def main():
    # Custom viewport size
    async with AsyncPredicateBrowser(
        viewport={"width": 1920, "height": 1080}
    ) as browser:
        await browser.goto("https://example.com")

From Existing Playwright Context

from predicate.async_api import AsyncPredicateBrowser
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        # Create Playwright context
        context = await p.chromium.launch_persistent_context(
            "./user_data",
            headless=False
        )

        # Convert to AsyncPredicateBrowser
        browser = AsyncPredicateBrowser.from_existing(context)

        # Use all Predicate features
        await browser.page.goto("https://example.com")

From Existing Page

from predicate.async_api import AsyncPredicateBrowser
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context()
        page = await context.new_page()

        # Navigate first
        await page.goto("https://example.com")

        # Convert to AsyncPredicateBrowser
        predicate_browser = AsyncPredicateBrowser.from_page(page)

Async Functions

All core SDK functions have async versions, named with an _async suffix.

snapshot_async()

Capture page snapshot asynchronously:

from predicate.async_api import AsyncPredicateBrowser, snapshot_async

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")

        # Capture snapshot
        snap = await snapshot_async(browser)

        print(f"Found {len(snap.elements)} elements")
        print(f"Page title: {snap.title}")

Parameters:

browser (AsyncPredicateBrowser): Browser instance
screenshot (bool, optional): Include screenshot. Defaults to True
limit (int, optional): Max elements to return
goal (str, optional): ML reranking goal

click_async()

Click an element asynchronously:

from predicate.async_api import AsyncPredicateBrowser, snapshot_async, click_async, find

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")

        # Find and click
        snap = await snapshot_async(browser)
        button = find(snap, "role=button text~'Submit'")

        if button:
            await click_async(browser, button.id)

Parameters:

browser (AsyncPredicateBrowser): Browser instance
element_id (str): Element ID from snapshot

type_text_async()

Type text into an input field asynchronously:

from predicate.async_api import AsyncPredicateBrowser, snapshot_async, type_text_async, find

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")

        snap = await snapshot_async(browser)
        email_input = find(snap, "role=textbox text~'email'")

        if email_input:
            await type_text_async(browser, email_input.id, "user@example.com")

Parameters:

browser (AsyncPredicateBrowser): Browser instance
element_id (str): Element ID from snapshot
text (str): Text to type

press_async()

Press keyboard keys asynchronously:

from predicate.async_api import AsyncPredicateBrowser, press_async

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")

        # Press Enter
        await press_async(browser, "Enter")

        # Press Escape
        await press_async(browser, "Escape")

        # Keyboard shortcut
        await press_async(browser, "Control+A")

Parameters:

browser (AsyncPredicateBrowser): Browser instance
key (str): Key name (e.g., "Enter", "Escape", "Control+A")

click_rect_async()

Click at specific coordinates asynchronously:

from predicate.async_api import AsyncPredicateBrowser, click_rect_async

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")

        # Click at coordinates
        await click_rect_async(
            browser,
            x=100,
            y=200,
            width=50,
            height=30
        )

Parameters:

browser (AsyncPredicateBrowser): Browser instance
x (int): X coordinate
y (int): Y coordinate
width (int): Click area width
height (int): Click area height

Phase 2A: Core Utilities

NEW in v0.90.17: Async versions of core utility functions for semantic waiting, screenshots, and text search.

wait_for_async()

Wait for an element to appear using semantic queries:

from predicate.async_api import AsyncPredicateBrowser, wait_for_async

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")
        
        # Wait for element with timeout
        result = await wait_for_async(browser, "role=button", timeout=5.0)
        
        if result:
            print(f"Element found: {result.id}")

Parameters:

browser (AsyncPredicateBrowser): Browser instance
query (str): Semantic query string
timeout (float, optional): Maximum wait time in seconds. Defaults to 10.0
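
To make the timeout semantics concrete, here is a minimal sketch of a poll-until-timeout loop. It is not the SDK's implementation; `poll_until`, `make_probe`, and the `interval` parameter are hypothetical names used only for illustration.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


async def poll_until(
    probe: Callable[[], Awaitable[Optional[str]]],
    timeout: float = 10.0,
    interval: float = 0.05,
) -> Optional[str]:
    """Call `probe` until it returns a non-None value or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        found = await probe()
        if found is not None:
            return found
        if time.monotonic() >= deadline:
            return None  # timed out without a match
        await asyncio.sleep(interval)


def make_probe(ready_after: int) -> Callable[[], Awaitable[Optional[str]]]:
    """Build a fake probe that finds its element on the Nth call."""
    calls = 0

    async def probe() -> Optional[str]:
        nonlocal calls
        calls += 1
        return "button-1" if calls >= ready_after else None

    return probe
```

A probe that succeeds on its third call returns well within a one-second timeout, while a probe that never succeeds returns None once the deadline passes.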

screenshot_async()

Capture screenshot asynchronously in PNG or JPEG format:

from predicate.async_api import AsyncPredicateBrowser, screenshot_async

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")
        
        # Capture screenshot as JPEG
        data_url = await screenshot_async(browser, format="jpeg", quality=80)
        
        # Save to file
        import base64
        image_data = base64.b64decode(data_url.split(',')[1])
        with open("screenshot.jpg", "wb") as f:
            f.write(image_data)

Parameters:

browser (AsyncPredicateBrowser): Browser instance
format (str, optional): Image format - "png" or "jpeg". Defaults to "png"
quality (int, optional): JPEG quality (1-100). Only used when format is "jpeg". Defaults to 90
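
The save-to-file step in the example splits the returned string on a comma and base64-decodes the remainder. A slightly more defensive version of that step might look like the sketch below, assuming the return value is a data URL of the form `data:<mime>;base64,<payload>`. The helper names `decode_data_url` and `extension_for` are hypothetical.

```python
import base64
from typing import Tuple


def decode_data_url(data_url: str) -> Tuple[str, bytes]:
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def extension_for(mime_type: str) -> str:
    """Map the two formats mentioned above to file extensions."""
    return {"image/png": ".png", "image/jpeg": ".jpg"}.get(mime_type, ".bin")
```

Deriving the extension from the MIME type avoids writing PNG bytes to a `.jpg` file when the `format` argument changes.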

find_text_rect_async()

Find text on the page and return pixel coordinates:

from predicate.async_api import AsyncPredicateBrowser, find_text_rect_async

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")
        
        # Find text and get coordinates
        text_result = await find_text_rect_async(browser, "Sign In")
        
        if text_result:
            print(f"Text found at: x={text_result.x}, y={text_result.y}")
            print(f"Size: {text_result.width}x{text_result.height}")

Parameters:

browser (AsyncPredicateBrowser): Browser instance
text (str): Text to search for

Returns: Object with x, y, width, height properties, or None if not found
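
A typical next step is turning the returned rectangle into a single click point. The sketch below assumes that x and y refer to the top-left corner of the rectangle, which the text above does not state explicitly. `Rect` is a hypothetical stand-in for the returned object.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Hypothetical stand-in mirroring the x, y, width, height properties."""
    x: float
    y: float
    width: float
    height: float


def center_of(rect: Rect) -> Tuple[float, float]:
    """Return the midpoint, assuming (x, y) is the top-left corner."""
    return (rect.x + rect.width / 2, rect.y + rect.height / 2)


print(center_of(Rect(x=100, y=200, width=50, height=30)))  # (125.0, 215.0)
```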

Phase 2B: Supporting Utilities

NEW in v0.90.17: Async versions of supporting functions for content reading, visual overlays, and assertions.

read_async()

Read page content in various formats:

from predicate.async_api import AsyncPredicateBrowser, read_async

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")
        
        # Read as markdown
        markdown = await read_async(browser, output_format="markdown")
        print(markdown)
        
        # Read as plain text
        text = await read_async(browser, output_format="text")
        
        # Read raw HTML
        html = await read_async(browser, output_format="html")

Parameters:

browser (AsyncPredicateBrowser): Browser instance
output_format (str, optional): Output format - "html", "text", or "markdown". Defaults to "text"

show_overlay_async() / clear_overlay_async()

Manage visual overlays for debugging:

from predicate.async_api import AsyncPredicateBrowser, snapshot_async, show_overlay_async, clear_overlay_async

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")
        
        # Take snapshot
        snap = await snapshot_async(browser)
        
        # Show overlay on specific element
        await show_overlay_async(browser, snap, target_element_id=42)
        
        # Clear overlay
        await clear_overlay_async(browser)

Parameters:

browser (AsyncPredicateBrowser): Browser instance
snapshot (Snapshot): Snapshot object from snapshot_async()
target_element_id (int): Element ID to highlight

expect_async() / ExpectationAsync

Async assertion helpers with fluent API:

from predicate.async_api import AsyncPredicateBrowser, expect_async

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")
        
        # Assert element is visible
        element = await expect_async(browser, "role=button").to_be_visible()
        
        # Assert element exists
        await expect_async(browser, "role=link").to_exist()
        
        # Assert element contains text
        await expect_async(browser, "role=heading").to_have_text("Welcome")
        
        # Assert query returns N elements
        await expect_async(browser, "role=link").to_have_count(5)

Available Methods:

.to_be_visible() - Assert element is visible
.to_exist() - Assert element exists
.to_have_text(text) - Assert element contains text
.to_have_count(count) - Assert query returns N elements

Phase 2C: Agent Layer

NEW in v0.90.17: Full async implementation of the agent layer for natural language automation.

PredicateAgentAsync

Async agent with observe-think-act loop:

from predicate.async_api import AsyncPredicateBrowser, PredicateAgentAsync
from predicate.llm_provider import OpenAIProvider

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")
        
        # Initialize LLM provider
        llm = OpenAIProvider(api_key="your_key", model="gpt-4o")
        
        # Create async agent
        agent = PredicateAgentAsync(browser, llm)
        
        # Natural language automation
        result = await agent.act("Click the login button")
        result = await agent.act("Type 'user@example.com' into the email field")
        
        # Get token usage statistics
        stats = agent.get_token_stats()
        print(f"Tokens used: {stats['total_tokens']}")

Features:

Natural language automation with LLM
Token usage tracking
Observe-think-act loop
Full async/await support

Parameters:

browser (AsyncPredicateBrowser): Browser instance
llm_provider: LLM provider instance (OpenAIProvider, etc.)

Phase 2D: Developer Tools

NEW in v0.90.17: Async versions of developer tools for recording and inspection.

RecorderAsync / record_async()

Record actions and generate traces:

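from predicate.async_api import AsyncPredicateBrowser, RecorderAsync, snapshot_async, find

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")

        # Look up an element to act on (the ID comes from a snapshot)
        snap = await snapshot_async(browser)
        field = find(snap, "role=textbox")
        if field is None:
            return
        element_id = field.id

        # Record actions
        async with RecorderAsync(browser, capture_snapshots=True) as recorder:
            await recorder.record_click(element_id)
            await recorder.record_type(element_id, "text")

            # Save trace
            recorder.save("trace.json")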

Parameters:

browser (AsyncPredicateBrowser): Browser instance
capture_snapshots (bool, optional): Whether to capture snapshots. Defaults to True

InspectorAsync / inspect_async()

Inspect elements and debug interactively:

from predicate.async_api import AsyncPredicateBrowser, InspectorAsync

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")
        
        # Interactive inspection
        async with InspectorAsync(browser) as inspector:
            # Hover elements to see info in console
            # Click elements to see full details
            pass

Parameters:

browser (AsyncPredicateBrowser): Browser instance

Pure Functions (No Async Needed)

These functions are pure (no I/O) and don't need async versions:

from predicate.async_api import find, query

# find() - Returns single element
button = find(snap, "role=button text~'Submit'")

# query() - Returns list of elements
links = query(snap, "role=link")
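
The reason is that a lookup over data already in memory never waits on I/O, so there is nothing to await. The sketch below illustrates this with hypothetical stand-ins (`FakeElement`, `first_with_role`); it does not reproduce the SDK's query syntax or its matching logic.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeElement:
    """Hypothetical stand-in for a snapshot element."""
    id: int
    role: str
    text: str


def first_with_role(elements: List[FakeElement], role: str) -> Optional[FakeElement]:
    """Pure lookup over in-memory data: no I/O, so nothing to await."""
    return next((e for e in elements if e.role == role), None)


ELEMENTS = [FakeElement(1, "link", "Home"), FakeElement(2, "button", "Submit")]

# Callable from plain synchronous code...
sync_hit = first_with_role(ELEMENTS, "button")


# ...and from inside a coroutine, without `await`.
async def lookup_in_coroutine() -> Optional[FakeElement]:
    return first_with_role(ELEMENTS, "button")
```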

Complete Example

Here's a full example combining the core async functions:

import asyncio
from predicate.async_api import (
    AsyncPredicateBrowser,
    snapshot_async,
    find,
    click_async,
    type_text_async,
    press_async
)

async def login_example():
    """Complete login automation example"""
    async with AsyncPredicateBrowser() as browser:
        # Navigate to login page
        await browser.goto("https://example.com/login")

        # Take snapshot
        snap = await snapshot_async(browser)

        # Find email input
        email_input = find(snap, "role=textbox text~'email'")
        if email_input:
            await type_text_async(browser, email_input.id, "user@example.com")

        # Find password input
        snap = await snapshot_async(browser)
        password_input = find(snap, "role=textbox text~'password'")
        if password_input:
            await type_text_async(browser, password_input.id, "mypassword")

        # Click submit button
        snap = await snapshot_async(browser)
        submit_btn = find(snap, "role=button text~'log in'")
        if submit_btn:
            await click_async(browser, submit_btn.id)

        # Wait for page load
        await asyncio.sleep(2)

        # Verify login success
        snap = await snapshot_async(browser)
        print(f"Page title after login: {snap.title}")

# Run the async function
if __name__ == "__main__":
    asyncio.run(login_example())

Complete Phase 2A-2D Example

NEW in v0.90.17: Here's a combined example that exercises the new Phase 2A, 2B, and 2C async features (the Phase 2D developer tools are shown in their own section):

from predicate.async_api import (
    AsyncPredicateBrowser,
    snapshot_async,
    wait_for_async,
    screenshot_async,
    find_text_rect_async,
    read_async,
    show_overlay_async,
    expect_async,
    PredicateAgentAsync
)
from predicate.llm_provider import OpenAIProvider

async def comprehensive_example():
    """Example using all Phase 2A-2D features"""
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")

        # Phase 2A: Core Utilities
        # Wait for element
        result = await wait_for_async(browser, "role=button", timeout=5.0)

        # Capture screenshot
        data_url = await screenshot_async(browser, format="jpeg", quality=80)

        # Find text on page
        text_result = await find_text_rect_async(browser, "Sign In")

        # Phase 2B: Supporting Utilities
        # Read page content
        markdown = await read_async(browser, output_format="markdown")

        # Show visual overlay
        snap = await snapshot_async(browser)
        await show_overlay_async(browser, snap, target_element_id=42)
        
        # Assertions
        element = await expect_async(browser, "role=button").to_be_visible()
        await expect_async(browser, "role=link").to_have_count(5)
        
        # Phase 2C: Agent Layer
        llm = OpenAIProvider(api_key="your_key", model="gpt-4o")
        agent = PredicateAgentAsync(browser, llm)
        
        # Natural language automation
        result = await agent.act("Click the login button")
        result = await agent.act("Type 'user@example.com' into the email field")
        
        # Token tracking
        stats = agent.get_token_stats()
        print(f"Tokens used: {stats['total_tokens']}")

if __name__ == "__main__":
    import asyncio
    asyncio.run(comprehensive_example())

Concurrent Operations

Run multiple browser tasks in parallel:

import asyncio
from predicate.async_api import AsyncPredicateBrowser, snapshot_async

async def scrape_page(url: str):
    """Scrape a single page"""
    async with AsyncPredicateBrowser() as browser:
        await browser.goto(url)
        snap = await snapshot_async(browser)
        return {
            "url": url,
            "title": snap.title,
            "element_count": len(snap.elements)
        }

async def scrape_multiple_pages():
    """Scrape multiple pages concurrently"""
    urls = [
        "https://example.com",
        "https://example.org",
        "https://example.net"
    ]

    # Run all scrapes concurrently
    tasks = [scrape_page(url) for url in urls]
    results = await asyncio.gather(*tasks)

    for result in results:
        print(f"{result['url']}: {result['title']} ({result['element_count']} elements)")

# Run
if __name__ == "__main__":
    asyncio.run(scrape_multiple_pages())
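
Each task in the example above opens its own browser, so a long URL list would launch that many browsers at once. A common way to cap this is an asyncio.Semaphore. The sketch below shows the pattern with `fake_scrape`, a hypothetical stand-in that sleeps instead of driving a browser; the `max_concurrent` value is an arbitrary illustration.

```python
import asyncio
from typing import List, Tuple


class ConcurrencyTracker:
    """Records how many simulated tasks ran at the same time."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0


async def fake_scrape(url: str, limit: asyncio.Semaphore, tracker: ConcurrencyTracker) -> str:
    async with limit:  # at most max_concurrent coroutines pass this line at once
        tracker.active += 1
        tracker.peak = max(tracker.peak, tracker.active)
        try:
            await asyncio.sleep(0.05)  # stand-in for browser work
            return f"scraped {url}"
        finally:
            tracker.active -= 1


async def scrape_bounded(urls: List[str], max_concurrent: int = 2) -> Tuple[List[str], int]:
    limit = asyncio.Semaphore(max_concurrent)
    tracker = ConcurrencyTracker()
    results = await asyncio.gather(*(fake_scrape(u, limit, tracker) for u in urls))
    return list(results), tracker.peak
```

asyncio.gather still returns results in the order the URLs were given, regardless of which task finished first.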

Integration with Async Frameworks

FastAPI

from fastapi import FastAPI
from predicate.async_api import AsyncPredicateBrowser, snapshot_async, find

app = FastAPI()

@app.get("/scrape")
async def scrape_endpoint(url: str):
    """API endpoint that scrapes a URL"""
    async with AsyncPredicateBrowser() as browser:
        await browser.goto(url)
        snap = await snapshot_async(browser)

        return {
            "url": url,
            "title": snap.title,
            "element_count": len(snap.elements)
        }

aiohttp

import aiohttp
from aiohttp import web
from predicate.async_api import AsyncPredicateBrowser, snapshot_async

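async def handle_scrape(request):
    """Handle scrape request"""
    url = request.query.get('url')
    if not url:
        # Reject requests that omit the required query parameter
        return web.json_response({"error": "missing 'url' query parameter"}, status=400)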

    async with AsyncPredicateBrowser() as browser:
        await browser.goto(url)
        snap = await snapshot_async(browser)

        return web.json_response({
            "url": url,
            "title": snap.title,
            "element_count": len(snap.elements)
        })

app = web.Application()
app.router.add_get('/scrape', handle_scrape)

if __name__ == "__main__":
    web.run_app(app)
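
Both endpoints above pass a caller-supplied URL straight to the browser. A service exposed to untrusted callers would likely want to validate it first. The sketch below is one minimal approach using the standard library; `validated_url` is a hypothetical helper, and a real deployment may need stricter rules (for example, blocking internal hosts).

```python
from typing import Optional
from urllib.parse import urlparse


def validated_url(raw: Optional[str]) -> Optional[str]:
    """Return the URL if it is an absolute http(s) URL, else None."""
    if not raw:
        return None
    parsed = urlparse(raw)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return raw
    return None
```

A handler could call this before opening a browser and return a 400 response when it yields None.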

API Organization

v0.90.17 Refactoring:

All async functions are now organized in their respective modules alongside sync versions:
- AsyncPredicateBrowser → browser.py
- snapshot_async() → snapshot.py
- click_async(), type_text_async(), press_async(), click_rect_async() → actions.py
- wait_for_async(), screenshot_async(), find_text_rect_async() → wait.py, screenshot.py, find_text_rect.py
- read_async(), show_overlay_async(), expect_async() → read.py, overlay.py, expect.py
- PredicateAgentAsync → agent.py
- RecorderAsync, InspectorAsync → recorder.py, inspector.py
async_api.py serves as a convenient re-export module - all async APIs available from a single import point
Full backward compatibility - existing imports continue to work
Better code organization - async functions co-located with sync versions

Benefits

Complete Coverage:

✅ All sync functions now have async counterparts
✅ Core utilities, supporting utilities, agent layer, and developer tools
✅ Single import point from predicate.async_api

Performance:

Run multiple browser instances concurrently
Non-blocking I/O for faster automation
Better resource utilization

Code Quality:

Modern async/await syntax
Compatible with asyncio ecosystem
Type hints and IDE support
Better code organization with async functions in their respective modules

Compatibility:

Works with FastAPI, aiohttp, asyncio
No breaking changes to sync API
Same API design as sync version
Full backward compatibility maintained

Testing:

Comprehensive test coverage (36+ async tests)
All tests passing
Production-ready
6 async examples in sdk-python/examples/

Migration from Sync API

Migrating from sync to async is straightforward:

# Sync API (before)
from predicate import PredicateBrowser, snapshot, find, click

with PredicateBrowser() as browser:
    browser.goto("https://example.com")
    snap = snapshot(browser)
    button = find(snap, "role=button")
    if button:
        click(browser, button.id)

# Async API (after)
from predicate.async_api import AsyncPredicateBrowser, snapshot_async, find, click_async

async def main():
    async with AsyncPredicateBrowser() as browser:
        await browser.goto("https://example.com")
        snap = await snapshot_async(browser)
        button = find(snap, "role=button")
        if button:
            await click_async(browser, button.id)

Key changes:

Import from predicate.async_api instead of predicate
Use AsyncPredicateBrowser instead of PredicateBrowser
Add await before I/O operations (goto, snapshot_async, click_async, etc.)
Add async keyword to function definition
Pure functions (find, query) don't need async
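
A frequent slip during migration is forgetting an await: calling a coroutine function without it only creates a coroutine object and runs nothing. The sketch below demonstrates this with `fake_goto`, a hypothetical stand-in for an async SDK call, and shows asyncio.run as the entry point from synchronous code.

```python
import asyncio
import inspect


async def fake_goto(url: str) -> str:
    """Hypothetical stand-in for an async navigation call."""
    await asyncio.sleep(0)
    return f"loaded {url}"


async def main() -> str:
    pending = fake_goto("https://example.com")  # no await: nothing has run yet
    assert inspect.iscoroutine(pending)
    return await pending  # awaiting is what actually runs it


def run_from_sync_code() -> str:
    """Entry point for callers that are still synchronous."""
    return asyncio.run(main())
```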

Next Steps

Snapshot API - Learn about snapshot capture options
Action API - Explore all available actions
Query API - Master semantic element queries
Browser Setup - Configure browser settings
