Browse the web using `assistant browser` CLI commands
assistant skills install vellum-browser-useUse this skill to browse the web. All browser operations are executed through the assistant browser CLI, invoked via bash or host_bash. Each operation is a subcommand:
| Command | Description |
|---|---|
assistant browser navigate | Navigate to a URL |
assistant browser snapshot | List interactive elements on the current page |
assistant browser screenshot | Take a visual screenshot |
assistant browser click | Click an element |
assistant browser type | Type text into an input |
assistant browser press-key | Press a keyboard key |
assistant browser scroll | Scroll the page or a specific element |
assistant browser select-option | Select an option from a native <select> element |
assistant browser hover | Hover over an element to reveal menus/tooltips |
assistant browser wait-for | Wait for a condition |
assistant browser extract | Extract page text content |
assistant browser wait-for-download | Wait for a file download to complete |
assistant browser fill-credential | Fill a stored credential into a form field |
assistant browser attach | Attach the Chrome debugger to the active tab |
assistant browser detach | Detach the Chrome debugger from the active tab |
assistant browser close | Close the browser page |
assistant browser status | Diagnose browser backend readiness and setup steps |
Before using any browser commands, run assistant browser --json status first to check which browser backends are available. The status command returns JSON with readiness information for each backend mode:
assistant browser --json status
The response includes:
recommendedMode — the best available backend (use this)modes[] — per-mode status with available, summary, and userActions (remediation steps)Use --browser-mode <mode> on the assistant browser parent command to pin the browser backend:
| Value | Backend | Description |
|---|---|---|
auto | Automatic | Default. Picks the best available backend based on context. |
extension | Chrome extension | Routes through the user's Chrome browser via the extension debugger. |
cdp-inspect | CDP inspect | Connects to an already-running Chrome instance via DevTools Protocol. |
local | Playwright | Drives a dedicated Playwright-managed Chromium instance. |
assistant browser --browser-mode extension navigate --url http://www.example.com
The Chrome extension (extension mode) is the preferred browser backend. It is:
If the status check shows the extension is not available, encourage the user to install and pair it:
The status response's userActions array for the extension mode provides these same steps when the extension is not connected.
If the user declines to install the extension:
cdp-inspect — Connects to an already-running Chrome instance via DevTools Protocol (Chrome 146+). Requires enabling remote debugging in Chrome settings.local — Drives a dedicated Playwright-managed Chromium instance. Last resort — does not use the user's browser profile.Only fall back to these if the user explicitly indicates they do not want to install the extension. Prefer cdp-inspect over local.
When multiple clients support host_browser (e.g. two Chrome profiles, a macOS client and a Chrome extension), use --target-client-id <id> on the assistant browser parent command to pin all operations in the invocation to one specific client:
assistant browser --target-client-id <client-id> navigate --url https://example.com
Obtain client IDs from:
assistant clients list --capability host_browser
Omit --target-client-id when only one client is connected — the default interface-preference order (chrome-extension first, then macos) picks the best available client automatically.
Use --session <id> on the assistant browser parent command to group sequential operations so they share browser state (same page, cookies, etc.). Different session IDs create independent browser contexts.
assistant browser --session myflow navigate --url https://example.com
assistant browser --session myflow snapshot
assistant browser --session myflow click --element-id e3
Omitting --session uses the default session.
Use --json on the assistant browser parent command to get structured JSON output suitable for parsing in scripts:
assistant browser --json navigate --url https://example.com
# {"ok":true,"content":"Page title: Example Domain"}
assistant browser --json snapshot
# {"ok":true,"content":"...element list..."}
assistant browser --json screenshot
# {"ok":true,"content":"...","screenshots":[{"mediaType":"image/jpeg","data":"<base64>"}]}
Error responses use {"ok":false,"error":"..."}.
To save a screenshot to disk, use --output <path>:
assistant browser screenshot --output page.jpg
assistant browser screenshot --full-page --output full.jpg
To receive base64 screenshot data in JSON output:
assistant browser --json screenshot
The response includes a screenshots array with mediaType and data (base64) fields.
assistant browser --json status to check backend readiness — if the extension is not available, help the user install itassistant browser attach to establish the sessionassistant browser navigate --url <url> to load a pageassistant browser snapshot to discover interactive elementsclick, type, press-key, scroll, select-option, or hover to interactassistant browser extract or assistant browser screenshot --output <path> to capture resultsassistant browser detach when you are done — this releases the debugger so the user can browse freelyDate pickers / calendars: Click the date input to open the picker, re-snapshot to see calendar controls, click month navigation arrows to reach the target month, then click the target date. For <input type="date">, use type with YYYY-MM-DD format.
Native <select> elements: Use select-option with --value, --label, or --index. Do not try to click individual <option> elements.
ARIA / custom dropdowns: Click to open, take a new snapshot, then click the desired option by --element-id.
Autocomplete inputs: Type the search text, wait 500-1000ms (wait-for --duration), re-snapshot for suggestions, then click the suggestion or use press-key --key ArrowDown + press-key --key Enter.
Multi-step forms: Complete each step, wait for the next section to load, re-snapshot to discover new elements, then proceed.
Dynamic content: After interactions that change the page, use wait-for (with --selector or --text) or re-snapshot to see updated elements before continuing.
Scrolling: Use scroll --direction down to reveal below-the-fold content before snapshotting. Long pages may require multiple scrolls.
Hover menus / tooltips: Use hover to reveal hidden menus or tooltips, then re-snapshot to see newly revealed elements.
After critical actions (form submission, booking confirmation, checkout), take a screenshot and then read the saved image to visually verify results before reporting success to the user:
assistant browser screenshot --output /tmp/verify.jpg
Then read the saved image to inspect it before reporting success. Use file_read if the screenshot was taken via bash, or host_file_read if it was taken via host_bash.