How to add browser automation to your AI agent without writing Selenium

The moment your AI agent needs to interact with a website — log in, extract data, fill a form, click a button — you face a choice: build browser infrastructure yourself or find the right abstraction. Most teams start by reaching for Selenium or Playwright directly. Most teams end up regretting it.

Why Selenium is the wrong starting point for agents

Selenium was designed for deterministic test automation. You write scripts that navigate to known pages, interact with known elements, and assert known outcomes. The paths are fixed. The error handling is manual. The session management is your problem.

AI agents are different. They don't follow fixed paths — they reason about what to do next based on what they see. They encounter unexpected page states. They need to handle authentication dynamically. They need to know not just what happened, but why it happened, so the model can adjust.
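That observe-reason-act cycle can be sketched as a small loop. This is an illustrative pattern, not any particular framework's API; `observe`, `decide`, and `act` are hypothetical callables standing in for the page snapshot, the model call, and the browser action:

```python
# A minimal sketch of the loop an agent runs: no fixed script,
# the next action is chosen from the current observation.
def run_agent(observe, decide, act, max_steps=10):
    """Loop until the policy signals it is done or steps run out."""
    history = []
    for _ in range(max_steps):
        obs = observe()
        action = decide(obs, history)  # the model reasons over what it sees
        if action is None:             # the model decides the task is complete
            break
        history.append((obs, action))
        act(action)
    return history
```

The point of the sketch is the contrast with a test script: there is no predetermined sequence of steps, so the browser layer underneath has to tolerate whatever page state the loop encounters.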

Wiring Selenium or raw Playwright into an agent loop produces brittle code. Every new page layout, every bot detection challenge, every session expiry becomes a failure mode you have to anticipate and handle. You end up spending more time maintaining the browser plumbing than improving the agent's actual capabilities.

What browser automation looks like in an agent context

The right abstraction for agent browser automation has a few key properties:

- High-level intent: the agent states what it wants done, not which elements to click.
- Structured results: pages come back as data the model can reason over, not raw HTML.
- Observable outcomes: the agent learns not just what happened but why, so it can adjust.
- Managed sessions: authentication, cookies, and expiry are the runtime's problem, not the agent's.

The infrastructure you don't want to build

If you do decide to build browser infrastructure for your agent yourself, here's what you're signing up for:

- Browser provisioning: launching, pooling, and recycling headless browsers at scale.
- Session management: keeping logins and cookies alive across steps and recovering from expiry.
- Bot detection: handling CAPTCHAs and fingerprinting challenges that change as sites evolve.
- Content extraction: turning rendered pages into structured data the model can actually use.
- Error handling: classifying failures so the agent knows why something broke, not just that it did.
- Logging: recording every action so you can reconstruct what the agent did and debug it.

None of this is impossible. All of it takes longer than you expect. And none of it is what makes your agent valuable.

The right approach

The right approach is to treat browser actions as a primitive in your agent's tool set — not a DIY infrastructure project. Your agent calls browse(url, task). The execution layer handles everything else: spinning up the browser, managing the session, extracting structured content, returning observable results, logging everything.

result = await legs.browse(
    url="https://example.com/product/123",
    extract="product name, price, availability"
)
# Returns: {"name": "Widget Pro", "price": "$49", "available": True}

This is what a browser action looks like from the agent's perspective: a high-level intent with a structured result. The agent can reason over the result. It doesn't need to know how the browser was provisioned or how the content was extracted.
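Because the result is structured, downstream agent logic can branch on fields directly. A small sketch, reusing the field names from the example above (`plan_next_step` is a hypothetical helper, not part of any API):

```python
def plan_next_step(result):
    """Decide a follow-up action from a structured browse result."""
    # The agent reads fields, never selectors or raw HTML.
    if result.get("available"):
        return f"add {result['name']} to cart at {result['price']}"
    return "search for an alternative supplier"

result = {"name": "Widget Pro", "price": "$49", "available": True}
next_step = plan_next_step(result)
# next_step == "add Widget Pro to cart at $49"
```

Note that nothing in this logic would change if the underlying extraction mechanism changed; that decoupling is the whole point of the abstraction.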

When you need more control

Sometimes you need more than just extraction — you need interaction: clicking buttons, filling forms, navigating multi-step flows. The abstraction still holds. The execution layer handles the click, the form fill, the navigation. The agent describes what it wants to accomplish. The runtime handles the mechanics.
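From the agent's side, a multi-step interaction looks the same as extraction: a goal in, a structured result out. The sketch below is an assumption about the call shape, modeled on the earlier `browse(url, task)` example; `StubBrowser` stands in for the real runtime so the snippet runs anywhere:

```python
import asyncio

class StubBrowser:
    """Stand-in for the real browser runtime; it just echoes the request."""
    async def browse(self, url, task):
        # A real runtime would click, fill forms, and navigate here.
        return {"url": url, "task": task, "status": "ok"}

async def main():
    legs = StubBrowser()
    # The agent states the goal; the runtime owns the mechanics.
    return await legs.browse(
        url="https://example.com/login",
        task="sign in, then open the most recent order",
    )

result = asyncio.run(main())
```

The agent code never mentions a button, a form field, or a selector; the whole multi-step flow is one intent.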

The key principle: your agent code should read like intent, not like browser manipulation. The moment your agent code starts talking about selectors and element handles, you've leaked the wrong abstraction into the wrong place.

Agent Legs includes a browser action type that handles full page rendering, session management, structured extraction, and action logging. No Selenium. No Playwright config. One import. Get early access.

Your agent has a brain.
Give it legs.

Free for 1,000 actions/month. No credit card required.

Get early access