Add An AI Agent to Your Web Page with One Script Tag

Page Agent is a free, open-source, in-page GUI agent from Alibaba, written in JavaScript, that turns any website into an AI-native web application.

Add one script tag or npm package to your site, and the library reads the page through the DOM structure and maps natural language instructions to actual click, input, scroll, and submit actions.

Your users can then type commands like “Click the login button” or “Fill out the form with my details” to interact with your web page.

Page Agent works with any LLM that supports OpenAI-compatible API specs and tool calls, from hosted models like GPT-5.3 and Claude Sonnet 4.6 to self-hosted Ollama instances running locally.

Visit Page Agent

Features

Runs entirely within your web page as pure JavaScript.
Manipulates the Document Object Model directly through text-based commands.
Supports custom Large Language Models from public cloud services or private deployments.
Accepts developer-registered tools via Zod schemas to extend agent capabilities with business-specific actions.
Includes a user interface with human-in-the-loop capabilities.
Exposes a function-calling interface so external agents or support bots can invoke Page Agent as an action tool.
Provides an optional Chrome extension for multi-page task execution.

See It In Action

Official Demo

Use Cases

Build an AI copilot inside your SaaS product using a few lines of code.
Automatically fill out complex forms in ERP and CRM systems.
Control web applications using voice commands and screen readers.
Execute multi-tab browser workflows via the optional extension.

How to Use It

Table Of Contents

Quick Start
NPM Installation
Connecting to a Local Model via Ollama
Production Authentication with a Backend Proxy
Supported Models
Free Testing API Credentials
Adding Custom Tools
Configuring Agent Instructions
Data Masking
PageAgent.js vs. PageAgentExt
Interaction Capabilities Reference
Third-Party Agent Integration
Chrome Extension: Multi-Page Tasks

Quick Start

The fastest way to test Page Agent requires one script tag. Add this to your HTML file:

<script src="https://cdn.jsdelivr.net/npm/page-agent/dist/iife/page-agent.demo.js" crossorigin="true"></script>

The demo CDN connects to a free Qwen testing endpoint. It is for technical evaluation only. Do not input personal or sensitive data through this endpoint.

NPM Installation

npm install page-agent

import { PageAgent } from 'page-agent'
const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: 'YOUR_API_KEY',
    language: 'en-US',
})
await agent.execute('Click the login button')

Connecting to a Local Model via Ollama

const pageAgent = new PageAgent({
    baseURL: 'http://localhost:11434/v1',
    apiKey: 'NA',
    model: 'qwen3:14b',
})

Start Ollama with a larger context window and cross-origin access enabled. On macOS or Linux:

OLLAMA_CONTEXT_LENGTH=64000 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_ORIGINS="*" ollama serve

On Windows (PowerShell):

$env:OLLAMA_CONTEXT_LENGTH=64000; $env:OLLAMA_HOST="0.0.0.0:11434"; $env:OLLAMA_ORIGINS="*"; ollama serve

Ollama notes: models under 10B parameters generally do not perform reliably. The model must support tool calls. A typical page consumes around 15,000 tokens, and that count grows with each step, so Ollama’s default 4,096-token context length will not work. Set context to at least 64,000.
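
The arithmetic behind that recommendation can be sketched roughly as follows. The 15,000-token page cost comes from the note above; the per-step growth figure is a hypothetical placeholder, since real growth depends on the page and on how much the model writes at each step.

```javascript
// Rough context-budget check. PAGE_TOKENS comes from the note above;
// growthPerStep is a hypothetical value chosen only for illustration.
const PAGE_TOKENS = 15000

function estimateTokens(steps, growthPerStep = 2000) {
  // Assumes each step after the first adds a fixed amount of history to the prompt.
  return PAGE_TOKENS + Math.max(0, steps - 1) * growthPerStep
}

function fitsContext(contextLength, steps, growthPerStep = 2000) {
  return estimateTokens(steps, growthPerStep) <= contextLength
}

console.log(fitsContext(4096, 1))   // false: a single page already exceeds the default
console.log(fitsContext(64000, 10)) // true under the assumed growth rate
```

Under these assumptions even a one-step task overflows the default window, which is why the 64,000 setting is treated as a floor and not a tuning option.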

Production Authentication with a Backend Proxy

Never commit LLM API keys to frontend code. Use a backend proxy and the customFetch option to authenticate requests via cookies or other server-side methods:

const agent = new PageAgent({
    baseURL: '/api/llm-proxy',
    apiKey: 'NA',
    model: 'gpt-5.1',
    customFetch: (url, init) => fetch(url, { ...init, credentials: 'include' }),
})
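
The server side of that proxy is not shown above. A minimal sketch of the request-building step might look like the following, assuming the library appends an OpenAI-style sub-path such as /chat/completions to baseURL. The helper name, the config shape, and the placeholder key are all hypothetical.

```javascript
// Hypothetical server-side helper for an /api/llm-proxy route.
// It builds the upstream request and attaches the secret key on the server,
// so the browser only ever sends its session cookie.
function buildUpstreamRequest(subPath, body, config) {
  const base = config.upstreamBaseURL.replace(/\/+$/, '') // drop trailing slashes
  return {
    url: `${base}${subPath}`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${config.apiKey}`,
      },
      body: JSON.stringify(body),
    },
  }
}

// Example values are placeholders, not real credentials.
const upstream = buildUpstreamRequest(
  '/chat/completions',
  { model: 'gpt-5.1', messages: [] },
  { upstreamBaseURL: 'https://api.openai.com/v1/', apiKey: 'sk-server-side-only' }
)
console.log(upstream.url) // https://api.openai.com/v1/chat/completions
```

In a real route handler you would verify the user's session first, then call fetch(upstream.url, upstream.init) and relay the response to the browser.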

Supported Models

Page Agent works with any model that supports OpenAI-compatible API specs and tool calls. Recommended models (marked ⭐) offer the best balance of speed and tool call reliability.

Provider	Model	Notes
Qwen	qwen3.5-plus ⭐	Recommended
Qwen	qwen3.5-flash ⭐	Lightweight
Qwen	qwen3-coder-next
Qwen	qwen-3-max
Qwen	qwen-3-plus
Qwen	qwen3:14b (Ollama)	Local
OpenAI	gpt-5.4
OpenAI	gpt-5.2
OpenAI	gpt-5.1 ⭐	Recommended
OpenAI	gpt-5
OpenAI	gpt-5-mini
OpenAI	gpt-4.1
OpenAI	gpt-4.1-mini
DeepSeek	deepseek-3.2 ⭐	Recommended
Google	gemini-3-pro
Google	gemini-3-flash ⭐	Recommended
Google	gemini-2.5
Anthropic	claude-opus-4.6
Anthropic	claude-opus-4.5
Anthropic	claude-sonnet-4.5
Anthropic	claude-haiku-4.5 ⭐	Recommended
Anthropic	claude-sonnet-3.5
xAI	grok-4.1-fast
xAI	grok-4
xAI	grok-code-fast
MoonshotAI	kimi-k2.5
Z.AI	glm-5
Z.AI	glm-4.7

Models with weaker tool call implementations may return malformed responses. Page Agent auto-recovers from common format errors, and a higher temperature setting generally helps with these models.

Free Testing API Credentials

LLM_BASE_URL="https://page-ag-testing-ohftxirgbn.cn-shanghai.fcapp.run"
LLM_MODEL_NAME="qwen3.5-plus"
LLM_API_KEY="NA"

Use qwen3.5-flash as the model name for a lighter option.
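
These env-style names map onto the constructor options used in the NPM example (baseURL, model, apiKey). The sketch below shows one way to do that mapping; configFromEnv is a hypothetical helper, not part of the library.

```javascript
// Maps the env-style names above onto the constructor options used in the
// NPM example. configFromEnv is a hypothetical helper.
function configFromEnv(env) {
  const missing = ['LLM_BASE_URL', 'LLM_MODEL_NAME'].filter((key) => !env[key])
  if (missing.length > 0) {
    throw new Error(`Missing LLM settings: ${missing.join(', ')}`)
  }
  return {
    baseURL: env.LLM_BASE_URL,
    model: env.LLM_MODEL_NAME,
    apiKey: env.LLM_API_KEY || 'NA', // falls back to the placeholder key listed above
  }
}

const config = configFromEnv({
  LLM_BASE_URL: 'https://page-ag-testing-ohftxirgbn.cn-shanghai.fcapp.run',
  LLM_MODEL_NAME: 'qwen3.5-plus',
  LLM_API_KEY: 'NA',
})
console.log(config.model) // qwen3.5-plus
```

The resulting object can be passed straight to new PageAgent(config).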

Adding Custom Tools

Page Agent accepts custom tools defined with Zod schemas. Import from the zod/v4 subpath regardless of whether you use Zod 3 (>=3.25.0) or Zod 4. Zod Mini is not supported.

import { z } from 'zod/v4'
import { PageAgent, tool } from 'page-agent'
const pageAgent = new PageAgent({
    customTools: {
        add_to_cart: tool({
            description: 'Add a product to the shopping cart by its product ID.',
            inputSchema: z.object({
                productId: z.string(),
                quantity: z.number().min(1).default(1),
            }),
            execute: async function (input) {
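                await fetch('/api/cart', {
                    method: 'POST',
                    // Declare the JSON body so the server parses it correctly
                    headers: { 'Content-Type': 'application/json' },
                    body: JSON.stringify(input),
                })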
                return `Added ${input.quantity}x ${input.productId} to cart.`
            },
        }),
        search_knowledge_base: tool({
            description: 'Search the internal knowledge base and return relevant articles.',
            inputSchema: z.object({
                query: z.string(),
                limit: z.number().max(10).default(3),
            }),
            execute: async function (input) {
                const res = await fetch(
                    `/api/kb?q=${encodeURIComponent(input.query)}&limit=${input.limit}`
                )
                const articles = await res.json()
                return JSON.stringify(articles)
            },
        }),
    },
})

To remove a built-in tool entirely, set its key to null:

const pageAgent = new PageAgent({
    customTools: {
        scroll: null,
        execute_javascript: null,
    },
})
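
When several built-in tools need to be switched off, the null entries can be generated from a list. This is a small sketch; disableTools is a hypothetical helper, and the tool names are the two from the example above.

```javascript
// Hypothetical helper: build a customTools fragment that disables built-in tools.
function disableTools(names) {
  return Object.fromEntries(names.map((name) => [name, null]))
}

const customTools = {
  ...disableTools(['scroll', 'execute_javascript']),
  // custom tool definitions created with tool(...) would be merged in here
}
console.log(customTools) // { scroll: null, execute_javascript: null }
```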

Configuring Agent Instructions

Set global system instructions and per-URL page instructions through the instructions config:

const agent = new PageAgent({
    instructions: {
        system: `
You are a professional e-commerce assistant.
Guidelines:
- Always confirm before submitting orders
- Double-check prices and quantities
- Report errors immediately instead of retrying blindly
`,
        getPageInstructions: (url) => {
            if (url.includes('/checkout')) {
                return `
This is the checkout page.
- Verify shipping address before proceeding
- Check if any discounts are applied
- Confirm the total amount with the user
`
            }
            if (url.includes('/products')) {
                return `
This is the product listing page.
- Use filters to narrow down search results
- Check stock availability before adding to cart
`
            }
            return undefined
        }
    }
})

The system string applies to every task. getPageInstructions runs before each step and returns context specific to the current URL. Both are optional.
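
As the number of page-specific rules grows, the chain of if statements can be replaced by a lookup table. The sketch below is one possible shape; makePageInstructions and the rule format are hypothetical and not part of Page Agent, and the shop.example URLs are placeholders.

```javascript
// Table-driven alternative to a chain of if statements.
// makePageInstructions and the rule shape are hypothetical.
function makePageInstructions(rules) {
  return (url) => {
    // First rule whose fragment appears in the URL wins.
    const rule = rules.find(({ match }) => url.includes(match))
    return rule ? rule.text : undefined
  }
}

const getPageInstructions = makePageInstructions([
  { match: '/checkout', text: 'This is the checkout page.\n- Confirm the total amount with the user' },
  { match: '/products', text: 'This is the product listing page.\n- Check stock availability before adding to cart' },
])

console.log(getPageInstructions('https://shop.example/checkout').split('\n')[0])
// This is the checkout page.
console.log(getPageInstructions('https://shop.example/about')) // undefined
```

The returned function has the same signature as the getPageInstructions option shown above, so it can be dropped into the instructions config unchanged.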

Data Masking

Use transformPageContent to scrub sensitive data from the DOM content before the LLM sees it:

const agent = new PageAgent({
    transformPageContent: async (content) => {
        // China phone number (11 digits starting with 1)
        content = content.replace(/\b(1[3-9]\d)(\d{4})(\d{4})\b/g, '$1****$3')
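        // Email address (keep the first character and the domain; [^@\s] stops the
        // match from swallowing preceding words)
        content = content.replace(
            /\b([a-zA-Z0-9._%+-])[^@\s]*(@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})\b/g,
            '$1***$2'
        )
        // China ID card number (18 digits); the birth year must be 19xx or 20xx
        content = content.replace(
            /\b(\d{6})((?:19|20)\d{2})(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])(\d{3}[\dXx])\b/g,
            '$1********$5'
        )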
        // Bank card number (16–19 digits)
        content = content.replace(/\b(\d{4})\d{8,11}(\d{4})\b/g, '$1********$2')
        return content
    },
})
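
Masking rules are easy to get subtly wrong, so it can help to pull them into a plain function and check them against sample strings before wiring them in. The sketch below does this for the phone and bank card patterns only; maskSensitive is a hypothetical name and the sample values are made up.

```javascript
// Two of the masking rules, pulled out as a plain function for testing.
function maskSensitive(content) {
  return content
    // China mobile number: keep the first 3 and last 4 digits
    .replace(/\b(1[3-9]\d)(\d{4})(\d{4})\b/g, '$1****$3')
    // Bank card number (16-19 digits): keep the first 4 and last 4 digits
    .replace(/\b(\d{4})\d{8,11}(\d{4})\b/g, '$1********$2')
}

console.log(maskSensitive('Phone: 13812345678'))     // Phone: 138****5678
console.log(maskSensitive('Card: 4111111111111111')) // Card: 4111********1111
```

The function can then be passed as transformPageContent: async (content) => maskSensitive(content).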

PageAgent.js vs. PageAgentExt

Feature	PageAgent.js	PageAgentExt (Chrome Extension)
Integration	Site developer adds the library	User installs the browser extension
Scope	Current page (best for SPAs)	Any web page, multi-tab
Extra capabilities	None	Open, switch, and close tabs

Interaction Capabilities Reference

Supported actions:

Action	Supported
Click	✓
Text input	✓
Select (dropdowns)	✓
Scroll (vertical and horizontal)	✓
Form submit	✓
Focus	✓
Execute JavaScript (opt-in)	✓

Not supported:

Action	Notes
Hover	Not supported
Drag and drop	Not supported
Right-click	Not supported
Keyboard shortcuts	Not supported
Position-based control	Not supported
Drawing	Not supported
Monaco / CodeMirror editors	Require JS instance access

Third-Party Agent Integration

Wrap pageAgent.execute() as a function-calling tool in an external agent or support bot:

const pageAgentTool = {
    name: "page_agent",
    description: "Execute web page operations",
    parameters: {
        type: "object",
        properties: {
            instruction: {
                type: "string",
                description: "Operation instruction"
            }
        },
        required: ["instruction"]
    },
    execute: async (params) => {
        const result = await pageAgent.execute(params.instruction)
        return { success: result.success, message: result.data }
    }
}
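
How the wrapper gets registered depends on the external agent framework. As one illustration, the sketch below converts a definition shaped like pageAgentTool into the tools entry format used by OpenAI-compatible chat completion APIs and routes a returned tool call back to execute. Both helper names and the registry map are hypothetical, and demoTool uses a stub in place of pageAgent.execute.

```javascript
// Converts a tool definition shaped like pageAgentTool into the `tools` entry
// format used by OpenAI-compatible chat completion APIs.
function toOpenAITool(tool) {
  return {
    type: 'function',
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.parameters,
    },
  }
}

// Routes a tool call returned by the model to the matching execute function.
// `registry` is a hypothetical map of tool name -> tool definition.
function dispatchToolCall(registry, toolCall) {
  const tool = registry[toolCall.function.name]
  if (!tool) throw new Error(`Unknown tool: ${toolCall.function.name}`)
  // Tool call arguments arrive as a JSON string.
  return tool.execute(JSON.parse(toolCall.function.arguments))
}

const demoTool = {
  name: 'page_agent',
  description: 'Execute web page operations',
  parameters: {
    type: 'object',
    properties: { instruction: { type: 'string' } },
    required: ['instruction'],
  },
  // Stub standing in for the pageAgent.execute() wrapper shown above.
  execute: async (params) => ({ success: true, message: `ran: ${params.instruction}` }),
}
console.log(toOpenAITool(demoTool).function.name) // page_agent
```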

Chrome Extension: Multi-Page Tasks

Install the Page Agent Chrome extension from the Chrome Web Store, or get newer builds sooner from GitHub Releases.

After installation, trigger multi-page tasks from page JavaScript via window.PAGE_AGENT_EXT:

// 1. Get the auth token from the extension side panel
// 2. Store it in trusted apps only
localStorage.setItem('PageAgentExtUserAuthToken', '<your-token-from-extension>')
// Execute a multi-page task
const result = await window.PAGE_AGENT_EXT.execute(
    'Search for "page-agent" on GitHub and open the first result',
    {
        baseURL: 'https://api.openai.com/v1',
        apiKey: 'your-api-key',
        model: 'gpt-5.2',
        onStatusChange: status => console.log('Status change:', status),
        onActivity: activity => console.log('Activity:', activity),
        onHistoryUpdate: history => console.log('History update:', history)
    }
)
// Stop the current task
window.PAGE_AGENT_EXT.stop()
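
Because the extension is optional, page code may want to confirm it is present before calling it. This is a minimal feature-detection sketch; the global name PAGE_AGENT_EXT comes from the example above, while getPageAgentExt is a hypothetical helper.

```javascript
// Feature-detects the extension API before calling it.
// getPageAgentExt is a hypothetical helper, not part of the extension.
function getPageAgentExt(win) {
  const ext = win && win.PAGE_AGENT_EXT
  return ext && typeof ext.execute === 'function' ? ext : null
}

// In a browser this would be getPageAgentExt(window).
console.log(getPageAgentExt({}) === null) // true: extension not installed
```

A null result is the cue to hide multi-page features or prompt the user to install the extension.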

Pros

Works with any LLM that supports tool calls.
The human-in-the-loop UI lets users catch and correct automated mistakes.
A single script tag is enough to start testing.
Custom tools let you extend functionality deep into your business logic.

Cons

Cannot see images, canvas, or SVG content.
No hover, drag-drop, right-click, or keyboard shortcut support.
Page accessibility and semantic HTML quality directly impact accuracy.

Related Resources

Page Agent GitHub Repository: Source code, releases, and contribution guidelines for the project.
Page Agent Chrome Extension: The optional extension for multi-tab task execution.
browser-use: The open-source project that Page Agent’s DOM processing and prompt logic build on.
Alibaba Cloud Bailian (DashScope): OpenAI-compatible API hosting Qwen models, used in Page Agent’s quick-start examples.
Ollama: Run LLMs locally to power Page Agent with self-hosted models.
Zod: TypeScript-first schema library used to define custom tool inputs in Page Agent.

FAQs

Q: Does Page Agent work on any website?
A: The library works on any site where the developer has added it. It is a client-side tool and does not operate on third-party sites by default; that requires the optional Chrome extension (PageAgentExt), which the user installs.

Q: Do I need to pay to use Page Agent?
A: The library itself is free and open source. A free Qwen-backed testing API ships with the demo build for evaluation purposes. For production use, you supply your own API key from any supported LLM provider.

Q: Which LLM returns the best results?
A: Based on the project’s own testing, qwen3.5-plus, qwen3.5-flash, gpt-5.1, deepseek-3.2, gemini-3-flash, and claude-haiku-4.5 are the recommended options.

Q: Can Page Agent see what is on the screen visually?
A: No. Page Agent does not use multimodal models and takes no screenshots. It reads page structure entirely through the DOM.

Q: What happens when Page Agent encounters an unsupported action like hover or drag?
A: Those actions fall outside Page Agent’s capability boundary and the agent will not attempt them. Workflows that require hover-triggered menus, drag-and-drop reordering, or right-click context menus need the target app’s HTML to expose equivalent accessible alternatives.

Q: Can I run Page Agent with a completely private, local model?
A: Yes. Ollama integration is supported and tested with qwen3:14b on an RTX 3090. Set context length to at least 64,000 tokens, allow cross-origin access via OLLAMA_ORIGINS="*", and use a tool-call-capable model of at least 10B parameters.
