This is an unofficial, use-at-your-own-risk port of the WebArena benchmark, for use as a standalone library package.

WebArena: A Realistic Web Environment for Building Autonomous Agents

WebArena is a standalone, self-hostable web environment for building autonomous agents

Website | Paper

Overview

News

  • [12/21/2023] We released recordings of trajectories performed by human annotators on ~170 tasks. Check out the resource page for more details.
  • [11/3/2023] Multiple features!
    • Uploaded newest execution trajectories
    • Added an Amazon Machine Image with all websites pre-installed, so you don't have to set them up yourself!
    • Zeno x WebArena, which lets you analyze your agents on WebArena without pain. Check out this notebook to upload your own data to Zeno, and this page for browsing our existing results!
  • [10/24/2023] We re-examined the whole dataset and fixed the annotation bugs we spotted. The current version (v0.2.0) is relatively stable and we don't expect major updates to the annotations in the future. The new results with better prompts and the comparison with human performance can be found in our paper.
  • [8/4/2023] Added the instructions and the docker resources to host your own WebArena Environment. Check out this page for details.
  • [7/29/2023] Added a well-commented script to walk through the environment setup.

Install

# Python 3.10+
conda create -n webarena python=3.10; conda activate webarena
pip install -r requirements.txt
playwright install
pip install -e .

# optional, dev only
pip install -e ".[dev]"
mypy --install-types --non-interactive browser_env agents evaluation_harness
pip install pre-commit
pre-commit install

Quick Walkthrough

Check out this script for a quick walkthrough on how to set up the browser environment and interact with it using the demo sites we host. This script is for educational purposes only; to perform reproducible experiments, please see the next section. In a nutshell, using WebArena is very similar to using OpenAI Gym. The following code snippet shows how to interact with the environment.

import random

from browser_env import ScriptBrowserEnv, create_id_based_action

# init the environment
env = ScriptBrowserEnv(
    headless=False,
    observation_type="accessibility_tree",
    current_viewport_only=True,
    viewport_size={"width": 1280, "height": 720},
)
# prepare the environment for a configuration defined in a json file
config_file = "config_files/0.json"
obs, info = env.reset(options={"config_file": config_file})
# get the text observation (e.g., html, accessibility tree) through obs["text"]

# create a random action: pick an element id and interpolate it into the action string
element_id = random.randint(0, 1000)
action = create_id_based_action(f"click [{element_id}]")

# take the action
obs, _, terminated, _, info = env.step(action)
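Because the interface follows the Gym convention, a full episode can be sketched as a loop that calls step until terminated is true. The snippet below is only a sketch and not WebArena code: StubEnv is a hypothetical stand-in that mimics the reset/step return shapes shown above so the loop runs without a browser, and run_episode, policy, and max_steps are illustrative names. With the real environment, the policy would return an action built with create_id_based_action rather than a plain string.

```python
from typing import Any, Callable, Optional


class StubEnv:
    """Hypothetical stand-in that mimics only the reset/step return shapes."""

    def __init__(self, steps_until_done: int = 3) -> None:
        self.steps_until_done = steps_until_done
        self.steps_taken = 0

    def reset(self, options: Optional[dict] = None) -> tuple:
        self.steps_taken = 0
        return {"text": "observation 0"}, {}

    def step(self, action: Any) -> tuple:
        self.steps_taken += 1
        terminated = self.steps_taken >= self.steps_until_done
        obs = {"text": f"observation {self.steps_taken}"}
        # Same 5-tuple shape as above: obs, reward, terminated, truncated, info
        return obs, 0.0, terminated, False, {}


def run_episode(env: Any, policy: Callable[[dict], Any], max_steps: int = 30) -> list:
    """Step the environment until it terminates or max_steps is reached."""
    obs, info = env.reset(options={"config_file": "config_files/0.json"})
    actions = []
    for _ in range(max_steps):
        action = policy(obs)  # decide the next action from the observation
        actions.append(action)
        obs, _, terminated, _, info = env.step(action)
        if terminated:
            break
    return actions


# A trivial policy that always issues the same action string.
actions = run_episode(StubEnv(), policy=lambda obs: "click [1]")
print(len(actions))  # prints 3
```

The max_steps cap is a design choice for the sketch: it keeps a policy that never finishes from looping forever.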

End-to-end Evaluation

  1. Set up the standalone environment. Please check out this page for details.

  2. Configure the URLs for each website.

export SHOPPING="<your_shopping_site_domain>:7770"
export SHOPPING_ADMIN="<your_e_commerce_cms_domain>:7780/admin"
export REDDIT="<your_reddit_domain>:9999"
export GITLAB="<your_gitlab_domain>:8023"
export MAP="<your_map_domain>:3000"
export WIKIPEDIA="<your_wikipedia_domain>:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
export HOMEPAGE="<your_homepage_domain>:4399" # this is a placeholder

You are encouraged to update the environment variables in the GitHub workflow as well, so that the unit tests run against the correct URLs.
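A missing variable is easy to overlook, so it can help to check all of them before generating configs. The following is a minimal sketch, not part of the package: missing_site_vars is a hypothetical helper, the variable names are the ones exported above, and the localhost values are placeholders.

```python
import os
from typing import Mapping

# Names of the environment variables exported in the step above.
SITE_VARS = (
    "SHOPPING",
    "SHOPPING_ADMIN",
    "REDDIT",
    "GITLAB",
    "MAP",
    "WIKIPEDIA",
    "HOMEPAGE",
)


def missing_site_vars(environ: Mapping[str, str] = os.environ) -> list:
    """Return the site variables that are unset or empty (hypothetical helper)."""
    return [name for name in SITE_VARS if not environ.get(name)]


# Example with a partial, made-up environment (hosts are placeholders).
example_env = {"SHOPPING": "localhost:7770", "REDDIT": "localhost:9999"}
print(missing_site_vars(example_env))
# prints ['SHOPPING_ADMIN', 'GITLAB', 'MAP', 'WIKIPEDIA', 'HOMEPAGE']
```

Calling missing_site_vars() with no argument checks the current process environment instead.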

  3. Generate a config file for each test example
python scripts/generate_test_data.py

You will see *.json files generated in the config_files folder. Each file contains the configuration for one test example.
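To inspect the generated files programmatically, something like the sketch below may be enough. It assumes the files are named by integer index (as in config_files/0.json) and makes no assumption about the keys inside each file. load_configs is a hypothetical helper, and the temporary directory with placeholder content exists only so the example runs without a real config_files folder.

```python
import json
import tempfile
from pathlib import Path


def load_configs(config_dir: str) -> dict:
    """Load every <index>.json file in config_dir, keyed and sorted by index."""
    configs = {}
    for path in Path(config_dir).glob("*.json"):
        if path.stem.isdigit():  # skip any file not named by an integer index
            with path.open(encoding="utf-8") as f:
                configs[int(path.stem)] = json.load(f)
    return dict(sorted(configs.items()))


# Demo on a throwaway directory; the file contents are placeholders,
# not the real config schema.
with tempfile.TemporaryDirectory() as tmp:
    for i in (10, 2, 0):
        (Path(tmp) / f"{i}.json").write_text(json.dumps({"placeholder": i}))
    print(list(load_configs(tmp)))  # prints [0, 2, 10]
```

Sorting by integer index rather than by filename avoids the lexicographic order 0, 10, 2. In a real checkout you would call load_configs("config_files").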

  4. Obtain the auto-login cookies for all websites
mkdir -p ./.auth
python browser_env/auto_login.py
  5. Set your OpenAI API key with export OPENAI_API_KEY=your_key. A valid OpenAI API key starts with sk-

  6. Launch the evaluation

# p_cot_id_actree_2s.json is the reasoning agent prompt we used in the paper
python run.py \
  --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json \
  --test_start_idx 0 \
  --test_end_idx 1 \
  --model gpt-3.5-turbo \
  --result_dir <your_result_dir>

This script will run the first example with the GPT-3.5 reasoning agent. The trajectory will be saved in <your_result_dir>/0.html.
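If you launch evaluations from Python rather than the shell, the command above can be assembled as an argument list. This is a sketch under stated assumptions: build_run_command and looks_like_openai_key are hypothetical helpers, the flags are exactly those shown above, and "results" is a placeholder result directory.

```python
import shlex


def build_run_command(
    start_idx: int,
    end_idx: int,
    result_dir: str,
    model: str = "gpt-3.5-turbo",
    instruction_path: str = "agent/prompts/jsons/p_cot_id_actree_2s.json",
) -> list:
    """Assemble the run.py invocation shown above as an argument list."""
    return [
        "python", "run.py",
        "--instruction_path", instruction_path,
        "--test_start_idx", str(start_idx),
        "--test_end_idx", str(end_idx),
        "--model", model,
        "--result_dir", result_dir,
    ]


def looks_like_openai_key(key: str) -> bool:
    """Cheap sanity check based on the sk- prefix mentioned in step 5."""
    return key.startswith("sk-")


cmd = build_run_command(0, 1, "results")  # "results" is a placeholder directory
print(shlex.join(cmd))
# To actually launch it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems, including the kind of stray inline comment that can break a backslash-continued command.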

Develop Your Prompt-based Agent

  1. Define the prompts. We provide two baseline agents whose corresponding prompts are listed here. Each prompt is a dictionary with the following keys:
prompt = {
  "intro": <The overall guideline which includes the task description, available action, hint and others>,
  "examples": [
    (
      example_1_observation,
      example_1_response
    ),
    (
      example_2_observation,
      example_2_response
    ),
    ...
  ],
  "template": <How to organize different information such as observation, previous action, instruction, url>,
  "meta_data": {
    "observation": <Which observation space the agent uses>,
    "action_type": <Which action space the agent uses>,
    "keywords": <The keywords used in the template, the program will later enumerate all keywords in the template to see if all of them are correctly replaced with the content>,
    "prompt_constructor": <Which prompt constructor is in use, the prompt constructor will construct the input feed to an LLM and extract the action from the generation, more details below>,
    "action_splitter": <Inside which splitter can we extract the action, used by the prompt constructor>
  }
}
  2. Implement the prompt constructor. An example prompt constructor using chain-of-thought/ReAct style reasoning is here. The prompt constructor is a class with the following methods:
  • construct: construct the input feed to an LLM
  • _extract_action: given the generation from an LLM, how to extract the phrase that corresponds to the action
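To make the two steps concrete, here is a minimal sketch of a prompt dictionary and a matching constructor. It is an illustration under assumptions, not the package's implementation: SketchPromptConstructor, the keyword names, the example texts, and the "@@" action splitter are all hypothetical, and the method signatures in the actual code base may differ.

```python
import re

# Hypothetical prompt following the key layout described above.
prompt = {
    "intro": "You are an agent that completes web tasks. Issue one action per turn.",
    "examples": [
        (
            "OBSERVATION: [1] button 'Search'",
            "The search button is what I need. The next action is @@click [1]@@",
        ),
    ],
    "template": "OBSERVATION: {observation}\nURL: {url}\nOBJECTIVE: {objective}",
    "meta_data": {
        "observation": "accessibility_tree",
        "action_type": "id_accessibility_tree",
        "keywords": ["observation", "url", "objective"],
        "prompt_constructor": "SketchPromptConstructor",
        "action_splitter": "@@",  # assumed splitter for this sketch
    },
}


class SketchPromptConstructor:
    """Illustrative constructor exposing the two methods named above."""

    def __init__(self, prompt: dict) -> None:
        self.prompt = prompt

    def construct(self, **values: str) -> str:
        """Build the LLM input: intro, few-shot examples, then the filled template."""
        missing = [k for k in self.prompt["meta_data"]["keywords"] if k not in values]
        if missing:
            raise ValueError(f"keywords not provided: {missing}")
        parts = [self.prompt["intro"]]
        for observation, response in self.prompt["examples"]:
            parts.extend([observation, response])
        parts.append(self.prompt["template"].format(**values))
        return "\n\n".join(parts)

    def _extract_action(self, response: str) -> str:
        """Return the text enclosed by the first pair of action splitters."""
        splitter = re.escape(self.prompt["meta_data"]["action_splitter"])
        match = re.search(f"{splitter}(.+?){splitter}", response, re.DOTALL)
        if match is None:
            raise ValueError("no action found in the response")
        return match.group(1).strip()


constructor = SketchPromptConstructor(prompt)
llm_input = constructor.construct(
    observation="[2] link 'Home'", url="http://example.com", objective="Go home"
)
print(constructor._extract_action("I should go home. @@click [2]@@"))  # prints click [2]
```

The keyword check in construct mirrors the idea described for "keywords": every placeholder in the template must be replaced with content before the prompt is sent to the model.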

Citation

If you use our environment or data, please cite our paper:

@article{zhou2023webarena,
  title={WebArena: A Realistic Web Environment for Building Autonomous Agents},
  author={Zhou, Shuyan and Xu, Frank F and Zhu, Hao and Zhou, Xuhui and Lo, Robert and Sridhar, Abishek and Cheng, Xianyi and Bisk, Yonatan and Fried, Daniel and Alon, Uri and others},
  journal={arXiv preprint arXiv:2307.13854},
  year={2023}
}
