WorkArena benchmark for BrowserGym

🤖💻 WorkArena - How Capable are Web Agents at Solving Common Knowledge Work Tasks?

[Paper][Benchmark Contents][Getting Started][BrowserGym][Citing This Work]

WorkArena is a suite of browser-based tasks designed to measure how effectively web agents support the routine work of knowledge workers. Because it builds on the widely used ServiceNow platform, the benchmark helps assess the state of such automation in modern knowledge work environments.

WorkArena is included in BrowserGym, a conversational gym environment for the evaluation of web agents.

Benchmark Contents

WorkArena currently includes 23,150 task instances drawn from 29 tasks that cover the main components of the ServiceNow user interface. The following videos show an agent built on GPT-4-vision interacting with each of these components. As our results show, the benchmark is not solved, so the agent does not always succeed.

Knowledge Bases

Goal: The agent must search for specific information in the company knowledge base.

The agent interacts with the user via BrowserGym's conversational interface.

https://github.com/ServiceNow/WorkArena/assets/1726818/352341ba-b501-46ac-bfa6-a6c9be1ac2b7

Forms

Goal: The agent must fill a complex form with specific values for each field.

https://github.com/ServiceNow/WorkArena/assets/1726818/e2c2b5cb-3386-4f3c-b073-c8c619e0e81b

Service Catalogs

Goal: The agent must order items with specific configurations from the company's service catalog.

https://github.com/ServiceNow/WorkArena/assets/1726818/ac64db3b-9abf-4b5f-84a7-e2d9c9cee863

Lists

Goal: The agent must filter a list according to some specifications.

In this example, the agent struggles to manipulate the UI and fails to create the filter.

https://github.com/ServiceNow/WorkArena/assets/1726818/7538b3ef-d39b-4978-b9ea-8b9e106df28e

Menus

Goal: The agent must navigate to a specific application using the main menu.

https://github.com/ServiceNow/WorkArena/assets/1726818/ca26dfaf-2358-4418-855f-80e482435e6e

Getting Started

To set up WorkArena, you need to get your own ServiceNow instance, install our Python package, and upload some data to your instance. Follow the steps below.

a) Create a ServiceNow Developer Instance

  1. Go to https://developer.servicenow.com/ and create an account.
  2. Click on Request an instance and select the Vancouver release (initializing the instance will take a few minutes).
  3. Once the instance is ready, you will see a popup showing its URL and credentials. You will also receive a copy by email. Based on this information, set the following environment variables:
    • SNOW_INSTANCE_URL: URL of your ServiceNow developer instance
    • SNOW_INSTANCE_UNAME: Just use "admin"
    • SNOW_INSTANCE_PWD: The password for your instance. Make sure you place the value in quotes "" since it might contain special characters.
  4. Log into your instance via a browser using the admin credentials. Close any popup that appears on the main screen (e.g., agreeing to analytics).

Warning: Feel free to look around the platform, but please make sure you revert any changes (e.g., changes to list views, pinning some menus, etc.) as these changes will be persistent and affect the benchmarking process.
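
Before moving on, it can help to confirm that the three environment variables from step 3 are visible to Python. The snippet below is a minimal sketch and is not part of WorkArena; the helper name missing_snow_vars is hypothetical, and only the variable names come from the instructions above.

```python
import os

# Variable names taken from step 3 of the instructions above.
REQUIRED_VARS = ("SNOW_INSTANCE_URL", "SNOW_INSTANCE_UNAME", "SNOW_INSTANCE_PWD")


def missing_snow_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; not provided by WorkArena.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_snow_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All ServiceNow environment variables are set.")
```

The check only verifies that the values are present. It does not contact the instance or validate the credentials.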

b) Install WorkArena and Initialize your Instance

Run the following command to install WorkArena in the BrowserGym environment:

pip install browsergym-workarena

Then, run this command in a terminal to upload the benchmark data to your ServiceNow instance:

workarena-install

c) Validate Your Installation

There are a lot of moving parts (authentication credentials, benchmark data, etc.), so we highly recommend that you sanity-check your installation using our provided unit tests. Do this by running the following command (it might take a few minutes):

pytest -v .
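
If the tests fail at import time, one quick check is whether the package is installed in the active Python environment. The following is a small sketch using only the standard library; the helper name installed_version is hypothetical, and the distribution name is the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent.

    Hypothetical helper for illustration; not provided by WorkArena.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("browsergym-workarena")
    print("browsergym-workarena:", found or "not installed")
```

This confirms only that pip installed the package. It says nothing about whether the benchmark data was uploaded to the instance.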

Your installation is now complete! 🎉

Citing This Work

@misc{workarena2024,
      title={WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?}, 
      author={Alexandre Drouin and Maxime Gasse and Massimo Caccia and Issam H. Laradji and Manuel Del Verme and Tom Marty and Léo Boisvert and Megh Thakkar and Quentin Cappart and David Vazquez and Nicolas Chapados and Alexandre Lacoste},
      year={2024},
      eprint={2403.07718},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
