TextWorldExpress: a highly optimized reimplementation of three text game benchmarks focusing on instruction following, commonsense reasoning, and object identification.

Project description

TextWorldExpress

A highly optimized reimplementation of three text game benchmarks focusing on instruction following, commonsense reasoning, and object identification.

Quickstart

Before running: You will have to have Java 1.8+ installed on your system (shipped with most linux distributions).

Install with pip:

conda create --name textworld-express python=3.8
conda activate textworld-express
pip install textworld-express

Run an example random agent, on the Coin Collector game...:

python examples/random_agent.py --game-name=coin

Run a user console where you can interact with the environment, on CookingWorld...:

python examples/human.py --game-name=cookingworld

Web Server Demo

A web server demo is also available, that allows running a TextWorldExpress user console that can be interacted with in a web browser.

To run the web server demo:

conda create --name textworld-express python=3.8
conda activate textworld-express
pip install textworld-express[webserver]

Run the web server:

python examples/textworldexpress-web-server.py

Point your web browser to localhost:8080.

TextWorldExpress Design

TextWorldExpress is written in Scala (2.12.9), and compiles using sbt into a JAR file that is run with java. For convenience, a python API is provided, which interfaces using the py4j package.

Ports: TextWorldExpress is nominally run as a server, which interfaces to the Python API with py4j through a port. The default port is 25335. The actual port used will be 25335 + the thread number provided when a TextWorldExpress class is instantiated.

Threads: TextWorldExpress is designed to run many threads simultaneously, if desired. To do so, initialize one TextWorldExpressEnv object per thread, and provide a unique threadNum for each thread. Don't forget to close down servers that you instantiate using the env.shutdown() command. If you are spawning many threads (10+) at the same time, you may wish to add a short delay (5-10 seconds) after initialization to wait for all the servers to initialize. If you are using large number of threads, or older hardware, you may need to increase this delay further.

Environments

TextWorldExpress includes high-speed versions of three popular benchmark environments for text-game research.

CookingWorld ("cookingworld")

The CookingWorld environment tasks agents with preparing meals by following the instructions in a recipe that is provided in the environment. Agents must first collect required food ingredients (e.g., milk, bell pepper, flour, salt) that can be found in the environment in canonical locations (e.g., kitchen, pantry, supermarket, garden) and containers (e.g., fridge, cupboard). Randomly generated recipes require agents to first use a knife to prepare food by slicing, dicing, or chopping a subset of ingredients, then additionally using an appropriate heating appliance to fry, roast, or barbeque the ingredients. If all ingredients are prepared according to the recipe, the agent can use an action to prepare the meal, and finally eat the meal to complete the task successfully. Task complexity can be controlled by varying the number of locations in the environment, the number of ingredients required for the recipe, and the number of distractor ingredients randomly placed in the environment that are not required for the recipe. The recipes and environments are parametrically generated, with subsets of ingredients and specific preparations held out between training, development, and test sets to prevent overfitting. CookingWorld was originally created for the First TextWorld Problems competition and later named by [Madotto etal., 2020].

TextWorld Commonsense ("twc")

Text game agents frequently learn the dynamics of environment -- such as the need to open a door before one can move through it -- from interacting with the environment itself, rather than using a pre-existing knowledge base of common sense facts or object affordances that would speed task learning. TextWorld Commonsense [Murugesan etal., 2021] aims to evaluate agents on common sense knowledge that can not be directly learned from the environment by providing agents a clean-up task where the agent must place common household objects (e.g., a dirty dish) in their canonical locations (e.g., the dishwasher) that can be found in knowledge bases such as ConceptNet. Separate lists of objects are used in the training, development, and test sets, meaning the agent can not learn object locations from the training set alone, and must rely on an external common sense knowledge base to perform well on the development and test sets. TextWorld Commonsense benchmark has three task difficulty levels, with the easiest including a single location and object to put away, while the hard setting includes up to p to 11 locations and any number of task-relevant and distractor objects.

Coin Collector ("coin")

Agents frequently find tasks such as object search, environment navigation, or pick-and-place tasks challenging. The Coin Collector distills these into a single benchmark where an agent must explore a series of rooms to locate and pick up a single coin. In the original implementation, the game map typically takes the form of a connected loop or chain, such that continually moving to new locations means the agent will eventually discover the coin -- while including medium and hard modes that add in one or more "dead-end" paths. To control for environment difficulty across games, the TextWorldExpress reimplementation uses the same map generator across environments, and generates arbitrary home environments rather than connected loops. The user maintains control of other measures of difficulty, including the total number of rooms, and the number of distractor objects placed in the environment.

Usage

Typical Usage, and Valid Action Generation

Typical usage involves first initializing a game generator, then repeatedly generating and stepping through games. Examples are provided in the /examples/ folder, with an example agent that chooses a random action at each step described below:

from textworld_express import TextWorldExpressEnv

# Initialize game generator
env = TextWorldExpressEnv(args['jar_path'], envStepLimit=args['max_steps'], threadNum=0)

# Set the game generator to generate a particular game (cookingworld, twc, or coin)
env.load(gameName = "twc", gameFold = "train", gameSeed = 0, gameParams = "numLocations=5,includeDoors=1", generateGoldPath=True)

# Then, randomly generate and play 10 games within the defined parameters
for i in range(0, 10):
  # First step
  obs = env.resetWithRandomSeed(gameFold = "train", generateGoldPath=True)
  
  for stepNum in range(0, 50):
    # Display the observations from the environment (stored as a dictionary)
    print(obs)
    
    # Select a random valid action
    validActions = obs['validActions']
    randomAction = random.choice(validActions)
    
    # Take that action
    obs = env.step(randomAction)

Setting Game Parameters

Environments initialize with default parameters. To change the parameters, supply a comma-delimited string into gameParams when calling env.load(). An example of a valid parameter configuration string for CookingWorld might be numLocations=5, numIngredients=3, numDistractorItems=0, includeDoors=0, limitInventorySize=0. Valid parameters are different for each environment, and include:

CookingWorld:

Parameter	Description	Valid range
numLocations	The number of locations in the environment	1-11
numIngredients	The number of ingredients to use in generating the random recipe	1-5
numDistractorItems	The number of distractor ingredients (not used in the recipe) in the environment	0-10
includeDoors	Whether rooms have doors that need to be opened	0 or 1
limitInventorySize	Whether the size of the inventory is limited	0 or 1

TextWorld Common Sense:

Parameter	Description	Valid range
numLocations	The number of locations in the environment	1-3
numItemsToPutAway	The number of items to put away	1-10
includeDoors	Whether rooms have doors that need to be opened	0 or 1
limitInventorySize	Whether the size of the inventory is limited	0 or 1

Coin Collector:

Parameter	Description	Valid range
numLocations	The number of locations in the environment	1-11
numDistractorItems	The number of distractor (i.e. non-coin) items in the environment	0-10
includeDoors	Whether rooms have doors that need to be opened	0 or 1
limitInventorySize	Whether the size of the inventory is limited	0 or 1

Querying current game parameters: Sometimes you may want to know what parameters the current game is generated with. These can be queried using the getGenerationProperties() method:

print("Generation properties: " + str(env.getGenerationProperties()) )

Gold paths

Gold paths can be generated by setting the generateGoldPath flag to be true when using load(), generateWithRandomSeed(), or generateWithSeed(). The path itself can be retrieved using the env.getGoldActionSequence() method:

print("Gold path: " + str(env.getGoldActionSequence()))

Note that the gold paths are generated by agents that generally perform random walks in the environment, so while they lead to successful task completion, they may not be the shortest/most efficient paths. For example, this path for Text World Common Sense wanders the environment until it either sees an object to pick up (e.g. take white coat), or an appropiate container to put an object in (e.g. put white coat in wardrobe):

Gold path: ['look around', 'move west', 'take white coat', 'take brush', 'open wardrobe', 'put white coat in wardrobe', 'move east', 'move west', 'move east', 'move west', 'move east', 'move west', 'move east', 'move north', 'take eyeliner', 'take plaid blanket', 'put eyeliner in dressing table', 'open bathroom cabinet', 'put brush in bathroom cabinet', 'move south', 'move north', 'move south', 'move west', 'open chest of drawers', 'put plaid blanket in chest of drawers', 'move east']

Generating Pre-crawled Paths

One of the unique features of TextWorldExpress is that its performance is so fast, that it becomes possible to precrawl all possible actions that a hypothetical agent might take for a given game, out to some number of steps. This has several main benefits and drawbacks:

Positive: Game speed increases dramatically -- generally, as fast as traversing a tree using pointers. In Scala, the single-thread performance for traversal has been benchmarked at 4-5 million steps per second.
Negative: It takes some initial time to precrawl the paths (one-time investment; generally minutes-to-hours)
Negative: It takes time to load the paths at the start of each run (generally on the older of seconds-to-minutes)
Negative: It can take a lot of space to store precrawled paths (generally up to 1GB per game/seed, for trees of approximately 10-12 steps, depending on the game)
Negative: There is a limited step horizon -- once the agent goes past the number of steps precrawled, there is no more information to provide. For this reason it's important to precrawl only games that are small enough to solve within the number of steps that you precrawl.

Generally, if you are using less than a few dozen game variations during training, and you are using games that can be solved in under 10-12 steps, then path precrawling might be a way to gain dramatic benefits in performance.

To precrawl paths, use the path precrawling tool:

TODO  (add command line arguments to this Scala code)

Benchmarks

Python Benchmarks

For online generation mode:

python examples/random_agent_speed_test.py --game-name=cookingworld --max-steps=50 --num-episodes=10000

For precrawled path mode (note this demo uses precrawled paths provided for benchmarking in the repository):

# From the root directory of the TextWorldExpress github repository
python examples/precrawledPathReader.py

Scala Benchmarks

For online generation mode (argument should be one of cookingworld, twc, or coin):

cd textworld_express
java -Xmx4g -cp textworld-express-1.0.0.jar textworldexpress.benchmark.BenchmarkOnline coin

For precrawled path mode (single-threaded, note thishis demo uses precrawled paths provided for benchmarking in the repository.

# From the root directory of the TextWorldExpress github repository
java -Xmx4g -cp textworld_express/textworld-express-1.0.0.jar textworldexpress.benchmark.BenchmarkPrecrawledPath

For extra speed, a threaded precrawled path benchmark (where here, change 32 to the desired number of threads):

# From the root directory of the TextWorldExpress github repository
java -Xmx4g -cp textworld_express/textworld-express-1.0.0.jar textworldexpress.benchmark.BenchmarkPrecrawledPathThreaded 32

Frequently Asked Questions

Q: Why is the Python version 10x slower than the Java/Scala version? A: This partially due to the py4j binders, that allow interfacing Python to Java/Scala code through sockets. We will look for faster solutions in the future, though the Python interface to TextWorldExpress is still about 100 times faster than the original TextWorld, so it is still a big win.

Q: Will there be more benchmark games added/I want to add my own benchmark game to TextWorldExpress A: One of the main ways that TextWorldExpress gets its speed is through essentially hardcoding the games, mechanics, and (particularly) the valid action generation. To implement your own games, please use the existing games as templates, and open a github issue if you run into any issues. If you would like to recommend a new game to add to TextWorldExpress, please make a feature request in the github issues.

Q: What is the fastest TextWorldExpress can run? A: The fastest we have clocked TextWorldExpress using the random agent benchmark is 4-5 million steps/sec per thread using precrawled games and the Scala native API, with multi-threaded performance at approximately 34 million steps/sec using an AMD 3950X 16-core CPU with 32 threads. This is equivalent to about 2 billion steps per minute. 2 billion steps would take a single thread of the original TextWorld about 77 days to run.

Citation

If you use TextWorldExpress, please provide the following citation:

TODO

Project details

Release history Release notifications | RSS feed

1.0.5

Nov 8, 2024

1.0.4

Feb 25, 2023

1.0.3

Dec 2, 2022

1.0.2

Aug 9, 2022

1.0.1

Aug 5, 2022

1.0.0

Aug 4, 2022

1.0.0rc2 pre-release

Aug 3, 2022

This version

1.0.0rc1 pre-release

Aug 2, 2022

0.0.2

Jul 28, 2022

0.0.1

Jul 28, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

textworld_express-1.0.0rc1.tar.gz (41.0 MB view details)

Uploaded Aug 2, 2022 Source

Built Distribution

textworld_express-1.0.0rc1-py3-none-any.whl (41.0 MB view details)

Uploaded Aug 2, 2022 Python 3

File details

Details for the file textworld_express-1.0.0rc1.tar.gz.

File metadata

Download URL: textworld_express-1.0.0rc1.tar.gz
Upload date: Aug 2, 2022
Size: 41.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.0 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.7

File hashes

Hashes for textworld_express-1.0.0rc1.tar.gz
Algorithm	Hash digest
SHA256	`c26660b73eb5a901cd7362b3e1cb62540da54b78b3f164b69b5190a08ae75c53`
MD5	`8345372a7e714bfb3e414aa66bfe6125`
BLAKE2b-256	`2c3f01b6682feb53d14a93d4dbc9181e124ad3542b389f37fe13a28fe563978d`

See more details on using hashes here.

File details

Details for the file textworld_express-1.0.0rc1-py3-none-any.whl.

File metadata

Download URL: textworld_express-1.0.0rc1-py3-none-any.whl
Upload date: Aug 2, 2022
Size: 41.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.0 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.7

File hashes

Hashes for textworld_express-1.0.0rc1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`38e9c93733415e5afafcdd27dde090e18d38290e5bc2728d03f886f9d652fbb1`
MD5	`34d42c7fbc7224365f0cf3ad5385096b`
BLAKE2b-256	`b60365b8f8cd869695207a7d95a4487977390248358ac6ed5ce7af4fe6d54eb8`