Skip to main content

Byte-sized text games for code generation tasks on virtual environments.

Project description

BYTESIZED32

Byte-sized text games for code generation tasks on virtual environments.

Quickstart

Clone the repository:

git clone git@github.com:cognitiveailab/BYTESIZED32.git
cd BYTESIZED32

Install Dependencies:

conda create --name bytesized32 python=3.10
conda activate bytesized32
pip install -e .

API key

You will need an OpenAI API key to run the experiments. Set the environment variable OPENAI_API_KEY or AZURE_OPENAI_API_KEY to your key.

Run Generation

We run three ablation experiments, namely object, action, distractor.

Prompt Game Selection

First, we generate three csv files, one for each experiment. Those files contain information about which games to use as in-context example for the code generation.

python scripts/generate_experiment_file.py action
python scripts/generate_experiment_file.py distractor
python scripts/generate_experiment_file.py object

The above script makes use of train and test csv files found in the data/ folder. They describe what actions/distractors/objects are found in each game. In action_train.csv, distractor_train.csv, object_train.csv, 1 refers to there exists such an action/distractor/object, and empty entries refers to there does not exist such an action/distractor/object. In action_test.csv, the second column is one action (human-generated) that we will find a similar prompt game with the same action. In distractor_test.csv, 1 means this test needs a distractor and 0 means it does not need a distractor. In object_test.csv, 2 means we will find a prompt with this object for similarity, 1 means this kind of object is also needed for this test and empty entries means this object may not be required.

Running the three commands above will give you experiment_action.csv, experiment_distractor.csv, and experiment_object.csv. Since sampling is used in generate_experiment_file.py, we offer our generated csv files to reproduce the results (they can be found in data/). The left column of the output csv file is the game name of a similar game (i.e. a prompt game that shares a same action/distractor/object with the test game) and the right column is the name of an unsimilar game.

Code Generation

With the generated csv files, we can now run the code generation for each experiment.

python scripts/run_code_generation.py data/experiment_action.csv --output-folder results/run/
python scripts/run_code_generation.py data/experiment_distractor.csv --output-folder results/run/
python scripts/run_code_generation.py data/experiment_object.csv --output-folder results/run/

Each command will generate 32 games according to the experiment file. By default, the generated games along with the raw LLM prompts and responses are saved in results/{datetime}/generated_games/ folder. See run_code_generation.py --help for all additional arguments.

Perform Code Reflection

Some of the generated games may not be valid Python code. We use the following script to perform self-reflection and improve code according to technical validity.

python scripts/run_code_reflection.py --game-folder results/run/generated_games/ --revision-folder results/run/revised_games/

Run Automatic Evaluation

The provided codebase can run automatic evaluation on the generated games. The evaluation is based on the following metrics:

  • Technical Validity: whether the generated game is valid Python code and contains expected class and methods.
  • Specification Compliance: whether the generated game contains the required actions, objects, and distractors as specified in the experiment file.
  • Physical Reality Alignment: whether the generated game model the constraints of the physical world.
  • Game Winnability: whether the generated game is winnable, i.e. there exists a sequence of actions that lead to a winning state.
python scripts/run_code_evaluation.py --game-folder results/run/revised_games/ --results-file results/run/results.json

Note: The Specification Compliance evaluation depends on data/test_eval.csv which stores all the labels (i.e. actions and objects that we are interested in and that should be included in the generated game, as well as whether the generated game should contain distractors (1 means there should be a distractor, otherwise it is 0)). This file was generated manually. If you generate your own experiment file, change this file accordingly.

Visualize Results

python scripts/make_table2.py --results results/run/results.json
python scripts/make_table3.py --results results/run/results.json
python scripts/make_figure4.py --results results/run/results.json

Citing ByteSized32

Our paper was presented at EMNLP2023 and is also available on Arxiv.

If you use our codebase, please consider citing our paper:

@article{Wang2023ByteSized32AC,
  title={ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games},
  author={Ruoyao Wang and Graham Todd and Xingdi Yuan and Ziang Xiao and Marc-Alexandre C{\^o}t{\'e} and Peter Alexander Jansen},
  journal={ArXiv},
  year={2023},
  volume={abs/2305.14879},
  url={https://api.semanticscholar.org/CorpusID:258865971}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bytes32-1.1.0.tar.gz (21.0 kB view details)

Uploaded Source

Built Distribution

bytes32-1.1.0-py3-none-any.whl (21.0 kB view details)

Uploaded Python 3

File details

Details for the file bytes32-1.1.0.tar.gz.

File metadata

  • Download URL: bytes32-1.1.0.tar.gz
  • Upload date:
  • Size: 21.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for bytes32-1.1.0.tar.gz
Algorithm Hash digest
SHA256 405cecdd06ac60e351ccf53785741022db6052d0b067e3dbd670d8546048106a
MD5 6f21fb203141b9e8f86e735264a56ad7
BLAKE2b-256 807290d188941e4cc7b00afa5c36706787db35a3f4ebd8fe7f60e57d9f9b1422

See more details on using hashes here.

File details

Details for the file bytes32-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: bytes32-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 21.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for bytes32-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 28494a6b823a3632a2ea650a2ed3dad6bc89b5281f0a1eda0a1e9e2c98eebbb6
MD5 7a1f7ccb7992ffc49a7f621a02c44327
BLAKE2b-256 66c712d228e702495274ff216905c4cfd153822ab7ba92e3b5124b64cbe4d1be

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page