Text utilities, models, transforms, and datasets for PyTorch.

torchtext

This repository consists of:

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. The table below lists the torchtext version and the supported Python versions for each PyTorch version.

Version Compatibility

PyTorch version    torchtext version    Supported Python version
---------------    -----------------    ------------------------
nightly build      main                 >=3.8, <=3.11
2.2.0              0.17.0               >=3.8, <=3.11
2.1.0              0.16.0               >=3.8, <=3.11
2.0.0              0.15.0               >=3.8, <=3.11
1.13.0             0.14.0               >=3.7, <=3.10
1.12.0             0.13.0               >=3.7, <=3.10
1.11.0             0.12.0               >=3.6, <=3.9
1.10.0             0.11.0               >=3.6, <=3.9
1.9.1              0.10.1               >=3.6, <=3.9
1.9                0.10                 >=3.6, <=3.9
1.8.1              0.9.1                >=3.6, <=3.9
1.8                0.9                  >=3.6, <=3.9
1.7.1              0.8.1                >=3.6, <=3.9
1.7                0.8                  >=3.6, <=3.8
1.6                0.7                  >=3.6, <=3.8
1.5                0.6                  >=3.5, <=3.8
1.4                0.5                  2.7, >=3.5, <=3.8
0.4 and below      0.2.3                2.7, >=3.5, <=3.8
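
As a rough illustration of how the compatibility table can be consulted programmatically, the sketch below copies its rows into a list and looks up the torchtext version paired with a given PyTorch version. The names `COMPATIBILITY` and `torchtext_for` are hypothetical helpers introduced for this example, not part of torchtext. The nightly row and the rows that also list Python 2.7 are left out for simplicity.

```python
import sys

# Rows copied from the compatibility table above:
# (PyTorch version, torchtext version, minimum Python, maximum Python).
# The "nightly build", "1.4" and "0.4 and below" rows are omitted here.
COMPATIBILITY = [
    ("2.2.0", "0.17.0", (3, 8), (3, 11)),
    ("2.1.0", "0.16.0", (3, 8), (3, 11)),
    ("2.0.0", "0.15.0", (3, 8), (3, 11)),
    ("1.13.0", "0.14.0", (3, 7), (3, 10)),
    ("1.12.0", "0.13.0", (3, 7), (3, 10)),
    ("1.11.0", "0.12.0", (3, 6), (3, 9)),
    ("1.10.0", "0.11.0", (3, 6), (3, 9)),
    ("1.9.1", "0.10.1", (3, 6), (3, 9)),
    ("1.9", "0.10", (3, 6), (3, 9)),
    ("1.8.1", "0.9.1", (3, 6), (3, 9)),
    ("1.8", "0.9", (3, 6), (3, 9)),
    ("1.7.1", "0.8.1", (3, 6), (3, 9)),
    ("1.7", "0.8", (3, 6), (3, 8)),
    ("1.6", "0.7", (3, 6), (3, 8)),
    ("1.5", "0.6", (3, 5), (3, 8)),
]


def torchtext_for(pytorch_version, python=None):
    """Return the torchtext version paired with `pytorch_version`.

    Returns None if the PyTorch version is not in the table or if the
    given Python (major, minor) tuple is outside the supported range.
    Defaults to the running interpreter's version.
    """
    python = tuple(python or sys.version_info[:2])
    for torch_v, text_v, py_min, py_max in COMPATIBILITY:
        if torch_v == pytorch_version:
            return text_v if py_min <= python <= py_max else None
    return None
```

For example, `torchtext_for("2.2.0", (3, 11))` yields `"0.17.0"`, while `torchtext_for("1.13.0", (3, 11))` yields `None` because that row stops at Python 3.10.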

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext
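
After installing, one way to confirm which version ended up in the environment is to query the package metadata from the standard library. This is a minimal sketch; `installed_version` is a hypothetical helper name, and it assumes Python 3.8 or later, where `importlib.metadata` is available.

```python
from importlib import metadata


def installed_version(package="torchtext"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

The returned string can then be compared against the compatibility table to check that it matches the installed PyTorch version.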

Optional requirements

If you want to use the English tokenizer from SpaCy, you need to install SpaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses:

pip install sacremoses
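
The sketch below shows one way these optional packages might be used from Python. It assumes `torchtext.data.utils.get_tokenizer`, which accepts the tokenizer names `"spacy"`, `"moses"` and `"basic_english"`; the helper names `OPTIONAL_BACKENDS`, `missing_backends` and `build_tokenizer` are hypothetical and exist only for this example. torchtext is imported lazily, so the availability check works even before torchtext itself is installed.

```python
from importlib.util import find_spec

# Tokenizer name accepted by get_tokenizer -> optional package it relies on.
OPTIONAL_BACKENDS = {"spacy": "spacy", "moses": "sacremoses"}


def missing_backends():
    """Return the tokenizer names whose optional package is not importable."""
    return [name for name, pkg in OPTIONAL_BACKENDS.items()
            if find_spec(pkg) is None]


def build_tokenizer(name="basic_english"):
    """Return a callable that maps a string to a list of tokens."""
    # Imported here so that this module loads without torchtext installed.
    from torchtext.data.utils import get_tokenizer

    if name == "spacy":
        # Uses the model downloaded with `python -m spacy download en_core_web_sm`.
        return get_tokenizer("spacy", language="en_core_web_sm")
    return get_tokenizer(name)
```

A call such as `build_tokenizer("moses")("Hello, world!")` would then return a list of token strings, provided `missing_backends()` does not report `"moses"`.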

For torchtext 0.5 and below, install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with for conda (here) and pip (here).

Additionally, datasets in torchtext are implemented using the torchdata library. Please take a look at the installation instructions to download the latest nightlies or install from source.

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017, Multi30k

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

  • Model pre-training: CC-100
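
As a hedged sketch of how the datasets listed above might be loaded, the snippet below groups the names by task and wraps the dataset constructors in `torchtext.datasets`. The names `DATASETS_BY_TASK`, `task_of` and `load_split` are hypothetical helpers for this example. It assumes each dataset is exposed as a callable of the same name that accepts a `split` argument; to the best of our understanding EnWik9 takes no `split` and CC-100 is exposed as `CC100` with a language code, so both are excluded from `load_split`. Loading also requires torchdata and downloads the data on first use.

```python
# Dataset names grouped by task, copied from the list above.
DATASETS_BY_TASK = {
    "Language modeling": ["WikiText2", "WikiText103", "PennTreebank", "EnWik9"],
    "Machine translation": ["IWSLT2016", "IWSLT2017", "Multi30k"],
    "Sequence tagging": ["UDPOS", "CoNLL2000Chunking"],
    "Question answering": ["SQuAD1", "SQuAD2"],
    "Text classification": [
        "SST2", "AG_NEWS", "SogouNews", "DBpedia", "YelpReviewPolarity",
        "YelpReviewFull", "YahooAnswers", "AmazonReviewPolarity",
        "AmazonReviewFull", "IMDB",
    ],
    "Model pre-training": ["CC-100"],
}

# Assumed to have a different call signature, so not handled by load_split.
NO_SPLIT_ARGUMENT = {"EnWik9", "CC-100"}


def task_of(dataset_name):
    """Return the task a dataset is listed under, or None if it is unknown."""
    for task, names in DATASETS_BY_TASK.items():
        if dataset_name in names:
            return task
    return None


def load_split(dataset_name, split="train"):
    """Return an iterable over one split of the named dataset."""
    if task_of(dataset_name) is None or dataset_name in NO_SPLIT_ARGUMENT:
        raise ValueError(f"unsupported dataset for this sketch: {dataset_name}")
    # Imported lazily: needs torchtext and torchdata to be installed.
    import torchtext.datasets as datasets

    return getattr(datasets, dataset_name)(split=split)
```

For instance, `load_split("AG_NEWS", "train")` would return an iterable over the AG_NEWS training examples.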

Models

The library currently consists of the following pre-trained models:

Tokenizers

The transforms module currently supports the following scriptable tokenizers:

Tutorials

To get started with torchtext, users may refer to the following tutorial available on the PyTorch website.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

torchtext-0.17.0-cp311-cp311-win_amd64.whl (1.9 MB): CPython 3.11, Windows x86-64
torchtext-0.17.0-cp311-cp311-manylinux1_x86_64.whl (2.0 MB): CPython 3.11
torchtext-0.17.0-cp311-cp311-macosx_11_0_arm64.whl (2.1 MB): CPython 3.11, macOS 11.0+ ARM64
torchtext-0.17.0-cp311-cp311-macosx_10_13_x86_64.whl (2.3 MB): CPython 3.11, macOS 10.13+ x86-64
torchtext-0.17.0-cp310-cp310-win_amd64.whl (1.9 MB): CPython 3.10, Windows x86-64
torchtext-0.17.0-cp310-cp310-manylinux1_x86_64.whl (2.0 MB): CPython 3.10
torchtext-0.17.0-cp310-cp310-macosx_11_0_arm64.whl (2.1 MB): CPython 3.10, macOS 11.0+ ARM64
torchtext-0.17.0-cp310-cp310-macosx_10_13_x86_64.whl (2.3 MB): CPython 3.10, macOS 10.13+ x86-64
torchtext-0.17.0-cp39-cp39-win_amd64.whl (1.9 MB): CPython 3.9, Windows x86-64
torchtext-0.17.0-cp39-cp39-manylinux1_x86_64.whl (2.0 MB): CPython 3.9
torchtext-0.17.0-cp39-cp39-macosx_11_0_arm64.whl (2.1 MB): CPython 3.9, macOS 11.0+ ARM64
torchtext-0.17.0-cp39-cp39-macosx_10_13_x86_64.whl (2.3 MB): CPython 3.9, macOS 10.13+ x86-64
torchtext-0.17.0-cp38-cp38-win_amd64.whl (1.9 MB): CPython 3.8, Windows x86-64
torchtext-0.17.0-cp38-cp38-manylinux1_x86_64.whl (2.0 MB): CPython 3.8
torchtext-0.17.0-cp38-cp38-macosx_11_0_arm64.whl (2.1 MB): CPython 3.8, macOS 11.0+ ARM64
torchtext-0.17.0-cp38-cp38-macosx_10_13_x86_64.whl (2.3 MB): CPython 3.8, macOS 10.13+ x86-64

File hashes

torchtext-0.17.0-cp311-cp311-win_amd64.whl
  SHA256       5983bf411a03512763adf915d9f1454cc9ededc2f4d1d5f773797f194fcf7c0e
  MD5          00c32443a7d223abdda696aeef21c0b8
  BLAKE2b-256  5e72170b68db3713eb7f4b5e0f6c24b337edf83b98c263749ca954a2bfadb40e

torchtext-0.17.0-cp311-cp311-manylinux1_x86_64.whl
  SHA256       3903179338ec81d70ae2ee1bf8c210965c8ec20bac125187dc2c8cd9ff48b970
  MD5          62f01bf887369a2af5ad551987e5f479
  BLAKE2b-256  fa0506f31535b390f027846b1243bbbd07422192d1458bd843f8c32f1a692226

torchtext-0.17.0-cp311-cp311-macosx_11_0_arm64.whl
  SHA256       2b41732a78298f206b92ff875c102a158254f883dd262942a3540249700bc175
  MD5          257d7a77fd1a027359749d1fc90bfcc0
  BLAKE2b-256  7ad1088bcf89c5ff271037daf33e23fcf35319c1908f50ddf1e051256b12f8f8

torchtext-0.17.0-cp311-cp311-macosx_10_13_x86_64.whl
  SHA256       f23562eac9524f1dfdf2442d5de0a94063609a8a60367f1bff4ed50e9cc4949b
  MD5          9c19dde131b9bcc94f764b166de243f8
  BLAKE2b-256  a1cd229ee98b4e2f43a1dfb99108755d767afdae06ed25472911c126647cf737

torchtext-0.17.0-cp310-cp310-win_amd64.whl
  SHA256       d76f73ae581fc2da89b7f74b54940e7db1e3253ecae77bf532337e3fd1f31578
  MD5          a8f16608dd08e7419d2bdced48fbb9ce
  BLAKE2b-256  4a5df38ec7c7e1d8df3314dd7bafec5df69dfcaad49b48b20f7c4e3457e45763

torchtext-0.17.0-cp310-cp310-manylinux1_x86_64.whl
  SHA256       54ef4655526ed184742dcee63bf4393b962e60e5640b16de5f5644bd5b75a58e
  MD5          985e75d36a081c89815e432538335834
  BLAKE2b-256  3e957a64ad3bb2a6d653f4f0572f7cecab26052991feb43e2c3e50400b07e1b4

torchtext-0.17.0-cp310-cp310-macosx_11_0_arm64.whl
  SHA256       96b6ed0826022cf421c1a89116741ce25ab020a0bc6ba9f545f67ebe4e472fb1
  MD5          eacc33da535c23aaf7eb23a6b4da01b6
  BLAKE2b-256  3c438359e4172777c3df442c2865d15cdd2a72ecd1576651b566693b5b4a2afd

torchtext-0.17.0-cp310-cp310-macosx_10_13_x86_64.whl
  SHA256       5255d3e0e057ca88506648cd1d7047efdc95560f5bda2b66b231a5ac00657893
  MD5          909d69b934bd6e9ac3bd76bdfbae54cd
  BLAKE2b-256  2ef0f7f779acd383fcb0cb63f4736a6343433021d6111d58d2f2190f3c7b738f

torchtext-0.17.0-cp39-cp39-win_amd64.whl
  SHA256       2e0c3e8c4383fcf4d99108bd0008d523feb09e703f2ff61098f31b50f865ad86
  MD5          7c647e64cc30d98367f49247d53cd1b3
  BLAKE2b-256  f4ea66ba9aa7946acc2a98011f9191993f872c53cb5044e47d00a4327a585d00

torchtext-0.17.0-cp39-cp39-manylinux1_x86_64.whl
  SHA256       34a61d9860f2d1897a14d8614b600e29eea4eaaab6ca6472e0530251c35bd4fc
  MD5          5923fcc643f027a6fdf276b945d86484
  BLAKE2b-256  fdac2a5331e1529c0d213555e5be7170f1cb623620b6176d7c2b6cf93f4bd0b4

torchtext-0.17.0-cp39-cp39-macosx_11_0_arm64.whl
  SHA256       6d8d1e52f339c5b01d5bedb3e811ed15ec53b6c4c348f8ad4233be4cb99392c3
  MD5          c472444013553d08af4553d81a0d20b9
  BLAKE2b-256  5acf8a0350561815adda450e4e8e05a3c89cd0580d7412ed7d3d92081b6c1e5d

torchtext-0.17.0-cp39-cp39-macosx_10_13_x86_64.whl
  SHA256       93ab376be4563532d0c4552aad92b8a9138791987971314866ef1cb7699c238a
  MD5          5f7ae43fc8180d33ee8383b4ae334eae
  BLAKE2b-256  7df533c4011d428740199087cd3cc2262fc3b986971366fdb2c411cec30ea658

torchtext-0.17.0-cp38-cp38-win_amd64.whl
  SHA256       57ca738cd12b7d6c2d9c83f695ac881becbab0269674fe5a6ba26e150597c27b
  MD5          5bc0c75f87b5e5abc4c43dd7ae09f7a5
  BLAKE2b-256  9760e54fbc0681d30c4c366d44050bb9be0e7f2fbb8185065ea8c70115ac9769

torchtext-0.17.0-cp38-cp38-manylinux1_x86_64.whl
  SHA256       04b2e51c829653e2bd310d9b0b280a132f2fcd827bc14dec1b6b2cd2baf578f4
  MD5          a3ab326155fb80aa975eb299cfe29053
  BLAKE2b-256  ecd9e0fd5f0266bee43ebdeda9ddf00bc1f0407050cf54b3db87ffd81a4148ac

torchtext-0.17.0-cp38-cp38-macosx_11_0_arm64.whl
  SHA256       9f872a5f13ab7c5eef1171de7d0e9920635afa2e468d806c770f6cf1266aa7f2
  MD5          79a8885a126cf11ce852a5db03d0453d
  BLAKE2b-256  799958a386bf513aae2e1480b0e7a9e7440b98eb3bf1b6861e84b535596365ee

torchtext-0.17.0-cp38-cp38-macosx_10_13_x86_64.whl
  SHA256       a3d737e9e8d36629b27ff7e6607ce09ff371476442d702f309047a4066f2424d
  MD5          d047d9de699aaf04ca988db8acbe411b
  BLAKE2b-256  391959493232af6ddd2d88c4054184a45b37ec1a3fb32061e7982982fc4f9dd2

See more details on using hashes here.
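
The SHA256 digests listed above can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only the standard library; `sha256_of` and `matches` are hypothetical helper names, and the file path passed in is whatever location the wheel was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`.

    The file is read in chunks so large wheels need not fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals `expected_hex`."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling `matches` with the wheel's path and the SHA256 value listed for that file returns `True` only when the download is intact. pip can also enforce this itself through `--require-hashes` with `--hash=sha256:...` entries in a requirements file.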
