Text utilities and datasets for PyTorch

Project description

torchtext

This repository consists of:

Note: the legacy code discussed in the torchtext v0.7.0 release notes has been moved to the torchtext.legacy folder. This legacy code will not be maintained by the development team, and we plan to remove it fully in a future release. See the torchtext.legacy folder for more details.

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for details of PyTorch installation. The following table lists the torchtext version that corresponds to each PyTorch version, along with the supported Python versions.

Version Compatibility

PyTorch version     torchtext version     Supported Python version
nightly build       master                3.6+
1.8                 0.9                   3.6+
1.7                 0.8                   3.6+
1.6                 0.7                   3.6+
1.5                 0.6                   3.5+
1.4                 0.5                   2.7, 3.5+
0.4 and below       0.2.3                 2.7, 3.5+
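The compatibility table can also be encoded as a lookup, for example to pick a matching torchtext pin in a setup script. The helper below is a hypothetical sketch: `COMPATIBILITY` and `torchtext_for` are illustrative names, not part of torchtext, and it covers only the numbered PyTorch releases in the table, not the nightly or "0.4 and below" rows.

```python
# Hypothetical helper: the mapping is copied from the version compatibility table
# (PyTorch major.minor -> (torchtext version, supported Python versions)).
COMPATIBILITY = {
    "1.8": ("0.9", "3.6+"),
    "1.7": ("0.8", "3.6+"),
    "1.6": ("0.7", "3.6+"),
    "1.5": ("0.6", "3.5+"),
    "1.4": ("0.5", "2.7, 3.5+"),
}


def torchtext_for(torch_version: str) -> str:
    """Return the torchtext version listed for a PyTorch version such as '1.8.1+cpu'."""
    # Drop any local build label ('+cpu', '+cu111') and keep only major.minor.
    release = torch_version.split("+")[0]
    major_minor = ".".join(release.split(".")[:2])
    try:
        return COMPATIBILITY[major_minor][0]
    except KeyError:
        raise ValueError(f"PyTorch {torch_version} is not covered by this table") from None
```

In practice you would pass the installed version string, e.g. `torchtext_for(torch.__version__)`, and install the returned torchtext release.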

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext

Optional requirements

If you want to use the English tokenizer from spaCy, you need to install spaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). To do so, install SacreMoses:

pip install sacremoses

For torchtext 0.5 and below, you also need sentencepiece:

conda install -c powerai sentencepiece
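To see which of these optional packages are importable in the current environment, a standard-library check is enough. This is a hypothetical sketch (`optional_tokenizer_support` is an illustrative name); it only looks for the packages and does not load them. It assumes the spaCy English model is present as the importable package `en_core_web_sm`, which is how `python -m spacy download` installs it.

```python
import importlib.util
from typing import Dict

# Optional packages mentioned above; en_core_web_sm is the spaCy English model.
OPTIONAL_PACKAGES = ("spacy", "en_core_web_sm", "sacremoses", "sentencepiece")


def optional_tokenizer_support() -> Dict[str, bool]:
    """Map each optional package name to whether it can be found on the import path."""
    # find_spec returns None for a missing top-level module instead of raising.
    return {name: importlib.util.find_spec(name) is not None for name in OPTIONAL_PACKAGES}


for package, available in optional_tokenizer_support().items():
    print(f"{package:15} {'installed' if available else 'missing'}")
```

Once the packages are present, `torchtext.data.utils.get_tokenizer` can build the tokenizers, e.g. `get_tokenizer("spacy", language="en_core_web_sm")` or `get_tokenizer("moses")`; check the documentation of your torchtext version for the exact options it accepts.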

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way to do this is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with for conda (here) and pip (here).

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

For example, to access the raw text from the AG_NEWS dataset:

>>> from torchtext.datasets import AG_NEWS
>>> train_iter = AG_NEWS(split='train')
>>> next(train_iter)
>>> # Or iterate with a for loop
>>> for (label, line) in train_iter:
...     print(label, line)
>>> # Or send to a DataLoader; re-create the iterator first, since the loop above exhausted it
>>> from torch.utils.data import DataLoader
>>> train_iter = AG_NEWS(split='train')
>>> dataloader = DataLoader(train_iter, batch_size=8, shuffle=False)
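The raw iterator yields `(label, line)` pairs with an unprocessed string, so a `collate_fn` is a common place to tokenize, numericalize, and pad each batch. The sketch below is a hypothetical, dependency-free illustration: whitespace splitting stands in for a real tokenizer, a plain `dict` stands in for a torchtext vocabulary, and `make_collate`, `<pad>`, and `<unk>` are illustrative names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Batch = Tuple[List[int], List[List[int]]]


def make_collate(vocab: Dict[str, int]) -> Callable[[Sequence[Tuple[int, str]]], Batch]:
    """Build a collate_fn turning (label, line) pairs into labels and padded token ids."""
    pad_id, unk_id = vocab["<pad>"], vocab["<unk>"]

    def collate(batch: Sequence[Tuple[int, str]]) -> Batch:
        labels = [label for label, _ in batch]
        # Lower-case whitespace split: a stand-in for a real tokenizer.
        ids = [[vocab.get(tok, unk_id) for tok in line.lower().split()] for _, line in batch]
        # Right-pad every sequence to the longest one in this batch.
        width = max((len(seq) for seq in ids), default=0)
        padded = [seq + [pad_id] * (width - len(seq)) for seq in ids]
        return labels, padded

    return collate
```

With PyTorch installed, it plugs into the loader as `DataLoader(train_iter, batch_size=8, collate_fn=make_collate(vocab))`; a real pipeline would typically also wrap the outputs in `torch.tensor`. Padding per batch, rather than to a global maximum length, keeps batches of short lines small.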

A tutorial for the end-to-end text classification workflow can be found in the PyTorch tutorials.

[Prototype] Experimental Code

We have rewritten several building blocks under torchtext.experimental:

  • Transforms: some basic data processing building blocks

  • Vocabulary: a vocabulary to numericalize tokens

  • Vectors: the vectors to convert tokens into tensors.
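Conceptually, a vocabulary maps tokens to integer ids and a vectors table maps tokens to dense vectors. The plain-Python sketch below only illustrates those two ideas; it is not the torchtext.experimental API, whose names and signatures are still changing, and `build_vocab`, `numericalize`, and `lookup_vectors` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def build_vocab(token_lists: Iterable[Sequence[str]], min_freq: int = 1,
                specials: Sequence[str] = ("<unk>",)) -> Dict[str, int]:
    """Assign ids to special tokens first, then to tokens by descending frequency."""
    counter = Counter(tok for tokens in token_lists for tok in tokens)
    itos = list(specials)
    # Ties are broken alphabetically so the resulting ids are deterministic.
    for tok, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in itos:
            itos.append(tok)
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(tokens: Sequence[str], stoi: Dict[str, int], unk: str = "<unk>") -> List[int]:
    """Replace each token by its id, falling back to the id of `unk`."""
    return [stoi.get(tok, stoi[unk]) for tok in tokens]


def lookup_vectors(tokens: Sequence[str], table: Dict[str, List[float]],
                   dim: int) -> List[List[float]]:
    """Return one vector per token; tokens missing from `table` get a zero vector."""
    return [list(table[tok]) if tok in table else [0.0] * dim for tok in tokens]
```

A transform pipeline then chains these steps, e.g. `numericalize(line.split(), stoi)` followed by an embedding or vector lookup.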

These prototype building blocks in the experimental folder are available in the nightly release only. The nightly packages are accessible via pip and conda for Windows, Mac, and Linux. For example, Linux users can install the nightly wheels with the following command:

pip install --pre --upgrade torch torchtext -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

For more detailed instructions, please refer to Install PyTorch. Note that the new building blocks are still under development, and their APIs have not been finalized.

[BC Breaking] Legacy

In the v0.9.0 release, we moved the following legacy code to torchtext.legacy. This is part of the work to revamp the torchtext library; the motivation is discussed in Issue #664:

  • torchtext.legacy.data.field

  • torchtext.legacy.data.batch

  • torchtext.legacy.data.example

  • torchtext.legacy.data.iterator

  • torchtext.legacy.data.pipeline

  • torchtext.legacy.datasets

A migration tutorial is available to help users switch to the torchtext datasets in the v0.9.0 release. Users who still need the legacy components can add legacy to the import path (for example, torchtext.data.Field becomes torchtext.legacy.data.Field).
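Code that has to run against torchtext releases both before and after v0.9.0 can try the legacy import path first and fall back to the old one. This is a hypothetical compatibility shim (`resolve_legacy` is an illustrative name, not a torchtext function); it assumes the class exists either under `torchtext.legacy.<submodule>` or, for releases before v0.9.0, under `torchtext.<submodule>`.

```python
import importlib
from typing import Any


def resolve_legacy(name: str, submodule: str = "data") -> Any:
    """Return torchtext attribute `name` (e.g. 'Field'), preferring the v0.9.0 legacy path."""
    candidates = (f"torchtext.legacy.{submodule}", f"torchtext.{submodule}")
    for path in candidates:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # This module does not exist in the installed torchtext (or torchtext is absent).
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{name!r} was not found in any of {candidates}")


# Equivalent to `from torchtext.legacy.data import Field` on v0.9.0
# and to `from torchtext.data import Field` on earlier releases:
# Field = resolve_legacy("Field")
```

The attribute check matters because, from v0.9.0 on, `torchtext.data` still exists but no longer contains the legacy classes, so a successful import alone is not enough.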

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

torchtext-0.9.0-cp39-cp39-win_amd64.whl (1.5 MB)

Uploaded CPython 3.9 Windows x86-64

torchtext-0.9.0-cp39-cp39-manylinux1_x86_64.whl (7.0 MB)

Uploaded CPython 3.9

torchtext-0.9.0-cp39-cp39-macosx_10_9_x86_64.whl (1.5 MB)

Uploaded CPython 3.9 macOS 10.9+ x86-64

torchtext-0.9.0-cp38-cp38-win_amd64.whl (1.5 MB)

Uploaded CPython 3.8 Windows x86-64

torchtext-0.9.0-cp38-cp38-manylinux1_x86_64.whl (7.0 MB)

Uploaded CPython 3.8

torchtext-0.9.0-cp38-cp38-macosx_10_9_x86_64.whl (1.6 MB)

Uploaded CPython 3.8 macOS 10.9+ x86-64

torchtext-0.9.0-cp37-cp37m-win_amd64.whl (1.5 MB)

Uploaded CPython 3.7m Windows x86-64

torchtext-0.9.0-cp37-cp37m-manylinux1_x86_64.whl (7.1 MB)

Uploaded CPython 3.7m

torchtext-0.9.0-cp37-cp37m-macosx_10_9_x86_64.whl (1.6 MB)

Uploaded CPython 3.7m macOS 10.9+ x86-64

torchtext-0.9.0-cp36-cp36m-win_amd64.whl (1.4 MB)

Uploaded CPython 3.6m Windows x86-64

torchtext-0.9.0-cp36-cp36m-manylinux1_x86_64.whl (7.1 MB)

Uploaded CPython 3.6m

torchtext-0.9.0-cp36-cp36m-macosx_10_9_x86_64.whl (1.6 MB)

Uploaded CPython 3.6m macOS 10.9+ x86-64

File details

Details for the file torchtext-0.9.0-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp39-cp39-win_amd64.whl
  • Size: 1.5 MB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for torchtext-0.9.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 2f8bd00362403481b65187a3cdd6f6d6187465851f627c73375dc10d62defce2
MD5 c9a645a5b8214b23f3802620db572dd0
BLAKE2b-256 c644bef5ea1d305f2ccc21291c5efc94244454679b5092ee4d3e79f77c52b821
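Before installing a downloaded wheel, its digest can be compared with the SHA256 value listed for it. A minimal sketch using Python's `hashlib`; the function names `sha256_of` and `matches` are illustrative.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Check a file's SHA256 digest against an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `matches("torchtext-0.9.0-cp39-cp39-win_amd64.whl", "2f8bd00362403481b65187a3cdd6f6d6187465851f627c73375dc10d62defce2")` should return True for an intact download of that file. The BLAKE2b-256 digests correspond to `hashlib.blake2b(digest_size=32)`. pip can also enforce hashes itself: list `torchtext==0.9.0 --hash=sha256:<digest>` in a requirements file and install it with `pip install --require-hashes -r requirements.txt`.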

File details

Details for the file torchtext-0.9.0-cp39-cp39-manylinux1_x86_64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp39-cp39-manylinux1_x86_64.whl
  • Size: 7.0 MB
  • Tags: CPython 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.8

File hashes

Hashes for torchtext-0.9.0-cp39-cp39-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 503b203045d3ec495e0e457879a974471abb1e2c58e868b8d867eb0a21ac355d
MD5 eac2516d037fc754b557629bb6fd40ae
BLAKE2b-256 9e6cefec557b11cbd13815143b042e1462b84e5f01a241d7962af4a71ed59c76

File details

Details for the file torchtext-0.9.0-cp39-cp39-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp39-cp39-macosx_10_9_x86_64.whl
  • Size: 1.5 MB
  • Tags: CPython 3.9, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.8

File hashes

Hashes for torchtext-0.9.0-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 f209726309d11651b5af468144919be693a43e13fa97cdf03f93f66c0b3d5f95
MD5 017f6bf9ce6eb7111858fab1b2e73171
BLAKE2b-256 a096c6f7acd6b03ef3872a9c9cc93887af7c9043b1229bc6646e8821186b62df

File details

Details for the file torchtext-0.9.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp38-cp38-win_amd64.whl
  • Size: 1.5 MB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for torchtext-0.9.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 d5bb348e05bd65483592443b5468b83da06ee0d93f1103c0c02db50e8dc016e6
MD5 70d180bef244e0ca4c5b818407c87092
BLAKE2b-256 5159d190ffe6fac2f5c7a301bd9ffd5f69b9a1925e08dbfecd6ee1e8b816fedb

File details

Details for the file torchtext-0.9.0-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp38-cp38-manylinux1_x86_64.whl
  • Size: 7.0 MB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.8

File hashes

Hashes for torchtext-0.9.0-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 d9a9762f72c0e16fee4b8bdce8b1c54c270d196643c592cb02a0e8e2b63aa2f7
MD5 347ab4509886f3c7b3a63b5052acbac3
BLAKE2b-256 39fce02604446ffefaefda8df9db59d738f73278054b4c6f0354d9d060c426a4

File details

Details for the file torchtext-0.9.0-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp38-cp38-macosx_10_9_x86_64.whl
  • Size: 1.6 MB
  • Tags: CPython 3.8, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.8

File hashes

Hashes for torchtext-0.9.0-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 3370742a71f8ca4fbc40e3df501ae0d0bf82355ea80308741e01783dc61bb412
MD5 6cc7ba62bff0c486a76aa5f32331b3bf
BLAKE2b-256 6f185f46a93bb1ab1c441b1ac4fa1899cc5fa62248a988617b23b5d0bc356c21

File details

Details for the file torchtext-0.9.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp37-cp37m-win_amd64.whl
  • Size: 1.5 MB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for torchtext-0.9.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 a759243e2d5301c2bd6872ccd76bbdfaeb3f77b143c5f6723abcc15fa2eae73a
MD5 482171d627ab36e7378407317dddbeff
BLAKE2b-256 47bf03806544fd155d053e82dfe4a53ca80e1fdb83a01487d8b9ce4ac96030fe

File details

Details for the file torchtext-0.9.0-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp37-cp37m-manylinux1_x86_64.whl
  • Size: 7.1 MB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.8

File hashes

Hashes for torchtext-0.9.0-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 d1a76573b4b5b4885d0eadb3fffe40b5fa7e48832c294de7cdf144f1b5b683b2
MD5 a17d768b9f647e116a4e0428f188143a
BLAKE2b-256 365084184d6230686e230c464f0dd4ff32eada2756b4a0b9cefec68b88d1d580

File details

Details for the file torchtext-0.9.0-cp37-cp37m-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp37-cp37m-macosx_10_9_x86_64.whl
  • Size: 1.6 MB
  • Tags: CPython 3.7m, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.8

File hashes

Hashes for torchtext-0.9.0-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 8a21cb39081d5c66ab6a630bc6fb63621d8b4699a83a85623e470cef2a47a595
MD5 dbb72c0b9a880bacc4c6f4e542991b6b
BLAKE2b-256 84f42bf3e08078f546d46175bcd91450423cecb1855918f1855d782d1796b5e6

File details

Details for the file torchtext-0.9.0-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp36-cp36m-win_amd64.whl
  • Size: 1.4 MB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for torchtext-0.9.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 dccbd699095358431fb59b15a625d0a79323e24a0aea3e8cf47d525945a4ccd9
MD5 f85feaa3bc0a70dfbbd176a57d45b7a2
BLAKE2b-256 f92e0dbd0e53e958c16c5a8d8dfe345d42041f64b9e4521d4f2c1555d62d3c70

File details

Details for the file torchtext-0.9.0-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp36-cp36m-manylinux1_x86_64.whl
  • Size: 7.1 MB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.8

File hashes

Hashes for torchtext-0.9.0-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 4b5382adfd4912869cb01ee3957f36a0e2b309925607ce3f744c31e1ac67b448
MD5 f9273b429072df9a51be4029bfa843b0
BLAKE2b-256 b459bc3e3276d34791de28d511b4c34a95d53a9869fec9f64f47915e214b5fc6

File details

Details for the file torchtext-0.9.0-cp36-cp36m-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: torchtext-0.9.0-cp36-cp36m-macosx_10_9_x86_64.whl
  • Size: 1.6 MB
  • Tags: CPython 3.6m, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.8

File hashes

Hashes for torchtext-0.9.0-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 3daa3f91d70b0f25adda3fb0306bb72a71b213baeaeadfc5ad283ff78d4f79fd
MD5 46b705d0f7cb830ff0df07a79030965d
BLAKE2b-256 c5fb1cd942ff6a605ee9c5e1de41510255f1d7a63929eb16ca5436059b7868fe
