Text utilities and datasets for PyTorch

Project description

torchtext

This repository consists of:

Note: The legacy code discussed in the torchtext v0.7.0 release notes has been moved to the torchtext.legacy folder. This legacy code will not be maintained by the development team, and we plan to remove it fully in a future release. See the torchtext.legacy folder for more details.

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. The following are the corresponding torchtext versions and supported Python versions.

Version Compatibility

PyTorch version    torchtext version    Supported Python version
nightly build      master               3.6+
1.8                0.9                  3.6+
1.7                0.8                  3.6+
1.6                0.7                  3.6+
1.5                0.6                  3.5+
1.4                0.5                  2.7, 3.5+
0.4 and below      0.2.3                2.7, 3.5+
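
The compatibility table can also be read as a simple lookup. The sketch below is a hypothetical helper (the names COMPATIBLE_TORCHTEXT and torchtext_for are illustrative and not part of torchtext) that maps a PyTorch release series to the torchtext series listed for it. The nightly and "0.4 and below" rows are left out because they do not correspond to a single "major.minor" series.

```python
# Hypothetical helper: the mapping is copied from the compatibility table.
# Keys are PyTorch "major.minor" series, values are torchtext series.
COMPATIBLE_TORCHTEXT = {
    "1.8": "0.9",
    "1.7": "0.8",
    "1.6": "0.7",
    "1.5": "0.6",
    "1.4": "0.5",
}


def torchtext_for(torch_version):
    """Return the torchtext series for a PyTorch version string, or None if unlisted."""
    # Drop a local build tag such as "+cu110", then keep only "major.minor".
    public = torch_version.split("+", 1)[0]
    series = ".".join(public.split(".")[:2])
    return COMPATIBLE_TORCHTEXT.get(series)


print(torchtext_for("1.8.1"))        # 0.9
print(torchtext_for("1.7.0+cu110"))  # 0.8
print(torchtext_for("2.0.0"))        # None
```

A version that is not in the table returns None rather than a guess, so the caller can decide how to handle it.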

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext

Optional requirements

If you want to use the English tokenizer from SpaCy, you need to install SpaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses:

pip install sacremoses
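
Once one of these optional packages is installed, the tokenizer can be requested by name through torchtext.data.utils.get_tokenizer. The sketch below wraps that call in a hypothetical load_tokenizer helper; the fallback to plain whitespace splitting is an assumption added here so the snippet still runs when torchtext, the backend package, or the SpaCy model is missing. It is not torchtext behaviour.

```python
def load_tokenizer(name="spacy", language="en_core_web_sm"):
    """Return a torchtext tokenizer, or plain str.split if it cannot be loaded."""
    try:
        from torchtext.data.utils import get_tokenizer

        # name can be "spacy" or "moses"; language names the SpaCy model to load.
        return get_tokenizer(name, language=language)
    except (ImportError, OSError, AttributeError):
        # torchtext, the backend package, or the SpaCy model is unavailable.
        return str.split


tokenize = load_tokenizer()
print(tokenize("Hello world"))
```

Passing name="moses" would select the SacreMoses port instead, provided sacremoses is installed.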

For torchtext 0.5 and below, install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way to do this is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with for conda (here) and pip (here).

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017, Multi30k

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

For example, to access the raw text from the AG_NEWS dataset:

>>> from torchtext.datasets import AG_NEWS
>>> train_iter = AG_NEWS(split='train')
>>> next(train_iter)
>>> # Or iterate with a for loop
>>> for (label, line) in train_iter:
...     print(label, line)
>>> # Or send to DataLoader
>>> from torch.utils.data import DataLoader
>>> train_iter = AG_NEWS(split='train')
>>> dataloader = DataLoader(train_iter, batch_size=8, shuffle=False)
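
Each item the iterator yields is a (label, line) pair, so a DataLoader over raw text is usually given a custom collate function. The sketch below shows one possible shape for such a function. The name collate_batch, the shift to zero-based labels (which assumes integer labels starting at 1), and the whitespace tokenization are all assumptions for illustration. It runs on two made-up samples so that no download is needed.

```python
def collate_batch(batch):
    """Split a list of (label, line) pairs into parallel lists of labels and token lists."""
    labels, token_lists = [], []
    for label, line in batch:
        labels.append(int(label) - 1)             # assumption: labels start at 1
        token_lists.append(line.lower().split())  # stand-in for a real tokenizer
    return labels, token_lists


# Made-up samples in the same (label, line) shape as the dataset items.
samples = [(3, "Stocks rally on earnings"), (1, "Talks resume in Geneva")]
labels, token_lists = collate_batch(samples)
print(labels)          # [2, 0]
print(token_lists[0])  # ['stocks', 'rally', 'on', 'earnings']
```

With torch installed, the same function could be passed as DataLoader(train_iter, batch_size=8, shuffle=False, collate_fn=collate_batch).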

Tutorials

To get started with torchtext, users may refer to the tutorials available on the PyTorch website.

[Prototype] Experimental Code

We have re-written several building blocks under torchtext.experimental:

  • Transforms: some basic data processing building blocks

  • Vectors: the vectors to convert tokens into tensors.

These prototype building blocks in the experimental folder are available in the nightly release only. The nightly packages are accessible via Pip and Conda for Windows, Mac, and Linux. For example, Linux users can install the nightly wheels with the following command:

pip install --pre --upgrade torch torchtext -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

For more detailed instructions, please refer to Install PyTorch. It should be noted that the new building blocks are still under development, and the APIs have not been solidified.

[BC Breaking] Legacy

In the v0.9.0 release, we moved the following legacy code to torchtext.legacy. This is part of the work to revamp the torchtext library and the motivation has been discussed in Issue #664:

  • torchtext.legacy.data.field

  • torchtext.legacy.data.batch

  • torchtext.legacy.data.example

  • torchtext.legacy.data.iterator

  • torchtext.legacy.data.pipeline

  • torchtext.legacy.datasets

We have a migration tutorial to help users switch to the torchtext datasets in the v0.9.0 release. Users who still want the legacy components can add legacy to the import path (for example, torchtext.legacy.data instead of torchtext.data).

In the v0.10.0 release, we retired the Vocab class to torchtext.legacy. Users can still access the legacy Vocab via torchtext.legacy.vocab. This class has been replaced by a Vocab module that is backed by an efficient C++ implementation and provides common functional APIs for NLP workflows.
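
Code that has to run against releases from both before and after the move could try the legacy path first and fall back to the old location. The sketch below is one way to do that with a hypothetical import_first helper, which is not part of torchtext. Note the assumption: the fallback only makes sense for releases older than v0.9.0. Once the legacy folder is removed, torchtext.data would resolve to the new, non-legacy module, so a real project would also pin its torchtext version.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(module_names))


# Prefer the legacy location (v0.9.0 and later), then the pre-0.9.0 location.
try:
    legacy_data = import_first("torchtext.legacy.data", "torchtext.data")
except ImportError:
    legacy_data = None  # torchtext is not installed in this environment
print(legacy_data)
```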

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!
