Text utilities and datasets for PyTorch

torchtext

This repository consists of:

Note: The legacy code discussed in the torchtext v0.7.0 release notes has been retired to the torchtext.legacy folder. This legacy code will not be maintained by the development team, and we plan to fully remove it in a future release. See the torchtext.legacy folder for more details.

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. The following are the corresponding torchtext versions and supported Python versions.

Version Compatibility

PyTorch version    torchtext version    Supported Python version
nightly build      master               3.6+
1.8                0.9                  3.6+
1.7                0.8                  3.6+
1.6                0.7                  3.6+
1.5                0.6                  3.5+
1.4                0.5                  2.7, 3.5+
0.4 and below      0.2.3                2.7, 3.5+
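The compatibility table can also be encoded as a small lookup, for example to choose a matching torchtext release in an install script. This is a minimal sketch: the names `COMPAT` and `torchtext_for` are hypothetical and not part of torchtext, and only the rows listed above are covered.

```python
# Compatibility rows copied from the table in this README,
# keyed by PyTorch "major.minor" version.
COMPAT = {
    "1.8": ("0.9", "3.6+"),
    "1.7": ("0.8", "3.6+"),
    "1.6": ("0.7", "3.6+"),
    "1.5": ("0.6", "3.5+"),
    "1.4": ("0.5", "2.7, 3.5+"),
}


def torchtext_for(torch_version):
    """Return (torchtext version, supported Python) for a PyTorch version string."""
    # Drop local build tags such as "+cu111", then keep only major and minor.
    base = torch_version.split("+")[0]
    major, minor = (int(part) for part in base.split(".")[:2])
    if (major, minor) <= (0, 4):
        # The "0.4 and below" row of the table.
        return ("0.2.3", "2.7, 3.5+")
    key = "{}.{}".format(major, minor)
    if key not in COMPAT:
        raise KeyError("PyTorch {} is not listed in the table".format(key))
    return COMPAT[key]
```

For instance, `torchtext_for("1.8.1+cu111")` returns `("0.9", "3.6+")`. Versions that the table does not list raise `KeyError` rather than guessing a pairing.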

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext

Optional requirements

If you want to use the English tokenizer from SpaCy, you need to install SpaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses:

pip install sacremoses

For torchtext 0.5 and below, also install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with for conda (here) and pip (here).

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017, Multi30k

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

For example, to access the raw text from the AG_NEWS dataset:

>>> from torchtext.datasets import AG_NEWS
>>> train_iter = AG_NEWS(split='train')
>>> next(train_iter)  # one (label, line) tuple
>>> # Or iterate with a for loop
>>> for (label, line) in train_iter:
...     print(label, line)
>>> # Or send to DataLoader
>>> from torch.utils.data import DataLoader
>>> train_iter = AG_NEWS(split='train')
>>> dataloader = DataLoader(train_iter, batch_size=8, shuffle=False)
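Because the raw iterator yields (label, line) pairs of plain Python values, the text usually needs to be converted to integer ids and padded before a model can consume a batch. The following is a minimal sketch of that step in pure Python. The sample rows, the whitespace tokenizer, the dict-based vocabulary, and the name `collate_batch` are all stand-ins chosen for illustration, not torchtext APIs.

```python
from collections import Counter

# Stand-in rows shaped like the (label, line) pairs yielded by the raw iterator.
samples = [
    (3, "Wall St. bears claw back"),
    (4, "New chip speeds up phones"),
    (1, "Talks resume"),
]


def tokenize(line):
    # Whitespace splitting is a placeholder for a real tokenizer.
    return line.lower().split()


# Index 0 is reserved for padding and index 1 for unknown tokens.
counter = Counter(tok for _, line in samples for tok in tokenize(line))
vocab = {"<pad>": 0, "<unk>": 1}
for tok, _ in counter.most_common():
    vocab[tok] = len(vocab)


def collate_batch(batch):
    """Turn a list of (label, line) pairs into labels plus padded id rows."""
    labels = [label for label, _ in batch]
    ids = [
        [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(line)]
        for _, line in batch
    ]
    # Pad every row on the right to the length of the longest row.
    width = max(len(row) for row in ids)
    padded = [row + [vocab["<pad>"]] * (width - len(row)) for row in ids]
    return labels, padded


labels, padded = collate_batch(samples)
```

A function of this shape could be passed to DataLoader through its `collate_fn` argument, with the returned lists converted to tensors as a final step.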

Tutorials

To get started with torchtext, users may refer to the following tutorials available on the PyTorch website.

[Prototype] Experimental Code

We have re-written several building blocks under torchtext.experimental:

  • Transforms: some basic data processing building blocks

  • Vectors: the vectors to convert tokens into tensors.

These prototype building blocks in the experimental folder are available in the nightly release only. The nightly packages are accessible via Pip and Conda for Windows, Mac, and Linux. For example, Linux users can install the nightly wheels with the following command:

pip install --pre --upgrade torch torchtext -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

For more detailed instructions, please refer to Install PyTorch. It should be noted that the new building blocks are still under development, and the APIs have not been solidified.

[BC Breaking] Legacy

In the v0.9.0 release, we moved the legacy code listed below to torchtext.legacy. This is part of the work to revamp the torchtext library; the motivation is discussed in Issue #664.

  • torchtext.legacy.data.field

  • torchtext.legacy.data.batch

  • torchtext.legacy.data.example

  • torchtext.legacy.data.iterator

  • torchtext.legacy.data.pipeline

  • torchtext.legacy.datasets

We have a migration tutorial to help users switch to the torchtext datasets in the v0.9.0 release. Users who still want the legacy components can add legacy to the import path.
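For code that has to run against releases both with and without the legacy folder, one option is to try the legacy path first and fall back to the original one. This is only a sketch: the helper `import_legacy` is hypothetical, and it returns None when torchtext is not installed at all.

```python
import importlib
import importlib.util


def import_legacy(name):
    """Import torchtext.legacy.<name> if it exists, else torchtext.<name>.

    Returns None when neither module can be found.
    """
    for candidate in ("torchtext.legacy." + name, "torchtext." + name):
        try:
            # find_spec imports parent packages, so a missing parent
            # surfaces as ModuleNotFoundError (a subclass of ImportError).
            if importlib.util.find_spec(candidate) is not None:
                return importlib.import_module(candidate)
        except ImportError:
            continue
    return None


# With a release that ships the legacy folder, this resolves to
# torchtext.legacy.data; on older releases it resolves to torchtext.data.
data = import_legacy("data")
```

When only one release is targeted, a plain import with legacy added to the path is simpler and easier to read.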

In the v0.10.0 release, we retired the Vocab class to torchtext.legacy. Users can still access the legacy Vocab via torchtext.legacy.vocab. This class has been replaced by a Vocab module that is backed by an efficient C++ implementation and provides common functional APIs for NLP workflows.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

