Text utilities and datasets for PyTorch

torchtext

This repository consists of:

Note: the legacy code discussed in the torchtext v0.7.0 release notes has been moved to the torchtext.legacy folder. This legacy code will not be maintained by the development team, and we plan to remove it entirely in a future release. See the torchtext.legacy folder for more details.

Installation

We recommend Anaconda as the Python package management system. Please refer to pytorch.org for details on installing PyTorch. The following table lists the torchtext version that corresponds to each PyTorch version, along with the supported Python versions.

Version Compatibility

PyTorch version    torchtext version    Supported Python version
nightly build      master               3.6+
1.8                0.9                  3.6+
1.7                0.8                  3.6+
1.6                0.7                  3.6+
1.5                0.6                  3.5+
1.4                0.5                  2.7, 3.5+
0.4 and below      0.2.3                2.7, 3.5+
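The pairing in the table can also be checked programmatically. The following is a minimal sketch, not part of torchtext: the names COMPATIBLE, series, expected_torchtext and is_compatible are hypothetical helpers, and the mapping only covers the numbered 1.4 to 1.8 rows of the table (the nightly and "0.4 and below" rows are left out).

```python
# PyTorch release series -> torchtext release series, copied from the table above.
# Hypothetical helper data; torchtext itself does not ship this mapping.
COMPATIBLE = {
    "1.8": "0.9",
    "1.7": "0.8",
    "1.6": "0.7",
    "1.5": "0.6",
    "1.4": "0.5",
}


def series(version):
    """Reduce a version string such as '1.8.1+cu111' to its 'major.minor' series."""
    major, minor = version.split("+")[0].split(".")[:2]
    return f"{major}.{minor}"


def expected_torchtext(torch_version):
    """Return the torchtext series for a PyTorch version, or None if it is not listed."""
    return COMPATIBLE.get(series(torch_version))


def is_compatible(torch_version, torchtext_version):
    """True if the two versions appear on the same row of the table."""
    return expected_torchtext(torch_version) == series(torchtext_version)


print(expected_torchtext("1.8.1+cu111"))
print(is_compatible("1.7.0", "0.9.1"))
```

With both packages installed, the same check could be run as is_compatible(torch.__version__, torchtext.__version__).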

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext

Optional requirements

If you want to use the English tokenizer from SpaCy, you need to install SpaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses:

pip install sacremoses

For torchtext 0.5 and below, install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way to do this is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with: conda (here) and pip (here).

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

For example, to access the raw text from the AG_NEWS dataset:

>>> from torchtext.datasets import AG_NEWS
>>> train_iter = AG_NEWS(split='train')
>>> next(train_iter)
>>> # Or iterate with a for loop
>>> for (label, line) in train_iter:
...     print(label, line)
...
>>> # Or send to DataLoader
>>> from torch.utils.data import DataLoader
>>> train_iter = AG_NEWS(split='train')
>>> dataloader = DataLoader(train_iter, batch_size=8, shuffle=False)
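The raw (label, line) pairs usually need some processing before a model can consume them. The following is a minimal sketch of a collate function for that step; it is an illustration rather than torchtext code. The names simple_tokenize and collate_batch are hypothetical, whitespace splitting stands in for a real tokenizer, and the labels are assumed to be 1-based integers as in AG_NEWS.

```python
def simple_tokenize(line):
    # Assumption: lowercased whitespace splitting stands in for a real tokenizer.
    return line.lower().split()


def collate_batch(batch):
    """Split a list of (label, line) pairs into zero-based labels and token lists."""
    labels, token_lists = [], []
    for label, line in batch:
        labels.append(int(label) - 1)  # assumed 1-based labels, shifted to start at 0
        token_lists.append(simple_tokenize(line))
    return labels, token_lists


# Stand-in data in the same (label, line) shape as the raw iterator yields.
sample_batch = [
    (3, "Wall St. Bears Claw Back"),
    (4, "New chip promises faster phones"),
]
labels, token_lists = collate_batch(sample_batch)
print(labels)
print(token_lists[0])
```

A function of this shape could be passed to the DataLoader through its collate_fn argument, for example DataLoader(train_iter, batch_size=8, collate_fn=collate_batch).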

A tutorial for the end-to-end text classification workflow can be found in the PyTorch tutorial.

[Prototype] Experimental Code

We have re-written several building blocks under torchtext.experimental:

  • Transforms: some basic data processing building blocks

  • Vocabulary: a vocabulary to numericalize tokens

  • Vectors: the vectors to convert tokens into tensors.
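To make "numericalize" concrete, here is a minimal pure-Python sketch of the idea behind a vocabulary. It is not the torchtext.experimental API, whose interfaces are still changing; build_vocab and numericalize are hypothetical names chosen for illustration.

```python
from collections import Counter


def build_vocab(token_lists, min_freq=1, unk_token="<unk>"):
    """Map each token to an integer index, most frequent tokens first."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    kept = [tok for tok, count in counts.items() if count >= min_freq]
    # Sort by descending frequency, then alphabetically so the order is deterministic.
    kept.sort(key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate([unk_token] + kept)}


def numericalize(tokens, stoi, unk_token="<unk>"):
    """Convert tokens to indices; unknown tokens map to the index of unk_token."""
    unk_index = stoi[unk_token]
    return [stoi.get(tok, unk_index) for tok in tokens]


corpus = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]
stoi = build_vocab(corpus)
ids = numericalize(["the", "bird", "sat"], stoi)
print(ids)
```

The resulting index lists are what would later be converted to tensors, or looked up in a table of vectors.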

These prototype building blocks in the experimental folder are available in the nightly release only. The nightly packages are accessible via Pip and Conda for Windows, Mac, and Linux. For example, Linux users can install the nightly wheels with the following command:

pip install --pre --upgrade torch torchtext -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

For more detailed instructions, please refer to Install PyTorch. It should be noted that the new building blocks are still under development, and the APIs have not been solidified.

[BC Breaking] Legacy

In the v0.9.0 release, we moved the following legacy code to torchtext.legacy. This is part of the work to revamp the torchtext library, and the motivation has been discussed in Issue #664:

  • torchtext.legacy.data.field

  • torchtext.legacy.data.batch

  • torchtext.legacy.data.example

  • torchtext.legacy.data.iterator

  • torchtext.legacy.data.pipeline

  • torchtext.legacy.datasets

We have a migration tutorial to help users switch to the torchtext datasets in the v0.9.0 release. Users who still want the legacy components can add legacy to the import path.
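As an illustration of what adding legacy to the import path means, the sketch below rewrites the module paths listed above to their torchtext.legacy locations. It is a hypothetical helper for a migration script, not something torchtext provides; LEGACY_MODULES is copied from the list above and to_legacy is an invented name.

```python
# Module paths listed above as moved in v0.9.0, written without the "legacy" part.
LEGACY_MODULES = [
    "torchtext.data.field",
    "torchtext.data.batch",
    "torchtext.data.example",
    "torchtext.data.iterator",
    "torchtext.data.pipeline",
    "torchtext.datasets",
]


def to_legacy(module_path):
    """Return the torchtext.legacy location of a moved module path.

    Paths that are not in the list above are returned unchanged.
    """
    for old in LEGACY_MODULES:
        if module_path == old or module_path.startswith(old + "."):
            return "torchtext.legacy" + module_path[len("torchtext"):]
    return module_path


print(to_legacy("torchtext.data.field"))
print(to_legacy("torchtext.data.utils"))
```

Only exact matches and submodules of the listed paths are rewritten, so modules that were not moved keep their original path.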

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!
