
Text utilities, models, transforms, and datasets for PyTorch.


torchtext

This repository consists of:

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for details on installing PyTorch. The torchtext version and supported Python versions that correspond to each PyTorch version are listed below.

Version Compatibility

PyTorch version    torchtext version    Supported Python version
nightly build      main                 >=3.8, <=3.11
2.0.0              0.15.0               >=3.8, <=3.11
1.13.0             0.14.0               >=3.7, <=3.10
1.12.0             0.13.0               >=3.7, <=3.10
1.11.0             0.12.0               >=3.6, <=3.9
1.10.0             0.11.0               >=3.6, <=3.9
1.9.1              0.10.1               >=3.6, <=3.9
1.9                0.10                 >=3.6, <=3.9
1.8.1              0.9.1                >=3.6, <=3.9
1.8                0.9                  >=3.6, <=3.9
1.7.1              0.8.1                >=3.6, <=3.9
1.7                0.8                  >=3.6, <=3.8
1.6                0.7                  >=3.6, <=3.8
1.5                0.6                  >=3.5, <=3.8
1.4                0.5                  2.7, >=3.5, <=3.8
0.4 and below      0.2.3                2.7, >=3.5, <=3.8
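The compatibility listing can also be checked programmatically. The following is a minimal sketch, not part of torchtext: the names COMPAT, check_pair and installed_version are hypothetical, and only a few rows of the listing are copied in for illustration.

```python
from importlib import metadata

# A few rows copied from the compatibility listing above:
# torchtext release -> (PyTorch release, min Python, max Python).
COMPAT = {
    "0.14.0": ("1.13.0", (3, 7), (3, 10)),
    "0.13.0": ("1.12.0", (3, 7), (3, 10)),
    "0.12.0": ("1.11.0", (3, 6), (3, 9)),
    "0.11.0": ("1.10.0", (3, 6), (3, 9)),
}


def check_pair(torchtext_version, torch_version, python_version):
    """Return True if the three versions match a row of COMPAT."""
    row = COMPAT.get(torchtext_version)
    if row is None:
        return False  # release not listed in this excerpt
    expected_torch, py_min, py_max = row
    # Ignore local build suffixes such as "+cpu".
    torch_base = torch_version.split("+")[0]
    py = tuple(python_version[:2])
    return torch_base == expected_torch and py_min <= py <= py_max


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For example, check_pair(installed_version("torchtext") or "", installed_version("torch") or "", sys.version_info) would report whether the current environment matches one of the copied rows, assuming sys has been imported.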

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext

Optional requirements

To use the English tokenizer from spaCy, install spaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, to use the Moses tokenizer port in SacreMoses (split from NLTK), install SacreMoses:

pip install sacremoses
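Once an optional tokenizer backend is installed, it can be requested through torchtext.data.utils.get_tokenizer. The sketch below is one possible way to pick whichever backend is available. The helper name load_tokenizer, the fallback order, and the whitespace fallback used when torchtext itself is missing are assumptions made for illustration.

```python
def load_tokenizer():
    """Return the best available tokenizer as a callable: str -> list of str."""
    try:
        from torchtext.data.utils import get_tokenizer
    except (ImportError, OSError):
        # torchtext is not usable here: fall back to whitespace splitting.
        return str.split
    for name, language in (("spacy", "en_core_web_sm"), ("moses", "en")):
        try:
            return get_tokenizer(name, language=language)
        except (ImportError, OSError, AttributeError):
            continue  # backend or model missing; try the next one
    # Built-in tokenizer that needs no optional requirement.
    return get_tokenizer("basic_english")


tokenize = load_tokenizer()
tokens = tokenize("You can now install TorchText using pip!")
print(tokens)
```

The exact tokens depend on which backend was found, so no output is shown here.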

For torchtext 0.5 and below, install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check the environment it was built with for your installer (conda or pip).

Additionally, datasets in torchtext are implemented using the torchdata library. Please take a look at the installation instructions to download the latest nightlies or install from source.
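One way to see which compiler an installed PyTorch was built with is to read its build report, which torch.__config__.show() returns as text. The sketch below filters that report for compiler-related lines. The helper names and the keyword list are assumptions, and the report wording may differ between PyTorch builds.

```python
def compiler_lines(config_text):
    """Keep only the lines of a build report that mention a compiler."""
    keywords = ("GCC", "Clang", "MSVC", "C++ Version")
    return [
        line.strip()
        for line in config_text.splitlines()
        if any(keyword in line for keyword in keywords)
    ]


def pytorch_build_report():
    """Return PyTorch's build configuration text, or None without PyTorch."""
    try:
        import torch
    except (ImportError, OSError):
        return None
    return torch.__config__.show()


report = pytorch_build_report()
if report is not None:
    print("\n".join(compiler_lines(report)))
```

Comparing the printed lines with the output of your own compiler's version command gives a quick indication of whether the two environments match.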

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017, Multi30k

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

  • Model pre-training: CC-100
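Each dataset is exposed as a function in torchtext.datasets. As a minimal sketch, assuming torchtext and torchdata are installed and the data can be downloaded, the AG_NEWS training split can be iterated as (label, text) pairs. The helpers label_counts and load_ag_news_train are hypothetical names introduced here.

```python
from collections import Counter
from itertools import islice


def label_counts(samples, limit=None):
    """Count labels in an iterable of (label, text) pairs."""
    return Counter(label for label, _text in islice(samples, limit))


def load_ag_news_train():
    """Return the AG_NEWS training split (downloads data on first use)."""
    # Imported lazily so label_counts works without torchtext installed.
    from torchtext.datasets import AG_NEWS

    return AG_NEWS(split="train")


# Usage, assuming torchtext, torchdata and network access are available:
#     print(label_counts(load_ag_news_train(), limit=1000))
```

Keeping the torchtext import inside load_ag_news_train means the counting helper can be tried on any list of pairs before downloading anything.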

Models

The library currently consists of the following pre-trained models:

Tokenizers

The transforms module currently supports the following scriptable tokenizers:

Tutorials

To get started with torchtext, refer to the following tutorial on the PyTorch website.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!



