Text utilities and datasets for PyTorch

torchtext

This repository consists of:

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. The table below lists the torchtext version that corresponds to each PyTorch version, along with the supported Python versions.

Version Compatibility

PyTorch version    torchtext version    Supported Python version
nightly build      main                 >=3.8, <=3.11
1.13.0             0.14.0               >=3.7, <=3.10
1.12.0             0.13.0               >=3.7, <=3.10
1.11.0             0.12.0               >=3.6, <=3.9
1.10.0             0.11.0               >=3.6, <=3.9
1.9.1              0.10.1               >=3.6, <=3.9
1.9                0.10                 >=3.6, <=3.9
1.8.1              0.9.1                >=3.6, <=3.9
1.8                0.9                  >=3.6, <=3.9
1.7.1              0.8.1                >=3.6, <=3.9
1.7                0.8                  >=3.6, <=3.8
1.6                0.7                  >=3.6, <=3.8
1.5                0.6                  >=3.5, <=3.8
1.4                0.5                  2.7, >=3.5, <=3.8
0.4 and below      0.2.3                2.7, >=3.5, <=3.8
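
As a rough illustration, the pairs in the compatibility table can be turned into a small lookup for checking an environment. This is a sketch, not part of torchtext: the names `TORCHTEXT_TO_PYTORCH`, `expected_pytorch`, and `installed_version` are hypothetical, and the fallback from a full version such as "0.10.0" to the table entry "0.10" is an assumption about how the table is meant to be read.

```python
from importlib import metadata

# torchtext release -> PyTorch release, copied from the compatibility table.
# The nightly/main row is left out because it has no fixed version number.
TORCHTEXT_TO_PYTORCH = {
    "0.14.0": "1.13.0",
    "0.13.0": "1.12.0",
    "0.12.0": "1.11.0",
    "0.11.0": "1.10.0",
    "0.10.1": "1.9.1",
    "0.10": "1.9",
    "0.9.1": "1.8.1",
    "0.9": "1.8",
    "0.8.1": "1.7.1",
    "0.8": "1.7",
    "0.7": "1.6",
    "0.6": "1.5",
    "0.5": "1.4",
    "0.2.3": "0.4 and below",
}


def expected_pytorch(torchtext_version):
    """Return the PyTorch version listed for a torchtext version, or None."""
    # Drop a local build suffix such as "+cpu" before looking the version up.
    base = torchtext_version.split("+")[0]
    if base in TORCHTEXT_TO_PYTORCH:
        return TORCHTEXT_TO_PYTORCH[base]
    # Assumption: "0.10.0" should match the table entry written as "0.10".
    short = ".".join(base.split(".")[:2])
    return TORCHTEXT_TO_PYTORCH.get(short)


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(expected_pytorch("0.14.0"))
```

Calling `expected_pytorch(installed_version("torchtext"))` in an environment where torchtext is installed gives the PyTorch version to compare against `installed_version("torch")`. Versions that do not appear in the table return `None`.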

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext

Optional requirements

If you want to use the English tokenizer from spaCy, you need to install spaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, to use the Moses tokenizer port in SacreMoses (split from NLTK), install SacreMoses:

pip install sacremoses
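
Once one of these optional packages is installed, a tokenizer can be requested through `torchtext.data.utils.get_tokenizer`. The sketch below is illustrative only: `load_tokenizer` is a hypothetical helper, and the fallback to whitespace splitting when torchtext or the optional package is unavailable is an assumption added so the snippet runs anywhere.

```python
def load_tokenizer(name="spacy", language="en_core_web_sm"):
    """Return a torchtext tokenizer, or whitespace splitting as a fallback."""
    try:
        # Imported lazily so the sketch still runs without torchtext installed.
        from torchtext.data.utils import get_tokenizer

        # name may be "spacy" (needs spaCy and its model) or "moses"
        # (needs sacremoses); the language argument matters for spaCy.
        return get_tokenizer(name, language=language)
    except Exception:
        # Assumption: fall back to a plain whitespace split if anything is missing.
        return str.split


tokenize = load_tokenizer()
tokens = [str(token) for token in tokenize("hello world")]
print(tokens)
```

Passing `name="moses"` selects the SacreMoses tokenizer instead of spaCy.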

For torchtext 0.5 and below, install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with: conda (here) and pip (here).

Additionally, datasets in torchtext are implemented using the torchdata library. Please take a look at the installation instructions to download the latest nightlies or install from source.

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017, Multi30k

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

  • Model pre-training: CC-100
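
As a small sketch of how one of these datasets might be inspected, the snippet below takes the first few examples from an iterable dataset. The helpers `take` and `ag_news_sample` are hypothetical names. `ag_news_sample` is defined but not called here, because running it needs torchtext and torchdata installed and downloads the data on first use. The stand-in rows are made up for illustration and only mimic the (label, text) pairs that AG_NEWS yields.

```python
from itertools import islice


def take(examples, n):
    """Return the first n items of any iterable dataset as a list."""
    return list(islice(examples, n))


def ag_news_sample(n=3):
    """Fetch the first n (label, text) pairs of the AG_NEWS training split."""
    # Imported lazily: requires torchtext and torchdata, and network access
    # the first time the dataset is downloaded.
    from torchtext.datasets import AG_NEWS

    return take(AG_NEWS(split="train"), n)


# Stand-in data (invented for illustration) with the same (label, text) shape.
stand_in = iter([(3, "Stocks rally"), (4, "New chip announced"), (2, "Cup final")])
print(take(stand_in, 2))
```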

Models

The library currently consists of the following pre-trained models:

Tokenizers

The transforms module currently supports the following scriptable tokenizers:

Tutorials

To get started with torchtext, users may refer to the following tutorial available on the PyTorch website.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

Download files

Download the file for your platform. No source distribution is available for this release. The built distributions for torchtext 0.15.2 are:

File                                                   Size      Python          Platform
torchtext-0.15.2-cp311-cp311-win_amd64.whl             1.9 MB    CPython 3.11    Windows x86-64
torchtext-0.15.2-cp311-cp311-manylinux1_x86_64.whl     2.0 MB    CPython 3.11    manylinux1 x86-64
torchtext-0.15.2-cp311-cp311-macosx_11_0_arm64.whl     2.1 MB    CPython 3.11    macOS 11.0+ ARM64
torchtext-0.15.2-cp311-cp311-macosx_10_9_x86_64.whl    2.3 MB    CPython 3.11    macOS 10.9+ x86-64
torchtext-0.15.2-cp310-cp310-win_amd64.whl             1.9 MB    CPython 3.10    Windows x86-64
torchtext-0.15.2-cp310-cp310-manylinux1_x86_64.whl     2.0 MB    CPython 3.10    manylinux1 x86-64
torchtext-0.15.2-cp310-cp310-macosx_11_0_arm64.whl     2.1 MB    CPython 3.10    macOS 11.0+ ARM64
torchtext-0.15.2-cp310-cp310-macosx_10_9_x86_64.whl    2.3 MB    CPython 3.10    macOS 10.9+ x86-64
torchtext-0.15.2-cp39-cp39-win_amd64.whl               1.9 MB    CPython 3.9     Windows x86-64
torchtext-0.15.2-cp39-cp39-manylinux1_x86_64.whl       2.0 MB    CPython 3.9     manylinux1 x86-64
torchtext-0.15.2-cp39-cp39-macosx_11_0_arm64.whl       2.1 MB    CPython 3.9     macOS 11.0+ ARM64
torchtext-0.15.2-cp39-cp39-macosx_10_9_x86_64.whl      2.3 MB    CPython 3.9     macOS 10.9+ x86-64
torchtext-0.15.2-cp38-cp38-win_amd64.whl               1.9 MB    CPython 3.8     Windows x86-64
torchtext-0.15.2-cp38-cp38-manylinux1_x86_64.whl       2.0 MB    CPython 3.8     manylinux1 x86-64
torchtext-0.15.2-cp38-cp38-macosx_11_0_arm64.whl       2.1 MB    CPython 3.8     macOS 11.0+ ARM64
torchtext-0.15.2-cp38-cp38-macosx_10_9_x86_64.whl      2.3 MB    CPython 3.8     macOS 10.9+ x86-64
