Text utilities, models, transforms, and datasets for PyTorch.

torchtext

CAUTION: As of September 2023 we have paused active development of TorchText because our focus has shifted away from building out this library offering. We will continue to release new versions but do not anticipate any new feature development as we figure out future investments in this space.

This repository consists of:

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. The following are the corresponding torchtext versions and supported Python versions.

Version Compatibility

PyTorch version    torchtext version    Supported Python version
---------------    -----------------    ------------------------
nightly build      main                 >=3.8, <=3.11
2.2.0              0.17.0               >=3.8, <=3.11
2.1.0              0.16.0               >=3.8, <=3.11
2.0.0              0.15.0               >=3.8, <=3.11
1.13.0             0.14.0               >=3.7, <=3.10
1.12.0             0.13.0               >=3.7, <=3.10
1.11.0             0.12.0               >=3.6, <=3.9
1.10.0             0.11.0               >=3.6, <=3.9
1.9.1              0.10.1               >=3.6, <=3.9
1.9                0.10                 >=3.6, <=3.9
1.8.1              0.9.1                >=3.6, <=3.9
1.8                0.9                  >=3.6, <=3.9
1.7.1              0.8.1                >=3.6, <=3.9
1.7                0.8                  >=3.6, <=3.8
1.6                0.7                  >=3.6, <=3.8
1.5                0.6                  >=3.5, <=3.8
1.4                0.5                  2.7, >=3.5, <=3.8
0.4 and below      0.2.3                2.7, >=3.5, <=3.8
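
As a rough illustration of how the table can be used, the sketch below copies its released-version rows into a dictionary and looks up the matching torchtext release for a given PyTorch version. The names `COMPATIBILITY` and `torchtext_for` are hypothetical helpers, not part of torchtext, and stripping a local suffix such as `+cu118` is an assumption about how the PyTorch version string may be formatted.

```python
# Rows copied from the compatibility table above:
# PyTorch version -> (torchtext version, supported Python range).
COMPATIBILITY = {
    "2.2.0": ("0.17.0", ">=3.8, <=3.11"),
    "2.1.0": ("0.16.0", ">=3.8, <=3.11"),
    "2.0.0": ("0.15.0", ">=3.8, <=3.11"),
    "1.13.0": ("0.14.0", ">=3.7, <=3.10"),
    "1.12.0": ("0.13.0", ">=3.7, <=3.10"),
    "1.11.0": ("0.12.0", ">=3.6, <=3.9"),
    "1.10.0": ("0.11.0", ">=3.6, <=3.9"),
    "1.9.1": ("0.10.1", ">=3.6, <=3.9"),
    "1.9": ("0.10", ">=3.6, <=3.9"),
    "1.8.1": ("0.9.1", ">=3.6, <=3.9"),
    "1.8": ("0.9", ">=3.6, <=3.9"),
    "1.7.1": ("0.8.1", ">=3.6, <=3.9"),
    "1.7": ("0.8", ">=3.6, <=3.8"),
    "1.6": ("0.7", ">=3.6, <=3.8"),
    "1.5": ("0.6", ">=3.5, <=3.8"),
    "1.4": ("0.5", "2.7, >=3.5, <=3.8"),
}


def torchtext_for(pytorch_version):
    """Return (torchtext version, Python range) for a PyTorch release, or None.

    Only exact table entries are matched; a local build suffix such as
    "+cu118" is dropped first (an assumption about the input format).
    """
    base = pytorch_version.split("+")[0]
    return COMPATIBILITY.get(base)


match = torchtext_for("2.1.0+cu118")
if match is not None:
    torchtext_version, python_range = match
    print(f"pip install torchtext=={torchtext_version}  # Python {python_range}")
```

Versions that are not listed in the table return `None`, so the caller has to decide how to handle them.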

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext
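
After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper name, and the package names checked are simply the two discussed in this section.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("torch", "torchtext"):
    print(name, installed_version(name) or "not installed")
```

The reported pair can then be compared against the compatibility table above.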

Optional requirements

If you want to use the English tokenizer from SpaCy, you need to install SpaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses:

pip install sacremoses
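
Because both tokenizers are optional, code that depends on them may want to degrade gracefully. The sketch below is one possible approach and not torchtext's own mechanism: it tries SpaCy with the `en_core_web_sm` model, then SacreMoses, and finally falls back to plain whitespace splitting. `make_tokenizer` is a hypothetical helper name.

```python
def make_tokenizer():
    """Return (backend name, tokenize function) for the first available backend."""
    try:
        import spacy

        # spacy.load raises OSError if the model has not been downloaded.
        nlp = spacy.load("en_core_web_sm")
        return "spacy", lambda text: [tok.text for tok in nlp.tokenizer(text)]
    except (ImportError, OSError):
        pass

    try:
        from sacremoses import MosesTokenizer

        moses = MosesTokenizer(lang="en")
        return "moses", lambda text: moses.tokenize(text)
    except ImportError:
        pass

    # Last resort: split on whitespace using only the standard library.
    return "whitespace", str.split


backend, tokenize = make_tokenizer()
print(backend, tokenize("Hello world"))
```

The three backends split punctuation differently, so results on real text will depend on which one is picked.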

For torchtext 0.5 and below, install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with: conda (here) and pip (here).
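
One way to see how an installed PyTorch was built is to print its build configuration, which typically includes the compiler and C++ standard. This is only a sketch: `pytorch_build_info` is a hypothetical helper, and it returns `None` when PyTorch is not importable.

```python
def pytorch_build_info():
    """Return PyTorch's build configuration as text, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    # torch.__config__.show() returns a human-readable description of the build.
    return torch.__config__.show()


info = pytorch_build_info()
print(info if info is not None else "PyTorch is not installed in this environment")
```

Comparing the reported compiler with the one on your PATH is a quick sanity check before building torchtext.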

Additionally, datasets in torchtext are implemented using the torchdata library. Please take a look at the installation instructions to download the latest nightlies or install from source.

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017, Multi30k

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

  • Model pre-training: CC-100
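
The list above can be captured as a small lookup, and a dataset can be sampled lazily. In the sketch below, `DATASETS_BY_TASK`, `task_for`, and `peek_ag_news` are hypothetical names introduced for illustration. `peek_ag_news` assumes torchtext and torchdata are installed and that `AG_NEWS(split="train")` yields `(label, text)` pairs; it downloads the data on first use.

```python
from itertools import islice

# Task -> dataset names, copied from the list above.
DATASETS_BY_TASK = {
    "Language modeling": ["WikiText2", "WikiText103", "PennTreebank", "EnWik9"],
    "Machine translation": ["IWSLT2016", "IWSLT2017", "Multi30k"],
    "Sequence tagging": ["UDPOS", "CoNLL2000Chunking"],
    "Question answering": ["SQuAD1", "SQuAD2"],
    "Text classification": [
        "SST2", "AG_NEWS", "SogouNews", "DBpedia", "YelpReviewPolarity",
        "YelpReviewFull", "YahooAnswers", "AmazonReviewPolarity",
        "AmazonReviewFull", "IMDB",
    ],
    "Model pre-training": ["CC-100"],
}


def task_for(dataset_name):
    """Return the task a dataset is listed under, or None if it is not listed."""
    for task, names in DATASETS_BY_TASK.items():
        if dataset_name in names:
            return task
    return None


def peek_ag_news(n=3):
    """Return the first n training samples of AG_NEWS (triggers a download)."""
    # Imported lazily so the lookup above works without torchtext installed.
    from torchtext.datasets import AG_NEWS

    return list(islice(AG_NEWS(split="train"), n))


print(task_for("AG_NEWS"))
```

Calling `peek_ag_news()` is left to the reader, since it needs network access and the optional dependencies.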

Models

The library currently consists of the following pre-trained models:

Tokenizers

The transforms module currently supports the following scriptable tokenizers:

Tutorials

To get started with torchtext, users may refer to the following tutorial available on the PyTorch website.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

  • torchtext-0.17.2-cp312-cp312-win_amd64.whl (1.9 MB): CPython 3.12, Windows x86-64

  • torchtext-0.17.2-cp312-cp312-manylinux1_x86_64.whl (2.0 MB): CPython 3.12

  • torchtext-0.17.2-cp312-cp312-macosx_11_0_arm64.whl (2.1 MB): CPython 3.12, macOS 11.0+ ARM64

  • torchtext-0.17.2-cp312-cp312-macosx_10_13_x86_64.whl (2.3 MB): CPython 3.12, macOS 10.13+ x86-64

  • torchtext-0.17.2-cp311-cp311-win_amd64.whl (1.9 MB): CPython 3.11, Windows x86-64

  • torchtext-0.17.2-cp311-cp311-manylinux1_x86_64.whl (2.0 MB): CPython 3.11

  • torchtext-0.17.2-cp311-cp311-macosx_11_0_arm64.whl (2.1 MB): CPython 3.11, macOS 11.0+ ARM64

  • torchtext-0.17.2-cp311-cp311-macosx_10_13_x86_64.whl (2.3 MB): CPython 3.11, macOS 10.13+ x86-64

  • torchtext-0.17.2-cp310-cp310-win_amd64.whl (1.9 MB): CPython 3.10, Windows x86-64

  • torchtext-0.17.2-cp310-cp310-manylinux1_x86_64.whl (2.0 MB): CPython 3.10

  • torchtext-0.17.2-cp310-cp310-macosx_11_0_arm64.whl (2.1 MB): CPython 3.10, macOS 11.0+ ARM64

  • torchtext-0.17.2-cp310-cp310-macosx_10_13_x86_64.whl (2.3 MB): CPython 3.10, macOS 10.13+ x86-64

  • torchtext-0.17.2-cp39-cp39-win_amd64.whl (1.9 MB): CPython 3.9, Windows x86-64

  • torchtext-0.17.2-cp39-cp39-manylinux1_x86_64.whl (2.0 MB): CPython 3.9

  • torchtext-0.17.2-cp39-cp39-macosx_11_0_arm64.whl (2.1 MB): CPython 3.9, macOS 11.0+ ARM64

  • torchtext-0.17.2-cp39-cp39-macosx_10_13_x86_64.whl (2.3 MB): CPython 3.9, macOS 10.13+ x86-64

  • torchtext-0.17.2-cp38-cp38-win_amd64.whl (1.9 MB): CPython 3.8, Windows x86-64

  • torchtext-0.17.2-cp38-cp38-manylinux1_x86_64.whl (2.0 MB): CPython 3.8

  • torchtext-0.17.2-cp38-cp38-macosx_11_0_arm64.whl (2.1 MB): CPython 3.8, macOS 11.0+ ARM64

  • torchtext-0.17.2-cp38-cp38-macosx_10_13_x86_64.whl (2.3 MB): CPython 3.8, macOS 10.13+ x86-64
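
The wheel filenames above follow the standard `name-version-python-abi-platform.whl` layout, which is what lets you pick the file for your platform. As a rough sketch, the hypothetical helper `parse_wheel_filename` below splits a filename into those parts; it handles an optional build tag but is not a full implementation of the wheel naming rules.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version, and tag components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:
        # The optional third field is a build tag; it is ignored here.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("torchtext-0.17.2-cp312-cp312-win_amd64.whl")
current_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(info, "matches this interpreter:", info["python"] == current_tag)
```

In practice pip performs this selection automatically; the sketch only makes the naming convention explicit.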

