
Text utilities, models, transforms, and datasets for PyTorch.

torchtext

This repository consists of:

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. The table below lists the torchtext version and the supported Python versions for each PyTorch release.

Version Compatibility

PyTorch version    torchtext version    Supported Python version
nightly build      main                 >=3.8, <=3.11
2.2.0              0.17.0               >=3.8, <=3.11
2.1.0              0.16.0               >=3.8, <=3.11
2.0.0              0.15.0               >=3.8, <=3.11
1.13.0             0.14.0               >=3.7, <=3.10
1.12.0             0.13.0               >=3.7, <=3.10
1.11.0             0.12.0               >=3.6, <=3.9
1.10.0             0.11.0               >=3.6, <=3.9
1.9.1              0.10.1               >=3.6, <=3.9
1.9                0.10                 >=3.6, <=3.9
1.8.1              0.9.1                >=3.6, <=3.9
1.8                0.9                  >=3.6, <=3.9
1.7.1              0.8.1                >=3.6, <=3.9
1.7                0.8                  >=3.6, <=3.8
1.6                0.7                  >=3.6, <=3.8
1.5                0.6                  >=3.5, <=3.8
1.4                0.5                  2.7, >=3.5, <=3.8
0.4 and below      0.2.3                2.7, >=3.5, <=3.8
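The pairing above can also be looked up programmatically. The following is only a sketch: the `COMPAT` mapping copies the numbered rows of the table (the "nightly build" and "0.4 and below" rows are left out), and `matching_torchtext` is a hypothetical helper, not part of torchtext.

```python
# PyTorch release -> torchtext release, copied from the compatibility table.
COMPAT = {
    "2.2.0": "0.17.0",
    "2.1.0": "0.16.0",
    "2.0.0": "0.15.0",
    "1.13.0": "0.14.0",
    "1.12.0": "0.13.0",
    "1.11.0": "0.12.0",
    "1.10.0": "0.11.0",
    "1.9.1": "0.10.1",
    "1.9": "0.10",
    "1.8.1": "0.9.1",
    "1.8": "0.9",
    "1.7.1": "0.8.1",
    "1.7": "0.8",
    "1.6": "0.7",
    "1.5": "0.6",
    "1.4": "0.5",
}


def matching_torchtext(torch_version):
    """Return the torchtext version listed for a PyTorch version, or None."""
    # Drop a local build suffix such as "+cu121" or "+cpu".
    base = torch_version.split("+")[0]
    if base in COMPAT:
        return COMPAT[base]
    # Fall back to "major.minor" for rows the table lists without a patch number.
    major_minor = ".".join(base.split(".")[:2])
    return COMPAT.get(major_minor)
```

For example, `matching_torchtext("2.2.0+cu121")` looks up the `2.2.0` row. A release that the table does not list returns `None` rather than a guess.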

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext

Optional requirements

If you want to use the English tokenizer from SpaCy, you need to install SpaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). To do so, install SacreMoses:

pip install sacremoses
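Once either optional package is installed, the tokenizer can be obtained through `torchtext.data.utils.get_tokenizer`. The sketch below assumes the `en_core_web_sm` model from the command above. `build_tokenizer` is a hypothetical wrapper, and its whitespace fallback is an illustration choice rather than torchtext behavior.

```python
def build_tokenizer(name="spacy"):
    """Return a callable mapping a string to a list of tokens.

    Falls back to plain whitespace splitting when torchtext or the
    optional tokenizer package is unavailable (an assumption made so the
    sketch stays usable in a bare environment).
    """
    try:
        from torchtext.data.utils import get_tokenizer

        if name == "spacy":
            # Requires `pip install spacy` and the en_core_web_sm model.
            return get_tokenizer("spacy", language="en_core_web_sm")
        if name == "moses":
            # Requires `pip install sacremoses`.
            return get_tokenizer("moses")
    except Exception:
        # Any import or model-loading failure: use the fallback below.
        pass
    return str.split


tokenize = build_tokenizer("spacy")
tokens = tokenize("hello world")
```

Passing `"moses"` instead of `"spacy"` selects the SacreMoses tokenizer in the same way.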

For torchtext 0.5 and below, install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with for conda (here) and pip (here).

Additionally, datasets in torchtext are implemented using the torchdata library. Please take a look at the installation instructions to download the latest nightlies or install from source.
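Before or after a source build, it can help to confirm which versions of the related packages are present in the current environment. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchtext", "torchdata")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The reported versions can then be compared against the compatibility table in the Installation section.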

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017, Multi30k

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

  • Model pre-training: CC-100
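As an illustration of how one of the datasets listed above might be used, the sketch below reads a few examples from AG_NEWS. It assumes torchtext and torchdata are installed and that the dataset can be downloaded. `take` and `ag_news_sample` are hypothetical helper names.

```python
from itertools import islice


def take(iterable, n):
    """Return the first n items of an iterable as a list."""
    return list(islice(iterable, n))


def ag_news_sample(n=3):
    """Return the first n (label, text) pairs of the AG_NEWS training split."""
    # Imported lazily so the helpers above load without torchtext installed.
    from torchtext.datasets import AG_NEWS

    # Downloads the data on first use.
    train_iter = AG_NEWS(split="train")
    return take(train_iter, n)
```

Calling `ag_news_sample(2)` would yield two `(label, text)` tuples. The other classification datasets in the list follow the same iteration pattern.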

Models

The library currently consists of the following pre-trained models:

Tokenizers

The transforms module currently supports the following scriptable tokenizers:

Tutorials

To get started with torchtext, users may refer to the following tutorial, available on the PyTorch website.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

