Text utilities and datasets for PyTorch

torchtext

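This repository consists of:

  • torchtext.datasets: raw text iterators for common NLP datasets

  • torchtext.data: basic NLP building blocks

  • torchtext.transforms: basic text-processing transformations

  • torchtext.models: pre-trained models

  • torchtext.vocab: Vocab and Vectors related classes and factory functions

  • examples: example NLP workflows with PyTorch and the torchtext library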

Installation

We recommend Anaconda as the Python package management system. Refer to pytorch.org for PyTorch installation details. The table below lists the torchtext version and the supported Python versions for each PyTorch version.

Version Compatibility

PyTorch version    torchtext version    Supported Python version
nightly build      main                 >=3.8, <=3.11
1.13.0             0.14.0               >=3.7, <=3.10
1.12.0             0.13.0               >=3.7, <=3.10
1.11.0             0.12.0               >=3.6, <=3.9
1.10.0             0.11.0               >=3.6, <=3.9
1.9.1              0.10.1               >=3.6, <=3.9
1.9                0.10                 >=3.6, <=3.9
1.8.1              0.9.1                >=3.6, <=3.9
1.8                0.9                  >=3.6, <=3.9
1.7.1              0.8.1                >=3.6, <=3.9
1.7                0.8                  >=3.6, <=3.8
1.6                0.7                  >=3.6, <=3.8
1.5                0.6                  >=3.5, <=3.8
1.4                0.5                  2.7, >=3.5, <=3.8
0.4 and below      0.2.3                2.7, >=3.5, <=3.8
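
As a convenience, the compatibility table above can be turned into a small lookup. The following is only a sketch: the names `TORCHTEXT_FOR_PYTORCH` and `matching_torchtext` are hypothetical helpers, not part of torchtext, and treating `1.9.0` as equivalent to the `1.9` row is an assumption made here.

```python
# Mapping copied from the compatibility table above (PyTorch -> torchtext).
# The "nightly build" and "0.4 and below" rows are left out on purpose.
TORCHTEXT_FOR_PYTORCH = {
    "1.13.0": "0.14.0",
    "1.12.0": "0.13.0",
    "1.11.0": "0.12.0",
    "1.10.0": "0.11.0",
    "1.9.1": "0.10.1",
    "1.9": "0.10",
    "1.8.1": "0.9.1",
    "1.8": "0.9",
    "1.7.1": "0.8.1",
    "1.7": "0.8",
    "1.6": "0.7",
    "1.5": "0.6",
    "1.4": "0.5",
}


def matching_torchtext(pytorch_version):
    """Return the torchtext version listed for a PyTorch version, or None."""
    # Drop local build tags such as "+cu117" before the lookup.
    base = pytorch_version.split("+", 1)[0]
    candidates = [base]
    if base.endswith(".0"):
        candidates.append(base[:-2])  # assumption: "1.9.0" means the "1.9" row
    candidates.append(base + ".0")    # assumption: "1.10" means the "1.10.0" row
    for candidate in candidates:
        if candidate in TORCHTEXT_FOR_PYTORCH:
            return TORCHTEXT_FOR_PYTORCH[candidate]
    return None
```

A call such as `matching_torchtext(torch.__version__)` could then be used to pin the torchtext version in an install script. Versions missing from the table return `None` rather than a guess.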

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext
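
After installing, it can help to confirm which versions ended up in the environment. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions to compare against the compatibility table.
for name in ("torch", "torchtext"):
    print(name, installed_version(name) or "not installed")
```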

Optional requirements

If you want to use the English tokenizer from spaCy, you need to install spaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). To use it, install SacreMoses:

pip install sacremoses
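
Once an optional tokenizer is installed, it can be requested through `torchtext.data.utils.get_tokenizer`. The sketch below assumes the `en_core_web_sm` model from the commands above. The helper name `build_tokenizer` and the whitespace fallback are illustrative choices, not torchtext behavior.

```python
def build_tokenizer(preferred="spacy"):
    """Return a callable mapping a string to a list of tokens.

    Falls back to plain whitespace splitting when torchtext or the
    optional tokenizer package (or its model) is not available.
    """
    try:
        from torchtext.data.utils import get_tokenizer

        if preferred == "spacy":
            # Uses the model downloaded with `python -m spacy download`.
            return get_tokenizer("spacy", language="en_core_web_sm")
        return get_tokenizer(preferred)  # e.g. "moses" or "basic_english"
    except (ImportError, OSError, AttributeError):
        # Illustrative fallback only: split on whitespace.
        return str.split


tokenize = build_tokenizer("spacy")
tokens = tokenize("torchtext provides text utilities")
```

The fallback keeps the example runnable in environments without the optional packages. In real code it may be preferable to let the import error surface so that a missing dependency is noticed.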

For torchtext 0.5 and below, install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with: conda (here) and pip (here).

Additionally, datasets in torchtext are implemented using the torchdata library. Please take a look at the installation instructions to download the latest nightlies or install from source.

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017, Multi30k

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

  • Model pre-training: CC-100
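
The datasets are exposed as iterables, so a few examples can be inspected without loading a whole split. The following is a minimal sketch using AG_NEWS from the list above. The helper names `peek` and `ag_news_train` are hypothetical, and the dataset import is deferred so that the module loads even where torchtext or torchdata is missing.

```python
from itertools import islice


def peek(examples, n=3):
    """Return the first n items of any iterable without materializing the rest."""
    return list(islice(examples, n))


def ag_news_train():
    """Build the AG_NEWS training split (needs torchtext and torchdata)."""
    from torchtext.datasets import AG_NEWS  # imported lazily on purpose

    return AG_NEWS(split="train")  # yields (label, text) pairs
```

Calling `peek(ag_news_train())` downloads the data on first use and returns the first three `(label, text)` pairs.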

Models

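The library currently consists of the following pre-trained models:

  • RoBERTa: Base and Large architectures

  • DistilRoBERTa

  • XLM-RoBERTa: Base and Large architectures

  • T5

  • Flan-T5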

Tokenizers

The transforms module currently supports the following scriptable tokenizers:

  • SentencePiece

  • GPT-2 BPE

  • CLIP

  • RE2

  • BERT

Tutorials

To get started with torchtext, refer to the tutorials available on the PyTorch website.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!


