
Text utilities, models, transforms, and datasets for PyTorch.


torchtext

This repository consists of:

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for details on installing PyTorch. The table below lists the torchtext version that corresponds to each PyTorch version, along with the supported Python versions.

Version Compatibility

PyTorch version    torchtext version    Supported Python version
nightly build      main                 >=3.8, <=3.11
1.14.0             0.15.0               >=3.8, <=3.11
1.13.0             0.14.0               >=3.7, <=3.10
1.12.0             0.13.0               >=3.7, <=3.10
1.11.0             0.12.0               >=3.6, <=3.9
1.10.0             0.11.0               >=3.6, <=3.9
1.9.1              0.10.1               >=3.6, <=3.9
1.9                0.10                 >=3.6, <=3.9
1.8.1              0.9.1                >=3.6, <=3.9
1.8                0.9                  >=3.6, <=3.9
1.7.1              0.8.1                >=3.6, <=3.9
1.7                0.8                  >=3.6, <=3.8
1.6                0.7                  >=3.6, <=3.8
1.5                0.6                  >=3.5, <=3.8
1.4                0.5                  2.7, >=3.5, <=3.8
0.4 and below      0.2.3                2.7, >=3.5, <=3.8
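
If you pin versions in a script, the compatibility rows above can be encoded as a small lookup. The following is only a sketch: the COMPATIBILITY dictionary and the check_environment helper are hypothetical names, not part of torchtext, and only a few rows of the table are copied in.

```python
# Hypothetical helper, not part of torchtext.
# Rows copied from the compatibility table:
# torchtext version -> (PyTorch version, minimum Python, maximum Python)
COMPATIBILITY = {
    "0.15.0": ("1.14.0", (3, 8), (3, 11)),
    "0.14.0": ("1.13.0", (3, 7), (3, 10)),
    "0.13.0": ("1.12.0", (3, 7), (3, 10)),
    "0.12.0": ("1.11.0", (3, 6), (3, 9)),
    "0.11.0": ("1.10.0", (3, 6), (3, 9)),
}


def check_environment(torchtext_version, python_version):
    """Return the matching PyTorch version for a torchtext release.

    python_version is a (major, minor) tuple such as sys.version_info[:2].
    Raises KeyError for a release that is not in the lookup and
    RuntimeError when the Python version is outside the supported range.
    """
    torch_version, py_min, py_max = COMPATIBILITY[torchtext_version]
    current = tuple(python_version[:2])
    if not (py_min <= current <= py_max):
        raise RuntimeError(
            f"torchtext {torchtext_version} supports Python "
            f"{py_min[0]}.{py_min[1]} to {py_max[0]}.{py_max[1]}, "
            f"got {current[0]}.{current[1]}"
        )
    return torch_version
```

For example, check_environment("0.14.0", (3, 9)) returns "1.13.0", while passing (3, 11) for the same release raises RuntimeError.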

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext

Optional requirements

If you want to use the English tokenizer from SpaCy, you need to install SpaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm
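
Once SpaCy and its English model are installed, the tokenizer can be requested through torchtext's get_tokenizer function. The sketch below is hedged: make_tokenizer is a hypothetical helper name, and the whitespace fallback is an assumption of this example for environments where the optional requirements are missing, not something torchtext does itself.

```python
def make_tokenizer():
    """Return a SpaCy-backed tokenizer if available, else a whitespace splitter."""
    try:
        from torchtext.data.utils import get_tokenizer

        # "spacy" selects the SpaCy backend; language names the model to load.
        return get_tokenizer("spacy", language="en_core_web_sm")
    except (ImportError, OSError, AttributeError):
        # torchtext, SpaCy, or the en_core_web_sm model is not usable here.
        # Fallback chosen for this sketch only: split on whitespace.
        return str.split


tokenizer = make_tokenizer()
tokens = tokenizer("torchtext makes text pipelines easier")
print(tokens)
```

The fallback keeps the rest of a pipeline runnable without the optional packages, at the cost of cruder tokenization (punctuation stays attached to words).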

Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses:

pip install sacremoses

For torchtext 0.5 and below, install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with: conda (here) and pip (here).

Additionally, datasets in torchtext are implemented using the torchdata library. Please take a look at the installation instructions to download the latest nightlies or install from source.

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017, Multi30k

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

  • Model pre-training: CC-100
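
As an illustration of how one of the datasets listed above might be used, the sketch below reads a few samples from AG_NEWS. It assumes torchtext and torchdata are installed and that the data can be downloaded on first use; peek_ag_news is a hypothetical helper name, not part of the library.

```python
from itertools import islice


def peek_ag_news(n=3):
    """Return the first n (label, text) pairs from the AG_NEWS training split."""
    # Imported lazily: this needs torchtext and torchdata, and the first
    # call downloads the dataset.
    from torchtext.datasets import AG_NEWS

    train_iter = AG_NEWS(split="train")
    # The dataset is iterable, so only the first n samples are read.
    return list(islice(train_iter, n))
```

Calling peek_ag_news() returns a list of (label, text) tuples; the other datasets in the list are constructed in a similar way, though their sample formats differ.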

Models

The library currently consists of the following pre-trained models:

Tokenizers

The transforms module currently supports the following scriptable tokenizers:

Tutorials

To get started with torchtext, users may refer to the following tutorial available on the PyTorch website.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!



