Text utilities and datasets for PyTorch

Reason this release was yanked:

Contains an incorrect dependency on torch

Project description

torchtext

This repository consists of:

Installation

We recommend Anaconda as a Python package management system. Please refer to pytorch.org for details on installing PyTorch. The following table lists the torchtext version and supported Python versions for each PyTorch version.

Version Compatibility

PyTorch version   torchtext version   Supported Python version
nightly build     main                >=3.8, <=3.11
1.13.0            0.14.0              >=3.7, <=3.10
1.12.0            0.13.0              >=3.7, <=3.10
1.11.0            0.12.0              >=3.6, <=3.9
1.10.0            0.11.0              >=3.6, <=3.9
1.9.1             0.10.1              >=3.6, <=3.9
1.9               0.10                >=3.6, <=3.9
1.8.1             0.9.1               >=3.6, <=3.9
1.8               0.9                 >=3.6, <=3.9
1.7.1             0.8.1               >=3.6, <=3.9
1.7               0.8                 >=3.6, <=3.8
1.6               0.7                 >=3.6, <=3.8
1.5               0.6                 >=3.5, <=3.8
1.4               0.5                 2.7, >=3.5, <=3.8
0.4 and below     0.2.3               2.7, >=3.5, <=3.8
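
As a rough illustration, the compatibility table can be encoded as a lookup so that a script can find the torchtext version listed for a given PyTorch version. This is only a sketch: the names COMPATIBILITY and torchtext_for are hypothetical, only exact table entries are matched, and stripping a local suffix such as "+cu116" is an assumption about how the PyTorch version string may look.

```python
# Rows copied from the compatibility table:
# PyTorch version -> (torchtext version, supported Python versions).
COMPATIBILITY = {
    "1.13.0": ("0.14.0", ">=3.7, <=3.10"),
    "1.12.0": ("0.13.0", ">=3.7, <=3.10"),
    "1.11.0": ("0.12.0", ">=3.6, <=3.9"),
    "1.10.0": ("0.11.0", ">=3.6, <=3.9"),
    "1.9.1": ("0.10.1", ">=3.6, <=3.9"),
    "1.9": ("0.10", ">=3.6, <=3.9"),
    "1.8.1": ("0.9.1", ">=3.6, <=3.9"),
    "1.8": ("0.9", ">=3.6, <=3.9"),
    "1.7.1": ("0.8.1", ">=3.6, <=3.9"),
    "1.7": ("0.8", ">=3.6, <=3.8"),
    "1.6": ("0.7", ">=3.6, <=3.8"),
    "1.5": ("0.6", ">=3.5, <=3.8"),
    "1.4": ("0.5", "2.7, >=3.5, <=3.8"),
}


def torchtext_for(pytorch_version):
    """Return the torchtext version listed for a PyTorch version, or None."""
    # Drop a local build suffix such as "+cu116" before the lookup (assumption).
    base = pytorch_version.split("+")[0]
    entry = COMPATIBILITY.get(base)
    return entry[0] if entry else None
```

For example, torchtext_for("1.13.0") gives "0.14.0", while a version that is not in the table gives None.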

Using conda:

conda install -c pytorch torchtext

Using pip:

pip install torchtext

Optional requirements

To use the English tokenizer from SpaCy, install SpaCy and download its English model:

pip install spacy
python -m spacy download en_core_web_sm

Alternatively, you can use the Moses tokenizer port in SacreMoses (split from NLTK), which requires installing SacreMoses:

pip install sacremoses
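
Once either optional package is installed, the tokenizer can be obtained through torchtext. The sketch below assumes torchtext's get_tokenizer helper from torchtext.data.utils and the en_core_web_sm model downloaded above; the wrapper name build_tokenizer and the SUPPORTED tuple are hypothetical. The import is done lazily so that the module still loads when torchtext is not installed.

```python
SUPPORTED = ("spacy", "moses")


def build_tokenizer(kind="spacy"):
    """Return a callable that splits a string into tokens."""
    if kind not in SUPPORTED:
        raise ValueError(f"unsupported tokenizer: {kind!r}")

    # Imported here so the check above works without torchtext installed.
    from torchtext.data.utils import get_tokenizer

    if kind == "spacy":
        # Needs `pip install spacy` and the en_core_web_sm model.
        return get_tokenizer("spacy", language="en_core_web_sm")
    # Needs `pip install sacremoses`.
    return get_tokenizer("moses")
```

A call such as build_tokenizer("moses")("Hello, world!") would then return a list of tokens, provided torchtext and SacreMoses are available.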

For torchtext 0.5 and below, install sentencepiece:

conda install -c powerai sentencepiece

Building from source

To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++:

git clone https://github.com/pytorch/text torchtext
cd torchtext
git submodule update --init --recursive

# Linux
python setup.py clean install

# OSX
CC=clang CXX=clang++ python setup.py clean install

# or ``python setup.py develop`` if you are making modifications.

Note

When building from source, make sure that you use the same C++ compiler as the one used to build PyTorch. A simple way is to build PyTorch from source and use the same environment to build torchtext. If you are using the nightly build of PyTorch, check out the environment it was built with: conda (here) and pip (here).

Additionally, datasets in torchtext are implemented using the torchdata library. Please take a look at the installation instructions to download the latest nightlies or install from source.

Documentation

Find the documentation here.

Datasets

The datasets module currently contains:

  • Language modeling: WikiText2, WikiText103, PennTreebank, EnWik9

  • Machine translation: IWSLT2016, IWSLT2017, Multi30k

  • Sequence tagging (e.g. POS/NER): UDPOS, CoNLL2000Chunking

  • Question answering: SQuAD1, SQuAD2

  • Text classification: SST2, AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB

  • Model pre-training: CC-100

Models

The library currently consists of the following pre-trained models:

Tokenizers

The transforms module currently supports the following scriptable tokenizers:

Tutorials

To get started with torchtext, users may refer to the following tutorial available on the PyTorch website.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

torchtext-0.15.0-cp311-cp311-win_amd64.whl (1.9 MB): CPython 3.11, Windows x86-64
torchtext-0.15.0-cp311-cp311-macosx_11_0_arm64.whl (2.1 MB): CPython 3.11, macOS 11.0+ ARM64
torchtext-0.15.0-cp311-cp311-macosx_10_9_x86_64.whl (2.3 MB): CPython 3.11, macOS 10.9+ x86-64
torchtext-0.15.0-cp310-cp310-win_amd64.whl (1.9 MB): CPython 3.10, Windows x86-64
torchtext-0.15.0-cp310-cp310-macosx_11_0_arm64.whl (2.1 MB): CPython 3.10, macOS 11.0+ ARM64
torchtext-0.15.0-cp310-cp310-macosx_10_9_x86_64.whl (2.3 MB): CPython 3.10, macOS 10.9+ x86-64
torchtext-0.15.0-cp39-cp39-win_amd64.whl (1.9 MB): CPython 3.9, Windows x86-64
torchtext-0.15.0-cp39-cp39-manylinux2014_aarch64.whl (1.9 MB): CPython 3.9
torchtext-0.15.0-cp39-cp39-macosx_11_0_arm64.whl (2.1 MB): CPython 3.9, macOS 11.0+ ARM64
torchtext-0.15.0-cp39-cp39-macosx_10_9_x86_64.whl (2.3 MB): CPython 3.9, macOS 10.9+ x86-64
torchtext-0.15.0-cp38-cp38-win_amd64.whl (1.9 MB): CPython 3.8, Windows x86-64
torchtext-0.15.0-cp38-cp38-manylinux2014_aarch64.whl (1.9 MB): CPython 3.8
torchtext-0.15.0-cp38-cp38-macosx_11_0_arm64.whl (2.1 MB): CPython 3.8, macOS 11.0+ ARM64
torchtext-0.15.0-cp38-cp38-macosx_10_9_x86_64.whl (2.3 MB): CPython 3.8, macOS 10.9+ x86-64
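
The wheel filenames in this list encode the Python, ABI, and platform tags, which is what determines the right file for a given machine. The sketch below splits a filename into those parts. It assumes the common five-field wheel naming layout with no optional build tag, and the function name parse_wheel_filename is hypothetical.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Assumption: no optional build tag, so exactly five fields.
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }
```

For torchtext-0.15.0-cp311-cp311-win_amd64.whl, the "python" field is cp311 and the "platform" field is win_amd64.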

File hashes

Hashes for torchtext-0.15.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 86bf293a0d76ffa8eec6b6f99995a74e78621548c3bc83a226600f16804661f3
MD5 1128744850637c48c5de075869e6e2c9
BLAKE2b-256 2c52807e24ecaacf779b070d42867b58b259a955c47e2aba85582daf924795fe

Hashes for torchtext-0.15.0-cp311-cp311-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 03d42d72d919cd2831f92c1050f014f0d3e042acc7a487e47a6d725e8a159b65
MD5 5a22e26b2cfb5d0cc71858fb9b9a37a3
BLAKE2b-256 5393acafec1bc5692a1f25037f7f9881f3bd801ff8f1e969fa0390368548a5b0

Hashes for torchtext-0.15.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5892c020b1937a232cf96fa8b897278655f1cb63b9ace0d6c0a193050b8ccd09
MD5 4bb73ff0073272fbbc7cdd9b13e0a648
BLAKE2b-256 b8ec3f1adc5bd9b80733ceed65383de54d6ea202653afe2361c24134f7dd8f8a

Hashes for torchtext-0.15.0-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 79ca84293f1869d74c1d3b1b38af1467c13b246380e20a49e7b33e6d1d0b31f3
MD5 0f030bdbcd1a60247b5fff60e80ef91b
BLAKE2b-256 659cc9848e830807632a6c99dcdc9b22a1ddea6f2dda4b098b33d1bf5912f1cc

Hashes for torchtext-0.15.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 07b9bb9742dcb8125974e3afc56807c19a68d2bd4c1c0cdb1820258123f1caf1
MD5 f33b1992c351cf31260ac5693a6ef1ca
BLAKE2b-256 3ed2daed0d4441b582513dba70bc7d5dd84f5d32dcf338ee644359227b2fcc3f

Hashes for torchtext-0.15.0-cp310-cp310-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 86d8536200b10d947859d02ca8386c69dfe60da0a2de725794ff10e98c8e675e
MD5 ba0184b19a32f7158b2a10e739b8e7b4
BLAKE2b-256 ae24863212e7536df2e7c58222b8053a0d7c7979c71ac80d2858013ae2790e1a

Hashes for torchtext-0.15.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 75c642e0a978149630aa0cb468475792ff459c8b476734a9717f4fab18dda37f
MD5 cc6e89a25c4d3108ab9ff3ad848f26c8
BLAKE2b-256 edae0d5d328f27d4caab59f15e1e14e3fc3c1c50008f1442e0b9e0e31ad4f3d2

Hashes for torchtext-0.15.0-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 e8499081343a4bafa54f43a00a484ca9994b466085a64fcd48a60c1dbd7330a3
MD5 f6ebd1b351b06f5888f0666a0d8f67ae
BLAKE2b-256 77bdd603a17a1b0a681bd2ef9aa52bde6f29370096349f4493d2013928673a11

Hashes for torchtext-0.15.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 2beebbf1fc0be6c9220803064b4986d97799537c57dfdc2dd445712cdff8549d
MD5 afae2018db46f1f4ac6b96fcac4b2292
BLAKE2b-256 9aba4b5438fca185824eea042191e06e2baa430776363efeb3c272455dce8100

Hashes for torchtext-0.15.0-cp39-cp39-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 3f3d13da1446212f5ac9e2ae9658ca044cfd373c0fa044d877bb92b413ab4e4b
MD5 ca8eb743b002a1e2c9dd6a05e4d69d86
BLAKE2b-256 d4883da6c535d330c7443f268af3214e6d84fb2de709846cd60df66ca8e0bc8f

Hashes for torchtext-0.15.0-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1bc911e9805104af589740ed9d735b25cbd2b46bb425c73b85123ab8ebb04b3d
MD5 343c2e648ed470e0501fba3883310a23
BLAKE2b-256 d0bbe533ca0dadb3a00b77251d6aaac1ec69235210368b44307b7562dd6b6613

Hashes for torchtext-0.15.0-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 8282272c8b046f024b195b873883cada29d8806744ea6edcbc703468b06d31ef
MD5 010eec99317e14f13d009ca838ea8b96
BLAKE2b-256 7b1cd99a67c27cd20f7c5511d1a517da3bbe3fa219f71adc2405494399fe60a9

Hashes for torchtext-0.15.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 9dab415bce61494874e1b93b9942c8269b0a776fee3d367e9967fb67cdf495cf
MD5 e363e0c27e923f771b7c7bf1a81b2f97
BLAKE2b-256 1b43d6fa5ad4ac7e1425f9fe036997feb92d3c3a0e4f21254ab6c926f8d63335

Hashes for torchtext-0.15.0-cp38-cp38-manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 b49f1d91dea06ff4cad7a4cd52ac8db648ee52c14a338fae436f8b2987e3c3ed
MD5 01dfd227e3e49ffba3b457f2de7f979a
BLAKE2b-256 0c7167a8b3b807784b7803efd2d0e06f7db407fb48862cd6cbe4d37136f884d6

Hashes for torchtext-0.15.0-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 02c42dcec6137e9a0cc235db06382c86dc6a5a80321ac453928bf29b305395ca
MD5 6820831b450e29e7f4829a7110819875
BLAKE2b-256 c11598f2d425303e11782b5ebf8079d8f5855f87a0cf559548cebf026ec1477d

Hashes for torchtext-0.15.0-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 32dadaea338d482abf11c73afb0295299683be571846709f61aa6aea1f7b1033
MD5 d88eb6e97b8e2bbc49d262f6d1403960
BLAKE2b-256 b4d46be50543eda01acaddf65ddea00d73bd73eb6b6c74da191ee60aa7c9dd86
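
The SHA256 digests listed for each file can be compared against a downloaded wheel. The following is a minimal sketch using Python's standard hashlib module; the function names sha256_of and matches_published_hash are hypothetical, and the chunk size is an arbitrary choice.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Passing the path of a downloaded wheel together with its SHA256 value from the listing returns True only when the file contents match.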
