
Fast and Customizable Tokenizers



Tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

These are Python bindings over the Rust implementation. If you are interested in the high-level design, you can check it out in the main repository.

Otherwise, let's dive in!

Main features:

  • Train new vocabularies and tokenize using 4 pre-made tokenizers (BERT's WordPiece and the 3 most common BPE versions).
  • Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
  • Easy to use, but also extremely versatile.
  • Designed for research and production.
  • Normalization comes with alignment tracking. It's always possible to get the part of the original sentence that corresponds to a given token.
  • Does all the pre-processing: truncation, padding, and adding the special tokens your model needs.
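The alignment tracking mentioned above can be illustrated with a small pure-Python sketch (this is not the library's implementation, which lives in Rust): each normalized character keeps the index of the original character it came from, so any token span can be mapped back to the original text.

```python
import unicodedata

def normalize_with_alignment(text):
    """Lowercase and strip accents while keeping, for each normalized
    character, the index of the original character it came from.
    (Toy sketch only; the library tracks alignments in Rust.)"""
    normalized = []
    alignment = []  # alignment[i] = original index of normalized char i
    for i, ch in enumerate(text):
        for sub in unicodedata.normalize("NFD", ch.lower()):
            if unicodedata.category(sub) == "Mn":  # drop combining accents
                continue
            normalized.append(sub)
            alignment.append(i)
    return "".join(normalized), alignment

norm, align = normalize_with_alignment("Héllo")
# norm == "hello", and align maps every normalized char back:
# "Héllo"[align[1]] is the "é" that became "e"
```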

Installation

With pip:

pip install tokenizers

From sources:

To use this method, you need to have Rust installed:

# Install with:
curl https://sh.rustup.rs -sSf | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"

Once Rust is installed, you can compile the bindings as follows:

git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python

# Create a virtual env (you can use yours as well)
python -m venv .env
source .env/bin/activate

# Install `tokenizers` in the current virtual env
pip install setuptools_rust
python setup.py install

Using the provided Tokenizers

Using a pre-trained tokenizer is straightforward:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
vocab = "./path/to/vocab.json"
merges = "./path/to/merges.txt"
tokenizer = CharBPETokenizer(vocab, merges)

# And then encode:
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.ids)
print(encoded.tokens)
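Besides ids and tokens, an Encoding also carries character offsets for each token (`encoded.offsets`), which is what makes the alignment tracking work. Here is a minimal pure-Python illustration of how such offsets recover the original text; the offset values below are made-up examples, not real library output:

```python
# Hypothetical (start, end) character offsets for the first three tokens;
# in the library they come from `encoded.offsets`.
sentence = "I can feel the magic, can you?"
offsets = [(0, 1), (2, 5), (6, 10)]

# Each offset pair recovers the exact original text behind a token:
spans = [sentence[start:end] for start, end in offsets]
# spans == ["I", "can", "feel"]
```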

And you can train yours just as simply:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
tokenizer = CharBPETokenizer()

# Then train it!
tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ])

# And you can use it
encoded = tokenizer.encode("I can feel the magic, can you?")

# And finally save it somewhere
tokenizer.save("./path/to/directory", "my-bpe")

Provided Tokenizers

  • CharBPETokenizer: The original BPE
  • ByteLevelBPETokenizer: The byte level version of the BPE
  • SentencePieceBPETokenizer: A BPE implementation compatible with the one used by SentencePiece
  • BertWordPieceTokenizer: The famous BERT tokenizer, using WordPiece

All of these can be used and trained as explained above!
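To make the WordPiece entry concrete, here is a toy pure-Python sketch of the greedy longest-match-first scheme WordPiece is based on. This is not the library's actual implementation; the `##` continuation prefix follows BERT's convention:

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first tokenization over a fixed vocabulary.
    Toy sketch of the scheme BertWordPieceTokenizer is based on."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces get a prefix
            if piece in vocab:
                tokens.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]  # no piece matched at this position
        start = end
    return tokens

vocab = {"un", "##aff", "##able", "##ab", "##le"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
```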

Build your own

You can also easily build your own tokenizers, by putting all the different parts you need together:

Use a pre-trained tokenizer

from tokenizers import Tokenizer, models, pre_tokenizers, decoders, processors

# Load a BPE Model
vocab = "./path/to/vocab.json"
merges = "./path/to/merges.txt"
bpe = models.BPE(vocab, merges)

# Initialize a tokenizer
tokenizer = Tokenizer(bpe)

# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# And then encode:
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.ids)
print(encoded.tokens)

# Or tokenize multiple sentences at once:
encoded = tokenizer.encode_batch([
	"I can feel the magic, can you?",
	"The quick brown fox jumps over the lazy dog"
])
print(encoded)

Train a new tokenizer

from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors

# Initialize a tokenizer
tokenizer = Tokenizer(models.BPE())

# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# And then train
trainer = trainers.BpeTrainer(vocab_size=20000, min_frequency=2)
tokenizer.train(trainer, [
	"./path/to/dataset/1.txt",
	"./path/to/dataset/2.txt",
	"./path/to/dataset/3.txt"
])

# Now we can encode
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded)
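To give an intuition for what BpeTrainer does under the hood, here is a toy pure-Python sketch of a single BPE training step: count adjacent symbol pairs across the corpus and pick the most frequent one to merge. The real trainer repeats this in Rust until vocab_size is reached, and min_frequency discards pairs that occur too rarely:

```python
from collections import Counter

def most_frequent_pair(words, min_frequency=2):
    """One step of BPE training: count adjacent symbol pairs (weighted
    by word frequency) and return the most frequent one, or None if no
    pair reaches min_frequency. Toy sketch only."""
    pairs = Counter()
    for symbols, count in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += count
    best, freq = pairs.most_common(1)[0]
    return best if freq >= min_frequency else None

# Words as tuples of current symbols, with their corpus counts:
words = {("h", "u", "g"): 10, ("p", "u", "g"): 5, ("h", "u", "g", "s"): 5}
print(most_frequent_pair(words))  # ('u', 'g') -- seen 20 times
```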



Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page