Machine Learning Toolbox
Machine Learning Toolbox 2 - MLTB2
📦 A box of machine learning tools. 📦
Components and Documentation
from mltb2.somajo import SoMaJoSentenceSplitter
Split texts into sentences. Supports German and English.
This is done with the SoMaJo tool.
from mltb2.somajo import JaccardSimilarity
Calculate the Jaccard similarity.
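As background, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. The following is a minimal standalone sketch of that formula; it does not use or reproduce the `JaccardSimilarity` class. The function name `jaccard_similarity`, the whitespace tokenization, and the handling of two empty inputs are assumptions made for illustration only.

```python
from typing import Iterable


def jaccard_similarity(tokens_a: Iterable[str], tokens_b: Iterable[str]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two token collections (hypothetical helper)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty inputs are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Whitespace tokenization is used here only to keep the example self-contained.
score = jaccard_similarity("the cat sat".split(), "the cat ran".split())
print(score)  # 2 shared tokens out of 4 distinct tokens -> 0.5
```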
from mltb2.transformers import TransformersTokenCounter
Count the tokens produced by a Transformers tokenizer.
from mltb2.somajo_transformers import TextSplitter
Split the text into sections with a specified maximum token length.
Sentences are never divided: each section consists of whole sentences only.
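The idea of packing whole sentences into sections under a token limit can be sketched as a greedy loop. This is an illustration under stated assumptions, not the `TextSplitter` implementation: the function `split_into_sections`, its parameters, and the whitespace-based token counter are hypothetical stand-ins for a real sentence splitter and a real tokenizer.

```python
from typing import Callable, List


def split_into_sections(
    sentences: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Greedily group whole sentences into sections of at most max_tokens."""
    sections: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current section if adding this sentence would exceed the limit.
        if current and current_tokens + n > max_tokens:
            sections.append(" ".join(current))
            current, current_tokens = [], 0
        # Assumption: a single sentence longer than max_tokens is kept whole
        # in its own section rather than being cut.
        current.append(sentence)
        current_tokens += n
    if current:
        sections.append(" ".join(current))
    return sections


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter; a real tokenizer would count differently."""
    return len(text.split())


sentences = ["One two three.", "Four five.", "Six seven eight nine.", "Ten."]
sections = split_into_sections(sentences, max_tokens=5, count_tokens=whitespace_token_count)
print(sections)
```

With the whitespace counter and a limit of 5, the four sentences end up in two sections of five tokens each. Summing per-sentence counts is only an approximation for subword tokenizers, where the count of a joined section may differ slightly.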
from mltb2.optuna import SignificanceRepeatedTrainingPruner
An Optuna pruner that uses statistical significance (a t-test serving as a heuristic)
to stop unpromising trials early, avoiding unnecessary repeated training during cross-validation.
Installation
MLTB2 is available on the Python Package Index (PyPI). It can be installed with pip:
pip install mltb2
Some optional dependencies might be necessary. You can install all of them with:
pip install mltb2[optional]
If you don't want to install all dependencies, see the description of the individual modules.
Licensing
Copyright (c) 2023 Philip May
Copyright (c) 2023 Philip May, Deutsche Telekom AG
Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.
File details
Details for the file mltb2-0.0.7.tar.gz.
File metadata
- Download URL: mltb2-0.0.7.tar.gz
- Upload date:
- Size: 16.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.17
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7efc82dc810c35ff22520511d41a64ae68a7e63a1d5fd2f818b6e8ec463886c4
MD5 | ee087a795fbad8d35522b46a1cff6fc1
BLAKE2b-256 | 1f0ce96a7c8a57ce545577066264501fc358c95004b4ba9465b43deb61a11434
File details
Details for the file mltb2-0.0.7-py3-none-any.whl.
File metadata
- Download URL: mltb2-0.0.7-py3-none-any.whl
- Upload date:
- Size: 19.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.17
File hashes
Algorithm | Hash digest
---|---
SHA256 | c9654f39df2e251ff79c42ac3e537162b2e2715a939d824d65e8c212d95d7687
MD5 | ec6b6b0337b8c7811ec0c1543984c1bb
BLAKE2b-256 | 32f7919d2a23b54660845bcef86d6e97a640a54218e90646760245db65eadd85