These details have not been verified by PyPI

Project links

Project description

Classy Classification

Have you ever struggled with needing a spaCy TextCategorizer but didn't have the time to train one from scratch? Classy Classification is the way to go! For few-shot classification using sentence-transformers or spaCy models, provide a dictionary with labels and examples, or just provide a list of labels for zero-shot classification with Hugging Face zero-shot classifiers.

Install

pip install classy-classification

Quickstart

import spacy
import classy_classification  # registers the "text_categorizer" pipeline component


data = {
    "furniture": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

classification_type = "spacy_few_shot"

# use internal spacy embeddings with a few examples per label
if classification_type == "spacy_few_shot":
    nlp = spacy.blank("en")
    nlp.add_pipe("text_categorizer", 
        config={"data": data, "model": "spacy"}
    ) 
# use sentence-transformer embeddings with a few examples per label
elif classification_type == "sentence_transformer_few_shot":
    nlp = spacy.blank("en")
    nlp.add_pipe("text_categorizer", 
        config={"data": data, "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"}
    ) 
# use zero-shot classification with only a few labels
elif classification_type == "huggingface_zero_shot":
    nlp = spacy.blank("en")
    nlp.add_pipe("text_categorizer", 
        config={"data": ["furniture", "kitchen"], "cat_type": "zero", "model": "facebook/bart-large-mnli"}
    )

print(nlp("I am looking for kitchen appliances.")._.cats)
# Output:
#
# [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]
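Downstream code often only needs the single best label. A minimal sketch for picking it, assuming `_.cats` has the list-of-dictionaries shape shown in the sample output above; `top_label` is a hypothetical helper and not part of classy-classification:

```python
def top_label(cats):
    """Return the (label, score) pair with the highest score.

    Assumes `cats` is a list of {"label": ..., "score": ...} dictionaries,
    as in the sample output above.
    """
    best = max(cats, key=lambda cat: cat["score"])
    return best["label"], best["score"]


# Scores copied from the sample output above.
cats = [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]
print(top_label(cats))
```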

Credits

Inspiration Drawn From

Hugging Face does offer some nice models for few/zero-shot classification, but these are not tailored to multilingual approaches. Rasa NLU has a nice approach for this, but it's too embedded in their codebase for easy usage outside of Rasa/chatbots. Additionally, it made sense to integrate sentence-transformers and Hugging Face zero-shot, instead of default word embeddings. Finally, I decided to integrate with spaCy, since training a custom spaCy TextCategorizer seems like a lot of hassle if you want something quick and dirty.

More examples

Some quick and dirty training data.

training_data = {
    "politics": [
        "Putin orders troops into pro-Russian regions of eastern Ukraine.",
        "The president decided not to go through with his speech.",
        "There is much uncertainty surrounding the coming elections.",
        "Democrats are engaged in a ‘new politics of evasion’."
    ],
    "sports": [
        "The soccer team lost.",
        "The team won by two against zero.",
        "I love all sport.",
        "The olympics were amazing.",
        "Yesterday, the tennis players wrapped up wimbledon."
    ],
    "weather": [
        "It is going to be sunny outside.",
        "Heavy rainfall and wind during the afternoon.",
        "Clear skies in the morning, but mist in the evening.",
        "It is cold during the winter.",
        "There is going to be a storm with heavy rainfall."
    ]
}

validation_data = [
    "I am surely talking about politics.",
    "Sports is all you need.",
    "Weather is amazing."
]
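Classifiers in the scikit-learn style generally take parallel lists of inputs and targets rather than a label-to-examples dictionary. A minimal sketch of that conversion, making no claim about how classy-classification handles it internally; `flatten_training_data` is a hypothetical helper, and `sample` is a shortened stand-in for the `training_data` dictionary above:

```python
def flatten_training_data(training_data):
    """Turn {label: [examples]} into parallel lists of texts and labels."""
    texts, labels = [], []
    for label, examples in training_data.items():
        for example in examples:
            texts.append(example)
            labels.append(label)
    return texts, labels


# Shortened stand-in for the training_data dictionary above.
sample = {
    "sports": ["The soccer team lost.", "I love all sport."],
    "weather": ["It is cold during the winter."],
}
texts, labels = flatten_training_data(sample)
print(list(zip(labels, texts)))
```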

internal spacy word2vec embeddings

import spacy
import classy_classification

nlp = spacy.load("en_core_web_md")
nlp.add_pipe("text_categorizer", config={"data": training_data, "model": "spacy"})  # use internal embeddings from spacy model
print(nlp(validation_data[0])._.cats)
print([doc._.cats for doc in nlp.pipe(validation_data)])

using as an individual sentence-transformer

from classy_classification import classyClassifier

classifier = classyClassifier(data=training_data)
classifier(validation_data[0])
classifier.pipe(validation_data)

# overwrite training data; new_training_data is a placeholder for your own
# {label: [examples]} dictionary
classifier.set_training_data(data=new_training_data)

# overwrite embedding model (see https://www.sbert.net/docs/pretrained_models.html)
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")

# overwrite SVC config
classifier.set_svc(
    config={                              
        "C": [1, 2, 5, 10, 20, 100],
        "kernels": ["linear"],                              
        "max_cross_validation_folds": 5
    }
)
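To make the size of this search concrete, here is a rough sketch of how a grid search might read such a config. It is an illustration under assumptions, not the library's implementation: `expand_grid` and `capped_folds` are hypothetical helpers, and capping the folds at the smallest class size is an assumed rule. The label counts mirror the `training_data` dictionary above.

```python
from itertools import product


def expand_grid(config):
    """List every (C, kernel) combination described by the config."""
    return [
        {"C": c, "kernel": kernel}
        for c, kernel in product(config["C"], config["kernels"])
    ]


def capped_folds(config, labels):
    """Assumed rule: use no more folds than the smallest class has examples."""
    smallest_class = min(labels.count(label) for label in set(labels))
    return min(config["max_cross_validation_folds"], smallest_class)


config = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernels": ["linear"],
    "max_cross_validation_folds": 5,
}
# 4 politics, 5 sports and 5 weather examples, as in training_data above.
labels = ["politics"] * 4 + ["sports"] * 5 + ["weather"] * 5

print(len(expand_grid(config)))      # number of candidate settings
print(capped_folds(config, labels))  # folds usable under the assumed rule
```

With six values for `C` and one kernel there are six candidates, so adding kernels or `C` values multiplies the number of models fitted per fold.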

external sentence-transformer within spacy pipeline for few-shot

import spacy
import classy_classification

nlp = spacy.blank("en")
nlp.add_pipe("text_categorizer", config={"data": training_data})
print(nlp(validation_data[0])._.cats)
print([doc._.cats for doc in nlp.pipe(validation_data)])

external huggingface model within spacy pipeline for zero-shot

import spacy
import classy_classification

nlp = spacy.blank("en")
nlp.add_pipe("text_categorizer", config={"data": list(training_data.keys()), "cat_type": "zero"})  # zero-shot only needs the label names
print(nlp(validation_data[0])._.cats)
print([doc._.cats for doc in nlp.pipe(validation_data)])

Todo

[ ] look into a way to integrate spacy trf models.

[ ] multiple classification datasets for a single input e.g. emotions and topic.

Project details

Release history

Version	Release date
1.0.0	May 31, 2024
0.6.7	Aug 31, 2023
0.6.6	Jun 19, 2023
0.6.5	Jun 19, 2023
0.6.4	Jun 18, 2023
0.6.3	Jun 18, 2023
0.6.2	Feb 15, 2023
0.6.1	Jan 14, 2023
0.6	Dec 30, 2022
0.5.4	Dec 24, 2022
0.5.3.1	Dec 15, 2022
0.5.3	Dec 15, 2022
0.5.2	Nov 21, 2022
0.5.1	Nov 21, 2022
0.5	Nov 11, 2022
0.4.5	Sep 25, 2022
0.4.4	May 27, 2022
0.4.2	May 19, 2022
0.4.1	Apr 13, 2022
0.4.0	Apr 3, 2022
0.3.6	Mar 29, 2022
0.3.5	Mar 13, 2022
0.3.4	Mar 8, 2022
0.3.3 (this version)	Mar 8, 2022
0.3.2	Feb 28, 2022
0.3.1	Feb 24, 2022
0.3	Feb 24, 2022
0.2.3	Feb 22, 2022
0.2.2	Feb 22, 2022
0.2.1	Feb 22, 2022
0.1.0	Feb 22, 2022

Download files

Source Distribution

classy-classification-0.3.3.tar.gz (9.4 kB)

Uploaded Mar 8, 2022 Source

Built Distribution

classy_classification-0.3.3-py3-none-any.whl (14.0 kB)

Uploaded Mar 8, 2022 Python 3

File details

Details for the file classy-classification-0.3.3.tar.gz.

File metadata

Download URL: classy-classification-0.3.3.tar.gz
Upload date: Mar 8, 2022
Size: 9.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.11 CPython/3.8.2 Windows/10

File hashes

Hashes for classy-classification-0.3.3.tar.gz
Algorithm	Hash digest
SHA256	`400a8c4f68809ec7ae988601837851762be824240f73ac4b20eed70e587e2fef`
MD5	`2cd6df1205fcf2e2f8610301d4e040b8`
BLAKE2b-256	`9bc80e34211117be4dc4f59e40225badba7c30a91912ba483b6b30e9e44dd8de`

File details

Details for the file classy_classification-0.3.3-py3-none-any.whl.

File metadata

Download URL: classy_classification-0.3.3-py3-none-any.whl
Upload date: Mar 8, 2022
Size: 14.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.11 CPython/3.8.2 Windows/10

File hashes

Hashes for classy_classification-0.3.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`49a11f0ad56a82a848d77861693519a3a14c20d721f4d1e042626fe681520165`
MD5	`514f21deddc84b172ba2d2988b33fefc`
BLAKE2b-256	`7aab26936b3b5728059335a1e619c93ce81cbc4ef6d15fabb56ff4bfb036cf53`
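
The digests listed above can be used to check a downloaded file before installing it. A minimal sketch using only the standard library; `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the commented usage line assumes the archive was saved to the current directory:

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for the source distribution.
EXPECTED_SDIST_SHA256 = "400a8c4f68809ec7ae988601837851762be824240f73ac4b20eed70e587e2fef"

# Usage, assuming the archive sits in the current directory:
# matches_published_hash("classy-classification-0.3.3.tar.gz", EXPECTED_SDIST_SHA256)
```

pip can perform the same check itself when a requirements file pins hashes and is installed with `--require-hashes`.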