Classy few shot classification
This repository contains an easy and intuitive approach to few-shot text classification using sentence-transformers or spacy embeddings.
Why?
Hugging Face offers some nice models for few- and zero-shot classification, but these are not tailored to multilingual use. Rasa NLU has a nice approach for this, but it is too embedded in the Rasa codebase to use easily outside of Rasa and chatbots. Additionally, it made sense to integrate sentence-transformers instead of default word embeddings. Finally, I decided to integrate with spaCy, since training a custom spaCy TextCategorizer seems like a lot of hassle if you want something quick and dirty.
Install
pip install classy-classification
Quickstart
Take a look at the examples directory. You can use data from any language and choose a model from sentence-transformers.
from classy_classification import classyClassifier
data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}
classifier = classyClassifier(data)
classifier("I am looking for kitchen appliances.")
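The call above returns the classifier's prediction for the sentence. As a minimal sketch, assuming the result is a dictionary that maps each label to a score, a small helper can pick the highest-scoring label. The helper name `top_label` and the example scores are hypothetical and not part of classy-classification.

```python
from typing import Dict, Tuple


def top_label(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    if not scores:
        raise ValueError("scores must contain at least one label")
    label = max(scores, key=scores.get)
    return label, scores[label]


# Hypothetical scores, made up for illustration only.
example_scores = {"furniture": 0.21, "kitchen": 0.79}
print(top_label(example_scores))
```

If the classifier returns a different structure, only the body of `top_label` needs to change.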
More examples
Some quick and dirty training data.
training_data = {
    "politics": [
        "Putin orders troops into pro-Russian regions of eastern Ukraine.",
        "The president decided not to go through with his speech.",
        "There is much uncertainty surrounding the coming elections.",
        "Democrats are engaged in a ‘new politics of evasion’."
    ],
    "sports": [
        "The soccer team lost.",
        "The team won by two against zero.",
        "I love all sport.",
        "The olympics were amazing.",
        "Yesterday, the tennis players wrapped up wimbledon."
    ],
    "weather": [
        "It is going to be sunny outside.",
        "Heavy rainfall and wind during the afternoon.",
        "Clear skies in the morning, but mist in the evening.",
        "It is cold during the winter.",
        "There is going to be a storm with heavy rainfall."
    ]
}
validation_data = [
    "I am surely talking about politics.",
    "Sports is all you need.",
    "Weather is amazing."
]
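Because each class has only a handful of examples, it can help to check the data before training. The sketch below assumes the training data is a dictionary that maps each label to a list of strings, as in the example above. The helper name `summarize_training_data` is hypothetical and not part of classy-classification.

```python
from typing import Dict, List


def summarize_training_data(data: Dict[str, List[str]]) -> Dict[str, int]:
    """Validate the label -> examples mapping and return the example count per label."""
    if len(data) < 2:
        raise ValueError("at least two labels are needed to train a classifier")
    counts = {}
    for label, examples in data.items():
        if not isinstance(examples, list) or not examples:
            raise ValueError(f"label {label!r} needs a non-empty list of examples")
        if not all(isinstance(text, str) and text.strip() for text in examples):
            raise ValueError(f"label {label!r} contains an empty or non-string example")
        counts[label] = len(examples)
    return counts


# A shortened copy of the data above, so the block runs on its own.
sample = {
    "politics": ["The president decided not to go through with his speech."],
    "sports": ["The soccer team lost.", "I love all sport."],
}
print(summarize_training_data(sample))
```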
Using an individual sentence-transformer
from classy_classification import classyClassifier
classifier = classyClassifier(data=training_data)
classifier(validation_data[0])
classifier.pipe(validation_data)
# overwrite training data (new_training_data is a placeholder: a dict shaped like training_data)
classifier.set_training_data(data=new_training_data)
# overwrite the embedding model (see https://www.sbert.net/docs/pretrained_models.html)
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")
# overwrite SVC config
classifier.set_svc(
    config={
        "C": [1, 2, 5, 10, 20, 100],
        "kernels": ["linear"],
        "max_cross_validation_folds": 5
    }
)
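The keys in this config read like a hyperparameter grid for a support vector classifier. As a rough sketch, assuming (this is not confirmed here) that every combination of `C` and kernel is tried and that the number of cross-validation folds is capped by both `max_cross_validation_folds` and the size of the smallest class, the search space can be enumerated with the standard library. The helper names `candidate_params` and `effective_folds` are hypothetical and not part of classy-classification.

```python
from itertools import product
from typing import Dict, List


def candidate_params(config: dict) -> List[dict]:
    """List every (C, kernel) combination described by the config."""
    return [
        {"C": c, "kernel": kernel}
        for c, kernel in product(config["C"], config["kernels"])
    ]


def effective_folds(data: Dict[str, List[str]], config: dict) -> int:
    """Cap the fold count by the smallest class (assumed behaviour)."""
    smallest_class = min(len(examples) for examples in data.values())
    return min(smallest_class, config["max_cross_validation_folds"])


config = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernels": ["linear"],
    "max_cross_validation_folds": 5,
}
# Placeholder texts; class sizes mirror the training data above (4, 5 and 5).
data = {"politics": ["a"] * 4, "sports": ["b"] * 5, "weather": ["c"] * 5}

print(len(candidate_params(config)))  # 6 combinations: six C values, one kernel
print(effective_folds(data, config))  # 4, limited by the smallest class
```

Under these assumptions, adding a second kernel to the list would double the number of candidates, so small grids keep training fast on few-shot data.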
External sentence-transformer within a spaCy pipeline
import spacy
import classy_classification
nlp = spacy.blank("en")
nlp.add_pipe("text_categorizer", config={"data": training_data}) # provide similar config as above
nlp(validation_data[0])._.cats
[doc._.cats for doc in nlp.pipe(validation_data)]  # nlp.pipe yields Doc objects lazily
Internal spaCy word2vec embeddings
import spacy
import classy_classification
nlp = spacy.load("en_core_web_md")
nlp.add_pipe("text_categorizer", config={"data": training_data, "model": "spacy"})  # use internal embeddings from the spaCy model
nlp(validation_data[0])._.cats
[doc._.cats for doc in nlp.pipe(validation_data)]  # nlp.pipe yields Doc objects lazily
Todo
[ ] look into a way to integrate spaCy trf models.
[ ] multiple classification datasets for a single input, e.g. emotions and topic.