
Project description

Classy few shot classification

This repository contains an easy and intuitive approach to zero- and few-shot text classification using sentence-transformers, Hugging Face zero-shot models, or word embeddings within a spaCy pipeline.

Why?

Hugging Face does offer some nice models for few/zero-shot classification, but these are not tailored to multi-lingual approaches. Rasa NLU has a nice approach for this, but it's too embedded in their codebase for easy use outside of Rasa/chatbots. Additionally, it made sense to integrate sentence-transformers and Hugging Face zero-shot models instead of relying on default word embeddings. Finally, I decided to integrate with spaCy, since training a custom spaCy TextCategorizer seems like a lot of hassle if you want something quick and dirty.

Install

pip install classy-classification

Quickstart

Take a look at the examples directory. Use data from any language and choose a model from sentence-transformers or a Hugging Face zero-shot model.

from classy_classification import classyClassifier

data = {
    "furniture": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}


classifier = classyClassifier(data)
classifier("I am looking for kitchen appliances.")
classifier.pipe(["I am looking for kitchen appliances."])
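
For a single text the classifier returns a dict that maps each label to a score, and pipe returns one such dict per input. A minimal, hedged sketch of inspecting the output (the numbers shown are purely illustrative):

# scores is a dict of label -> score (example values below are made up)
scores = classifier("I am looking for kitchen appliances.")
print(scores)  # e.g. {"furniture": 0.21, "kitchen": 0.79}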

import spacy

nlp = spacy.blank("en")

nlp.add_pipe("text_categorizer", config={"data": data}) # from sentence-transformers
nlp.add_pipe("text_categorizer", config={"data": data, "cat_type": "zero"}) # from huggingface zero-shot

nlp("I am looking for kitchen appliances.")._.cats
nlp.pipe(["I am looking for kitchen appliances."])
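
nlp.pipe yields spaCy Doc objects rather than raw scores, so the predictions are read from the ._.cats extension on each doc. A minimal sketch:

# each Doc carries the label -> score dict set by the text_categorizer component
for doc in nlp.pipe(["I am looking for kitchen appliances."]):
    print(doc._.cats)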

Credits

Inspiration Drawn From

Or buy me a coffee

"Buy Me A Coffee"

More examples

Some quick and dirty training data.

training_data = {
    "politics": [
        "Putin orders troops into pro-Russian regions of eastern Ukraine.",
        "The president decided not to go through with his speech.",
        "There is much uncertainty surrounding the coming elections.",
        "Democrats are engaged in a ‘new politics of evasion’."
    ],
    "sports": [
        "The soccer team lost.",
        "The team won by two against zero.",
        "I love all sport.",
        "The olympics were amazing.",
        "Yesterday, the tennis players wrapped up wimbledon."
    ],
    "weather": [
        "It is going to be sunny outside.",
        "Heavy rainfall and wind during the afternoon.",
        "Clear skies in the morning, but mist in the evenening.",
        "It is cold during the winter.",
        "There is going to be a storm with heavy rainfall."
    ]
}

validation_data = [
    "I am surely talking about politics.",
    "Sports is all you need.",
    "Weather is amazing."
]

using an individual sentence-transformer

from classy_classification import classyClassifier

classifier = classyClassifier(data=training_data)
classifier(validation_data[0])
classifier.pipe(validation_data)

# overwrite training data (new_training_data: another dict in the same label -> examples format)
classifier.set_training_data(data=new_training_data)

# overwrite embedding model (see https://www.sbert.net/docs/pretrained_models.html)
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")

# overwrite SVC config
classifier.set_svc(
    config={                              
        "C": [1, 2, 5, 10, 20, 100],
        "kernels": ["linear"],                              
        "max_cross_validation_folds": 5
    }
)
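
The SVC config above describes a small hyperparameter grid: the C values and kernels are searched with cross-validation over at most max_cross_validation_folds folds. After overwriting the training data, the embedding model, or the SVC settings, the classifier can simply be called again; a short sketch:

# classify again with the updated settings
classifier(validation_data[0])
classifier.pipe(validation_data)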

external sentence-transformer within spacy pipeline for few-shot

import spacy

import classy_classification

nlp = spacy.blank("en")
nlp.add_pipe("text_categorizer", config={"data": training_data}) # few-shot via sentence-transformers
nlp(validation_data[0])._.cats
nlp.pipe(validation_data)
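
As in the quickstart, the scores live on the ._.cats extension of each returned Doc; a small sketch for collecting them over the validation texts:

# gather the label -> score dict for every validation sentence
results = [doc._.cats for doc in nlp.pipe(validation_data)]
print(results)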

external Hugging Face model within spacy pipeline for zero-shot

import spacy

import classy_classification

nlp = spacy.blank("en")
nlp.add_pipe("text_categorizer", config={"data": training_data, "cat_type": "zero"}) # zero-shot via Hugging Face
nlp(validation_data[0])._.cats
nlp.pipe(validation_data)

internal spacy word2vec embeddings

import spacy

import classy_classification

nlp = spacy.load("en_core_web_md") 
nlp.add_pipe("text_categorizer", config={"data": training_data, "model": "spacy"}) # use internal embeddings from the spaCy model
nlp(validation_data[0])._.cats
nlp.pipe(validation_data)
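
The example above assumes the en_core_web_md model, which ships with static word vectors, has already been downloaded. A short, hedged sketch of the prerequisite and of picking the top-scoring label:

# prerequisite (run once): python -m spacy download en_core_web_md
doc = nlp(validation_data[0])
top_label = max(doc._.cats, key=doc._.cats.get)  # label with the highest score
print(top_label, doc._.cats[top_label])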

Todo

[ ] look into a way to integrate spaCy trf models.
[ ] multiple classification datasets for a single input, e.g. emotions and topic.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

classy-classification-0.3.tar.gz (17.8 kB)

Uploaded Source

Built Distribution

classy_classification-0.3-py3-none-any.whl (15.7 kB)

Uploaded Python 3

File details

Details for the file classy-classification-0.3.tar.gz.

File metadata

  • Download URL: classy-classification-0.3.tar.gz
  • Upload date:
  • Size: 17.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.8.2 Windows/10

File hashes

Hashes for classy-classification-0.3.tar.gz

Algorithm    Hash digest
SHA256       cc60bfc12e4aa081119f163c70a1cbce527b4196b32c282cc5d107fc9da8af4b
MD5          5cfef8de1b2ce29182faf245a9abcc8a
BLAKE2b-256  5f70c5ea6ca85c274520d73a9a27c537919f43cbb757517a4cddcc57760a1957

See more details on using hashes here.

File details

Details for the file classy_classification-0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for classy_classification-0.3-py3-none-any.whl

Algorithm    Hash digest
SHA256       918f0b60cd762279e7bc3979d94e9028d194879713c2f1760e77740e9175b046
MD5          a9bc2e58ed3509b742efcedb978d4186
BLAKE2b-256  95e65f1532141eb3ee5c810eb06a277f5bf1b5f942776fc743e1bf324de733fa

See more details on using hashes here.
