Classy few shot classification

This repository contains an easy and intuitive approach to few-shot text classification using sentence-transformers or spacy embeddings.

Why?

Hugging Face does offer some nice models for few/zero-shot classification, but these are not tailored to multilingual approaches. Rasa NLU has a nice approach for this, but it's too embedded in their codebase for easy usage outside of Rasa/chatbots. Additionally, it made sense to integrate sentence-transformers instead of default word embeddings. Finally, I decided to integrate with spaCy, since training a custom spaCy TextCategorizer seems like a lot of hassle if you want something quick and dirty.

Install

pip install classy-classification

Quickstart

Take a look at the examples directory.

Some quick and dirty training data.

training_data = {
    "politics": [
        "Putin orders troops into pro-Russian regions of eastern Ukraine.",
        "The president decided not to go through with his speech.",
        "There is much uncertainty surrounding the coming elections.",
        "Democrats are engaged in a ‘new politics of evasion’"
    ],
    "sports": [
        "The soccer team lost.",
        "The team won by two against zero.",
        "I love all sport.",
        "The olympics were amazing.",
        "Yesterday, the tennis players wrapped up wimbledon."
    ],
    "weather": [
        "It is going to be sunny outside.",
        "Heavy rainfall and wind during the afternoon.",
        "Clear skies in the morning, but mist in the evening.",
        "It is cold during the winter.",
        "There is going to be a storm with heavy rainfall."
    ]
}

validation_data = [
    "I am surely talking about politics.",
    "Sports is all you need.",
    "Weather is amazing."
]
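The training data above is keyed by label, while classifiers such as an SVC are usually fitted on two parallel lists: one of texts and one of labels. A minimal sketch of that flattening step follows. The helper name `flatten_training_data` is hypothetical and not part of classy-classification, and the package may handle this differently internally.

```python
from typing import Dict, List, Tuple


def flatten_training_data(data: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Turn {label: [examples]} into parallel (texts, labels) lists."""
    texts: List[str] = []
    labels: List[str] = []
    for label, examples in data.items():
        for example in examples:
            texts.append(example)
            labels.append(label)
    return texts, labels


# A two-class excerpt in the same shape as the training data above.
sample_data = {
    "sports": ["The soccer team lost.", "I love all sport."],
    "weather": ["It is cold during the winter."],
}

texts, labels = flatten_training_data(sample_data)
print(labels)  # ['sports', 'sports', 'weather']
```

Keeping the two lists aligned by index is what lets each text be paired with its label once the texts are embedded.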

using an individual sentence-transformer

from classy_classification import classyClassifier

classifier = classyClassifier(data=training_data)
classifier(validation_data[0])  # classify a single text
classifier.pipe(validation_data)  # classify a list of texts

# overwrite training data; new_training_data is any dict shaped like training_data
classifier.set_training_data(data=new_training_data)

# overwrite embedding model, see https://www.sbert.net/docs/pretrained_models.html
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")

# overwrite SVC config
classifier.set_svc(
    config={
        "C": [1, 2, 5, 10, 20, 100],
        "kernels": ["linear"],
        "max_cross_validation_folds": 5
    }
)
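To make the SVC config more concrete, here is a rough sketch of what its three keys could mean if they feed a grid search with cross-validation. This is an assumption about the intent, not a description of the package's internals, and the helpers `candidate_settings` and `effective_folds` are hypothetical names. The fold cap reflects the fact that, with only a handful of examples per label, stratified cross-validation generally needs at least one example of every class in each fold.

```python
from itertools import product
from typing import Dict, List

svc_config = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernels": ["linear"],
    "max_cross_validation_folds": 5,
}


def candidate_settings(config: dict) -> List[dict]:
    """Enumerate every (C, kernel) pair a grid search would try."""
    return [
        {"C": c, "kernel": kernel}
        for c, kernel in product(config["C"], config["kernels"])
    ]


def effective_folds(data: Dict[str, List[str]], max_folds: int) -> int:
    """Cap the fold count at the size of the smallest class (assumed behaviour)."""
    smallest_class = min(len(examples) for examples in data.values())
    return min(max_folds, smallest_class)


# Placeholder examples with the same class sizes as the quickstart data (4, 5, 5).
class_sizes_example = {
    "politics": ["a"] * 4,
    "sports": ["b"] * 5,
    "weather": ["c"] * 5,
}

print(len(candidate_settings(svc_config)))  # 6
print(effective_folds(class_sizes_example, svc_config["max_cross_validation_folds"]))  # 4
```

Under this reading, adding more values to "C" or "kernels" multiplies the number of candidates to evaluate, so small grids keep the few-shot setup fast.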

external sentence-transformer within spacy pipeline

import spacy

import classy_classification

nlp = spacy.blank("en")
nlp.add_pipe("text_categorizer", config={"data": training_data}) # provide similar config as above
nlp(validation_data[0])._.cats
nlp.pipe(validation_data)
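If you only need the single best label, you can reduce the category scores to one prediction. The sketch below assumes that `._.cats` holds a mapping from label to score, which the snippet above does not show. The helper `top_category` is a hypothetical name and the scores are made up for illustration.

```python
from typing import Dict, Tuple


def top_category(cats: Dict[str, float]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    label = max(cats, key=cats.get)
    return label, cats[label]


# Hypothetical scores, not real model output.
example_cats = {"politics": 0.71, "sports": 0.17, "weather": 0.12}

print(top_category(example_cats))  # ('politics', 0.71)
```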

internal spacy word2vec embeddings

import spacy

import classy_classification

nlp = spacy.load("en_core_web_md")
nlp.add_pipe("text_categorizer", config={"data": training_data, "model": "spacy"})  # use internal embeddings from spacy model
nlp(validation_data[0])._.cats
nlp.pipe(validation_data)

Todo

[ ] look into a way to integrate spacy trf models.

Inspiration Drawn From
