Skip to main content

This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with entity confidence scores!

Project description

Concise Concepts

When wanting to apply NER to concise concepts, it is really easy to come up with examples, but pretty difficult to train an entire pipeline. Concise Concepts uses few-shot NER based on word embedding similarity to get you going with easy! Now with entity scoring!

Python package Current Release Version pypi Version PyPi downloads Code style: black

Install

pip install concise-concepts

Quickstart

import spacy
from spacy import displacy
import concise_concepts

data = {
    "fruit": ["apple", "pear", "orange"],
    "vegetable": ["broccoli", "spinach", "tomato"],
    "meat": ["beef", "pork", "fish", "lamb"]
}

text = """
    Heat the oil in a large pan and add the Onion, celery and carrots. 
    Then, cook over a medium–low heat for 10 minutes, or until softened. 
    Add the courgette, garlic, red peppers and oregano and cook for 2–3 minutes.
    Later, add some oranges and chickens. """

nlp = spacy.load("en_core_web_lg", disable=["ner"])
# ent_score for entity condifence scoring
nlp.add_pipe("concise_concepts", config={"data": data, "ent_score": True})
doc = nlp(text)

options = {"colors": {"fruit": "darkorange", "vegetable": "limegreen", "meat": "salmon"},
           "ents": ["fruit", "vegetable", "meat"]}

ents = doc.ents
for ent in ents:
    new_label = f"{ent.label_} ({float(ent._.ent_score):.0%})"
    options["colors"][new_label] = options["colors"].get(ent.label_.lower(), None)
    options["ents"].append(new_label)
    ent.label_ = new_label
doc.ents = ents

displacy.render(doc, style="ent", options=options)

use specific number of words to expand over

data = {
    "fruit": ["apple", "pear", "orange"],
    "vegetable": ["broccoli", "spinach", "tomato"],
    "meat": ["beef", "pork", "fish", "lamb"]
}

topn = [50, 50, 150]

assert len(topn) == len

nlp.add_pipe("concise_concepts", config={"data": data, "topn": topn})

use word similarity to score entities

import spacy
import concise_concepts

data = {
    "ORG": ["Google", "Apple", "Amazon"],
    "GPE": ["Netherlands", "France", "China"],
}

text = """Sony was founded in Japan."""

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("concise_concepts", config={"data": data, "ent_score": True})
doc = nlp(text)

print([(ent.text, ent.label_, ent._.ent_score) for ent in doc.ents])
# output
#
# [('Sony', 'ORG', 0.63740385), ('Japan', 'GPE', 0.5896993)]

use gensim.word2vec model from pre-trained gensim or custom model path

data = {
    "fruit": ["apple", "pear", "orange"],
    "vegetable": ["broccoli", "spinach", "tomato"],
    "meat": ["beef", "pork", "fish", "lamb"]
}

# model from https://radimrehurek.com/gensim/downloader.html or path to local file
model_path = "glove-twitter-25"

nlp.add_pipe("concise_concepts", config={"data": data, "model_path": model_path})

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

concise-concepts-0.5.1.tar.gz (7.8 kB view details)

Uploaded Source

Built Distribution

concise_concepts-0.5.1-py3-none-any.whl (9.1 kB view details)

Uploaded Python 3

File details

Details for the file concise-concepts-0.5.1.tar.gz.

File metadata

  • Download URL: concise-concepts-0.5.1.tar.gz
  • Upload date:
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.11 CPython/3.8.2 Windows/10

File hashes

Hashes for concise-concepts-0.5.1.tar.gz
Algorithm Hash digest
SHA256 29604b51eafe6dea778604a32c59d1914f628028dc1e65d19fcd35057b6d32ea
MD5 bf304c9170b60ceb9b19d2a7f1020c38
BLAKE2b-256 c423a140d858fa4442ae0b48a67355919dccbeee896a12fdd0b36c45fa305bc4

See more details on using hashes here.

File details

Details for the file concise_concepts-0.5.1-py3-none-any.whl.

File metadata

File hashes

Hashes for concise_concepts-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0d3ceea337109310d64004c15d9f1a07d85b9adda519f9cab2cab501262ab7c9
MD5 5c7c8deb7f35e05060c3a5d14eccd83d
BLAKE2b-256 6663a5ba224015ce59f919dd5d175c0469cc551f987e4b077ede2a2c4b0729f3

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page