

TextKNNClassifier


TextKNNClassifier is a k-nearest neighbors classifier for text data. It uses a compression algorithm to compute the distance between texts and predicts the label of a test entry based on the labels of the k-nearest neighbors in the training data.
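The description above does not name the compressor or the exact distance formula. The paper by Jiang et al. (2023), listed under References, pairs gzip with the normalized compression distance (NCD): NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The sketch below assumes that same pairing. It is not taken from the textknnassifier source, and the helper names are hypothetical.

```python
import gzip


def compressed_len(text: str) -> int:
    # Assumption: gzip is the compressor; C(text) is the byte length
    # of the gzip-compressed UTF-8 encoding of the text.
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings (hypothetical helper).

    Smaller values mean the texts share more structure, because compressing
    them together costs little more than compressing the larger one alone.
    """
    cx = compressed_len(x)
    cy = compressed_len(y)
    # Concatenate with a space separator, as in the paper's pseudocode.
    cxy = compressed_len(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

For example, ncd("General Kenobi", "General Grievous") should come out smaller than ncd("General Kenobi", "Another test"), since the first pair shares the "General " prefix that the compressor can reuse.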

Installation

You can install the textknnassifier package using pip:

pip install textknnassifier

Usage

Here's an example of how to use TextKNNClassifier:

from textknnassifier import classifier

training_data = [
    "This is a test",
    "Another test",
    "General Tarkin",
    "General Grievous",
]
training_labels = ["test", "test", "star_wars", "star_wars"]
testing_data = [
    "This is a test",
    "Testing here too!",
    "General Kenobi",
    "General Skywalker",
]

KNN = classifier.TextKNNClassifier(n_neighbors=2)
KNN.fit(training_data, training_labels)
predicted_labels = KNN.predict(testing_data)

print(predicted_labels)
# Output: ['test', 'test', 'star_wars', 'star_wars']

In this example, we create a TextKNNClassifier instance and use it to predict the labels of the test entries. The n_neighbors=2 argument sets how many of the nearest training entries are consulted when predicting each test label. The fit method takes two arguments, the training data and the training labels, and simply stores them for later use. The predict method takes the testing data as its argument and returns the predicted labels.
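The fit/predict flow described above can be pictured with a small from-scratch sketch. SketchTextKNN, _clen and _ncd are hypothetical names, not part of textknnassifier. The gzip compressor, the NCD distance and the majority vote (with ties going to the closer neighbor) are assumptions, so the real package may rank neighbors or break ties differently.

```python
import gzip
from collections import Counter


def _clen(text: str) -> int:
    # Compressed length C(text); gzip is an assumed choice of compressor.
    return len(gzip.compress(text.encode("utf-8")))


def _ncd(x: str, y: str) -> float:
    # Normalized compression distance between two strings.
    cx, cy = _clen(x), _clen(y)
    return (_clen(x + " " + y) - min(cx, cy)) / max(cx, cy)


class SketchTextKNN:
    """Hypothetical stand-in mirroring the fit/predict behaviour described above."""

    def __init__(self, n_neighbors: int = 2):
        # Number of nearest training entries consulted per prediction.
        self.n_neighbors = n_neighbors

    def fit(self, texts, labels):
        # As described: fit only stores the training data for later use.
        self.texts_ = list(texts)
        self.labels_ = list(labels)
        return self

    def predict(self, texts):
        predictions = []
        for text in texts:
            # Sort training indices from closest to farthest under NCD.
            order = sorted(
                range(len(self.texts_)),
                key=lambda i: _ncd(text, self.texts_[i]),
            )
            nearest = [self.labels_[i] for i in order[: self.n_neighbors]]
            # Majority vote; most_common keeps first-encountered order on ties,
            # so a tie goes to the label of the closer neighbor.
            predictions.append(Counter(nearest).most_common(1)[0][0])
        return predictions
```

Usage mirrors the example above: SketchTextKNN(n_neighbors=2).fit(training_data, training_labels).predict(testing_data). Because fit does no work, every call to predict compresses each test entry against the whole training set, which is the main cost of this approach.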

References

  • Jiang, Z., Yang, M., Tsirlin, M., Tang, R., Dai, Y., & Lin, J. (2023, July). “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 6810-6828).


Download files


Source Distribution

textknnassifier-0.0.1rc1.tar.gz (12.6 kB)

Uploaded Source

Built Distribution

textknnassifier-0.0.1rc1-py3-none-any.whl (13.9 kB)

Uploaded Python 3

File details

Details for the file textknnassifier-0.0.1rc1.tar.gz.

File metadata

  • Download URL: textknnassifier-0.0.1rc1.tar.gz
  • Upload date:
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for textknnassifier-0.0.1rc1.tar.gz
Algorithm Hash digest
SHA256 b63abdd38dcedd76bec5e90911eb6037fee2bd4c7eecc0f8286027e5aa302b46
MD5 93bb2324d298f7d43e73eb5f2cdcc228
BLAKE2b-256 6c0c3d4f589696e7e8f2951d44ef64a57c4016a3e185662099759ee1fdcdf547

File details

Details for the file textknnassifier-0.0.1rc1-py3-none-any.whl.

File hashes

Hashes for textknnassifier-0.0.1rc1-py3-none-any.whl
Algorithm Hash digest
SHA256 b679b6f9e368c7029d54f89e28d1f86529a198114b762626d60dfe3304e33de2
MD5 105f7c373608f2dd59a2948e4c4bedc1
BLAKE2b-256 73aefa26d8e94c4b3e92762f0986216467ec2da4dd116823ad4553cf4eabc99b
