TextKNNClassifier
TextKNNClassifier is a k-nearest neighbors classifier for text data. It uses a compression algorithm to compute the distance between texts and predicts the label of a test entry based on the labels of the k-nearest neighbors in the training data.
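The description above does not name the compressor or the exact distance formula. One common choice, used in the paper listed under References, is the normalized compression distance (NCD) computed with gzip. The following is a minimal sketch under that assumption; the helper names compressed_len and ncd are illustrative and are not part of the package's API.

```python
import gzip


def compressed_len(text: str) -> int:
    """Return the size in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings.

    Assumption: gzip as the compressor and a single space as the
    separator when concatenating, as in the cited paper. The package
    itself may use a different compressor or formula.
    """
    ca = compressed_len(a)
    cb = compressed_len(b)
    cab = compressed_len(a + " " + b)
    # Texts that share structure compress well together, so the
    # numerator (and the distance) is small for similar texts.
    return (cab - min(ca, cb)) / max(ca, cb)
```

The distance is only approximate: on very short strings the fixed gzip header dominates the compressed size, so differences between texts are less pronounced than on longer inputs.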
Installation
You can install TextKNNassifier using pip:
pip install textknnassifier
Usage
Here's an example of how to use TextKNNClassifier:
from textknnassifier import classifier

# Training texts and their labels, in matching order
training_data = [
    "This is a test",
    "Another test",
    "General Tarkin",
    "General Grievous",
]
training_labels = ["test", "test", "star_wars", "star_wars"]

# Unlabeled texts to classify
testing_data = [
    "This is a test",
    "Testing here too!",
    "General Kenobi",
    "General Skywalker",
]

# Consider the 2 nearest training texts when predicting each label
KNN = classifier.TextKNNClassifier(n_neighbors=2)
KNN.fit(training_data, training_labels)
predicted_labels = KNN.predict(testing_data)
print(predicted_labels)
# Output: ['test', 'test', 'star_wars', 'star_wars']
In this example, we create a TextKNNClassifier instance and use it to predict the labels of the test entries. The constructor is given n_neighbors=2, which sets the number of training data points to consider when predicting each test label. The fit method takes two arguments: the training data and the training labels. It simply stores these values for later use. The predict method takes the testing data as an argument and returns the predicted labels.
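To make the fit and predict steps concrete, here is a rough, self-contained sketch of how such a classifier could work. It is not the package's implementation: the class name SketchTextKNN, the helpers _clen and _ncd, the choice of gzip, and the majority-vote tie-breaking are all assumptions made for illustration.

```python
import gzip
from collections import Counter


def _clen(text: str) -> int:
    """Size in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def _ncd(a: str, b: str) -> float:
    """Normalized compression distance (assumed metric, see lead-in)."""
    ca, cb = _clen(a), _clen(b)
    return (_clen(a + " " + b) - min(ca, cb)) / max(ca, cb)


class SketchTextKNN:
    """Hypothetical stand-in that mirrors the fit/predict flow described above."""

    def __init__(self, n_neighbors: int = 3):
        self.n_neighbors = n_neighbors

    def fit(self, texts, labels):
        # Like the described fit method, this only stores the data.
        if len(texts) != len(labels):
            raise ValueError("texts and labels must have the same length")
        self._texts = list(texts)
        self._labels = list(labels)
        return self

    def predict(self, texts):
        predictions = []
        for query in texts:
            # Rank every training text by its distance to the query.
            ranked = sorted(
                zip(self._texts, self._labels),
                key=lambda pair: _ncd(query, pair[0]),
            )
            # Majority vote among the k nearest; on a tie, the label
            # seen first (the nearer neighbor) wins.
            votes = Counter(label for _, label in ranked[: self.n_neighbors])
            predictions.append(votes.most_common(1)[0][0])
        return predictions
```

Each prediction compares the query against every stored training text, so the cost grows with the product of training and test set sizes. That is acceptable for small datasets like the one in the example but becomes slow on large ones.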
References
- Jiang, Z., Yang, M., Tsirlin, M., Tang, R., Dai, Y., & Lin, J. (2023, July). “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 6810-6828).