A Python-based License Classification and Copyright Statement Detection Tool Built on Google LicenseClassifier
GoLicense-Classifier
A Python package to find license expressions and copyright statements in a codebase.
Based on Google LicenseClassifier V2, GoLicense-Classifier (glc for short) focuses on performance without compromising accuracy.
Installation
Note: This package currently supports only the Linux platform. Support for Windows and macOS is in progress.
Installing GoLicense-Classifier is as simple as
pip install golicense-classifier
Alternatively, you can build the package from source:
git clone https://github.com/AvishrantsSh/GoLicense-Classifier.git
cd GoLicense-Classifier
make dev
make package
Usage
To get started, import the LicenseClassifier class from the module:
from LicenseClassifier.classifier import LicenseClassifier
Note: Copyright statement detection is still in beta. Expect some issues, mostly with binary files.
The class provides several handy methods, each suited to a different task.
- scan_directory
This method recursively walks through a directory and finds license expressions and copyright statements. It returns a dictionary with the keys header and files.
Usage
classifier = LicenseClassifier()
res = classifier.scan_directory('PATH_TO_DIR')
Optional Parameters
  - max_size: Maximum size of a file in MB. Default is 10 MB. Set max_size < 0 to ignore size constraints.
  - use_buffer (Experimental): Set to True to use buffered file scanning. max_size will be used as the buffer size.
  - use_scancode_mapping: Set to True if you want to use ScanCode license key mappings. Default is True.
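The recursive walk and the max_size rule can be illustrated with a small standalone sketch. The helper iter_scannable_files below is hypothetical and is not part of GoLicense-Classifier; it only mirrors the documented option, assuming that files larger than max_size MB are skipped and that a negative max_size disables the check.

```python
import os
from typing import Iterator

BYTES_PER_MB = 1024 * 1024  # assumption: "MB" is interpreted as 1024 * 1024 bytes


def iter_scannable_files(root: str, max_size: float = 10) -> Iterator[str]:
    """Hypothetical helper (not the package's API): recursively yield files
    under `root` that a size-limited scan would consider.

    A negative `max_size` disables the size limit, mirroring the documented option.
    """
    limit = None if max_size < 0 else max_size * BYTES_PER_MB
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and non-regular files
            if limit is not None and os.path.getsize(path) > limit:
                continue  # larger than max_size: skipped under this assumption
            yield path
```

For example, iter_scannable_files('PATH_TO_DIR', max_size=1) yields only files of at most 1 MB, while max_size=-1 yields every regular file in the tree.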
- scan_file
This method finds license expressions and copyright statements in a single file.
Usage
classifier = LicenseClassifier()
res = classifier.scan_file('PATH_TO_FILE')
Optional Parameters
  - max_size: Maximum size of a file in MB. Default is 10 MB. Set max_size < 0 to ignore size constraints.
  - use_buffer (Experimental): Set to True to use buffered file scanning. max_size will be used as the buffer size.
  - use_scancode_mapping: Set to True if you want to use ScanCode license key mappings. Default is True.
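The idea behind use_buffer, where max_size becomes the buffer size, can be sketched as reading a file in fixed-size chunks. The generator iter_chunks below is a hypothetical illustration, not the package's implementation; it assumes max_size is given in MB and that a negative value means "read the whole file at once".

```python
from typing import Iterator

BYTES_PER_MB = 1024 * 1024  # assumption: "MB" is interpreted as 1024 * 1024 bytes


def iter_chunks(path: str, max_size: float = 10) -> Iterator[bytes]:
    """Hypothetical helper (not the package's API): yield the contents of
    `path` in chunks of at most `max_size` MB, so a large file is never
    loaded into memory all at once."""
    # A negative max_size means "no limit": f.read(-1) returns the whole file.
    chunk_size = -1 if max_size < 0 else max(1, int(max_size * BYTES_PER_MB))
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
```

Scanning chunks independently keeps memory use bounded, but a license text that straddles a chunk boundary may be missed or matched with lower confidence, so this approach trades some accuracy for memory.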
Further Customization
You can set a custom threshold that best suits your needs by passing the threshold parameter when creating the object:
classifier = LicenseClassifier(threshold=0.9)
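Conceptually, a threshold acts as a cut-off on match confidence. The sketch below assumes each candidate match carries a confidence score between 0 and 1; the Match type, the filter_matches function, and the example scores are hypothetical and only illustrate how a higher threshold keeps fewer, more confident matches.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    """Hypothetical candidate match: a license name and a confidence score."""
    name: str
    confidence: float  # assumed to lie in [0, 1]


def filter_matches(matches: List[Match], threshold: float) -> List[Match]:
    """Hypothetical helper: keep only matches whose confidence reaches the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [m for m in matches if m.confidence >= threshold]


# Illustrative scores only, not output from GoLicense-Classifier.
candidates = [Match("Apache-2.0", 0.97), Match("MIT", 0.85), Match("BSD-3-Clause", 0.62)]
print([m.name for m in filter_matches(candidates, threshold=0.9)])  # → ['Apache-2.0']
```

Raising the threshold reduces false positives at the risk of missing modified or partial license texts; lowering it does the opposite.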
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
To get started, read the Contributing Guide.
References
- Google LicenseClassifier V2: https://github.com/google/licenseclassifier/
Download files
Source Distribution
Hashes for golicense_classifier-0.0.16.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f5f3f51291ae9e5c9974a1f44b13de20272619dd3b8a46b05fc170b73ec5d423
MD5 | 961408d1ac0e5588c65096db3224b58f
BLAKE2b-256 | 0dc0d427d2a59462ec837bdf6c4ad3a7606f066c34eda9b5a334e58fa2dfdf44
Built Distribution
Hashes for golicense_classifier-0.0.16-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 52afd6b96e95b83e186ce15d38ffff79d38b32a5ddc690eab01a134b06553791
MD5 | 95cf8c6fdfa17ac7263cd826f46c39bf
BLAKE2b-256 | db069aef911a93798466467285f848d4f5a90fb53178b752ad660957207002ac