A Python-based license classification and copyright statement detection tool built on the Google License Classifier
Golicense-Classifier
A Python-based module for finding valid copyright and license expressions in a file.
Note: This module is based on the Google LicenseClassifier.
Installation
Currently, this package supports only Linux. Support for Windows and macOS is in progress.
To install from PyPI, use
pip install golicense-classifier
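Because the package currently supports only Linux, a script that depends on it may want to check the platform before importing it. The following is a minimal sketch using only the standard library; the helper name is_supported_platform is hypothetical and not part of the package.

```python
import sys


def is_supported_platform(platform=None):
    """Return True on Linux, the only platform the README lists as supported."""
    # sys.platform is "linux" on Linux, "win32" on Windows, "darwin" on macOS.
    name = sys.platform if platform is None else platform
    return name.startswith("linux")


if not is_supported_platform():
    print("golicense-classifier currently supports only Linux; skipping import.")
```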
Usage
To get started, import the LicenseClassifier class from the module:
from LicenseClassifier.classifier import LicenseClassifier
Note: Copyright statement detection is still a work in progress. Expect some issues, mostly with binary files.
The class provides several methods for scanning.
- scan_directory
This method recursively walks through a directory and finds license expressions and copyright statements. It returns a dictionary with the keys header and files.
Usage
classifier = LicenseClassifier()
res = classifier.scan_directory('PATH_TO_DIR')
Optional Parameters
- max_size: Maximum size of a file in MB. The default is 10 MB. Set max_size < 0 to ignore size constraints.
- use_buffer (Experimental): Set to True to use buffered file scanning. max_size will be used as the buffer size.
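The result of scan_directory can be unpacked by its two documented keys, header and files. The sketch below is illustrative only: the helper scan_and_split is hypothetical, and the README does not describe what header and files contain, so the code returns them untouched. The helper accepts any object with a scan_directory method, which keeps it usable without the package installed.

```python
def scan_and_split(classifier, directory):
    """Run scan_directory and return (header, files) from its result."""
    result = classifier.scan_directory(directory)
    # The README documents exactly two top-level keys; fail loudly if one is absent.
    missing = [key for key in ("header", "files") if key not in result]
    if missing:
        raise KeyError(f"scan result is missing expected keys: {missing}")
    # The inner structure of both values is not documented, so return them as-is.
    return result["header"], result["files"]
```

With the real package, this would be called as scan_and_split(LicenseClassifier(), 'PATH_TO_DIR').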
- scan_file
This method finds license expressions and copyright statements in a single file.
Usage
classifier = LicenseClassifier()
res = classifier.scan_file('PATH_TO_FILE')
Optional Parameters
- max_size: Maximum size of a file in MB. The default is 10 MB. Set max_size < 0 to ignore size constraints.
- use_buffer (Experimental): Set to True to use buffered file scanning. max_size will be used as the buffer size.
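The two optional parameters interact: max_size is a size limit by default but becomes the buffer size when use_buffer is True. The sketch below shows one way to build the options consistently. It assumes max_size and use_buffer are accepted as keyword arguments, which the README implies but does not show; the helper names build_scan_options and scan_file_with_options are hypothetical, as is the choice to reject a non-positive buffer size.

```python
def build_scan_options(max_size_mb=10, buffered=False):
    """Build keyword arguments for scan_file or scan_directory.

    Pass max_size_mb=None to disable the size limit.
    """
    # Any value below zero tells the scanner to ignore size constraints.
    max_size = -1 if max_size_mb is None else max_size_mb
    if buffered and max_size <= 0:
        # With use_buffer=True, max_size doubles as the buffer size, so a
        # non-positive value is rejected here (an assumption of this sketch).
        raise ValueError("buffered scanning needs a positive max_size_mb")
    return {"max_size": max_size, "use_buffer": buffered}


def scan_file_with_options(classifier, path, **options):
    """Call scan_file on any classifier-like object with validated options."""
    return classifier.scan_file(path, **build_scan_options(**options))
```

For example, scan_file_with_options(classifier, 'PATH_TO_FILE', max_size_mb=None) would scan a file regardless of its size.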
Setting Custom Scanning Threshold
You can set a custom scanning threshold that best suits your needs by passing the threshold parameter when creating the object:
classifier = LicenseClassifier(threshold=0.9)
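To pick a threshold, it can help to scan the same file at several values and compare the results. The following is a rough sketch under that idea: the helper scan_at_thresholds is hypothetical, the default values 0.7, 0.8 and 0.9 are illustrative only, and make_classifier stands for any callable that accepts a threshold keyword, such as the LicenseClassifier class itself.

```python
def scan_at_thresholds(make_classifier, path, thresholds=(0.7, 0.8, 0.9)):
    """Scan one file once per threshold and return {threshold: result}."""
    results = {}
    for threshold in thresholds:
        # A new classifier is built for each value, since the README only
        # shows the threshold being set at construction time.
        classifier = make_classifier(threshold=threshold)
        results[threshold] = classifier.scan_file(path)
    return results
```

With the real package, this would be called as scan_at_thresholds(LicenseClassifier, 'PATH_TO_FILE').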
File details
Details for the file golicense_classifier-0.0.15.tar.gz.
File metadata
- Download URL: golicense_classifier-0.0.15.tar.gz
- Upload date:
- Size: 2.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | a44da84a52fee5f22ca3a203b254129b32d2b2a40fbf9909fa345f4654f0547a
MD5 | cd67930713b1b89903ca4c86ed5b9df3
BLAKE2b-256 | 7480a95b4f8364cd79e4de343f7c5fda82e8cc058952126dbe2489e79f22a4c1
File details
Details for the file golicense_classifier-0.0.15-py3-none-any.whl.
File metadata
- Download URL: golicense_classifier-0.0.15-py3-none-any.whl
- Upload date:
- Size: 2.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4c21cecee5855843f6c40ef59bfed0242ee518110be04339ba4c5f76e70c40ab
MD5 | 246d20271e9c306f30e56e553e4bcee5
BLAKE2b-256 | bc601ed0a18a3f8ef0b315293fbfe6b6082edd3d444b11b750bb832d5aec2af1