
A Python-based license classification and copyright statement detection tool built on Google License Classifier


GoLicense-Classifier



A Python package to find license expressions and copyright statements in a codebase.

Based on Google LicenseClassifier V2, GoLicense-Classifier (or glc for short) focuses on performance without compromising accuracy.

Installation

Note: This package currently supports only Linux. Support for Windows and Mac is in progress.

Installing GoLicense-Classifier is as simple as

pip install golicense-classifier

Or, you can build the package from source:

git clone https://github.com/AvishrantsSh/GoLicense-Classifier.git
cd GoLicense-Classifier
make dev
make package
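
Because the package currently supports only Linux, a script that depends on it may want to check the platform and the installation before importing anything. The following is a minimal sketch using only the standard library. The helper name glc_available is hypothetical and not part of the package; the module name LicenseClassifier is the import name shown in the Usage section.

```python
import importlib.util
import sys


def glc_available(module_name="LicenseClassifier"):
    """Return (ok, reason). Hypothetical helper, not part of the package."""
    # The README states that only Linux is supported for now.
    if not sys.platform.startswith("linux"):
        return False, f"unsupported platform: {sys.platform}"
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module_name) is None:
        return False, f"module {module_name!r} is not installed"
    return True, "ok"


ok, reason = glc_available()
print(ok, reason)
```

Returning a reason alongside the boolean lets the caller print a useful message instead of failing later with an ImportError.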

Usage

To get started, import the LicenseClassifier class from the module:

from LicenseClassifier.classifier import LicenseClassifier

Note: Copyright statement detection is still in beta. Expect some issues, mostly with binary files.

The class provides two methods, each suited to a different task.

  1. scan_directory

    This method recursively walks through a directory and finds license expressions and copyright statements. It returns a dictionary with the keys header and files.

    Usage


    classifier = LicenseClassifier()
    res = classifier.scan_directory('PATH_TO_DIR')
    

    Optional Parameters


    • max_size

      Maximum file size in MB. The default is 10 MB. Set max_size < 0 to ignore size constraints.

    • use_buffer

      (Experimental) Set to True to use buffered file scanning. max_size will be used as buffer size.

    • use_scancode_mapping

      Set to True if you want to use Scancode license key mappings. Default is set to True.

  2. scan_file

    This method finds license expressions and copyright statements in a single file.

    Usage


    classifier = LicenseClassifier()
    res = classifier.scan_file('PATH_TO_FILE')
    

    Optional Parameters


    • max_size

      Maximum file size in MB. The default is 10 MB. Set max_size < 0 to ignore size constraints.

    • use_buffer

      (Experimental) Set to True to use buffered file scanning. max_size will be used as buffer size.

    • use_scancode_mapping

      Set to True if you want to use Scancode license key mappings. Default is set to True.
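
The two methods above take the same optional parameters, so a thin wrapper can pick the right one for a given path. The following is a minimal sketch, not part of the package: scan_path and main are hypothetical names, and it assumes the optional parameters can be passed as keyword arguments. The check for the header and files keys reflects only what is documented above for scan_directory.

```python
import os


def scan_path(classifier, path, **options):
    """Dispatch to scan_directory or scan_file depending on what `path` is.

    `classifier` is expected to be a LicenseClassifier instance. `options`
    are passed through unchanged (max_size, use_buffer, use_scancode_mapping).
    Hypothetical helper, not part of the package.
    """
    if os.path.isdir(path):
        result = classifier.scan_directory(path, **options)
        # These two keys are the ones documented for directory scans.
        missing = {"header", "files"} - set(result)
        if missing:
            raise KeyError(f"scan result is missing keys: {sorted(missing)}")
        return result
    if os.path.isfile(path):
        return classifier.scan_file(path, **options)
    raise FileNotFoundError(path)


def main():
    # Needs `pip install golicense-classifier` on Linux, so it is not
    # called automatically here.
    from LicenseClassifier.classifier import LicenseClassifier

    classifier = LicenseClassifier()
    # Ignore the size limit and keep the default Scancode key mapping.
    result = scan_path(
        classifier, "PATH_TO_DIR", max_size=-1, use_scancode_mapping=True
    )
    print(result["header"])
```

Call main() on a machine where the package is installed, replacing PATH_TO_DIR with a real directory.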

Further Customization

You can set a custom scanning threshold that best suits your needs. Pass the threshold parameter when creating the object:

classifier = LicenseClassifier(threshold = 0.9)
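
One way to choose a threshold is to scan the same file at several values and compare the results. The sketch below is a hypothetical helper, not part of the package. It assumes only what is shown above: that the constructor accepts a threshold keyword and that scan_file takes a path. The threshold values are illustrative; the README does not state the valid range.

```python
def scan_with_thresholds(make_classifier, path, thresholds=(0.8, 0.9)):
    """Scan `path` once per threshold and return {threshold: result}.

    `make_classifier` is any callable accepting a `threshold` keyword,
    for example the LicenseClassifier class itself.
    """
    results = {}
    for threshold in thresholds:
        # A fresh classifier per threshold, as in the constructor call above.
        classifier = make_classifier(threshold=threshold)
        results[threshold] = classifier.scan_file(path)
    return results


# Usage, on a machine where the package is installed:
#   from LicenseClassifier.classifier import LicenseClassifier
#   results = scan_with_thresholds(LicenseClassifier, "PATH_TO_FILE")
```

Passing the class in as a factory keeps the helper independent of the import, so it can be exercised with a stand-in object where the package is unavailable.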

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

To get started, read the Contributing Guide.

References

  1. Google LicenseClassifier V2 https://github.com/google/licenseclassifier/

  2. Ctypes https://docs.python.org/3/library/ctypes.html

