A Python-based license classification and copyright statement detection tool built on Google License Classifier

Golicense-Classifier

A Python-based module for finding valid copyright statements and license expressions in a file.

Note: This module is based on Google LicenseClassifier.

Installation

Currently, this package supports only Linux. Support for Windows and macOS is in progress.

To install from PyPI, run:

pip install golicense-classifier
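Because only Linux is supported, it can help to check the environment before importing the package. The following is a minimal sketch, not part of the package: the helper name can_use_classifier is hypothetical, and it only assumes that the import name is LicenseClassifier, as used in the Usage section.

```python
import importlib.util
import sys


def can_use_classifier(platform=None):
    """Return True when on Linux and the LicenseClassifier package is importable.

    `platform` defaults to sys.platform; it is a parameter only so the
    check can be exercised with other platform strings.
    """
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        # The README states that only Linux is supported for now.
        return False
    # find_spec returns None when no top-level package with this name is installed.
    return importlib.util.find_spec("LicenseClassifier") is not None
```

Calling can_use_classifier() before the import lets a script fail with a clear message instead of an ImportError on an unsupported platform.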

Usage

To get started, import the LicenseClassifier class from the module:

from LicenseClassifier.classifier import LicenseClassifier

Note: Copyright statement detection is still a work in progress. Expect some issues, mostly with binary files.

The class provides the following methods for scanning.

  1. scan_directory

    This method recursively walks a directory and finds license expressions and copyright statements. It returns a dictionary with the keys header and files.

    Usage


    classifier = LicenseClassifier()
    res = classifier.scan_directory('PATH_TO_DIR')
    

    Optional Parameters


    • max_size

      Maximum file size in MB. The default is 10 MB. Set max_size < 0 to ignore the size limit.

    • use_buffer

      (Experimental) Set to True to use buffered file scanning. max_size is then used as the buffer size.

  2. scan_file

    This method finds license expressions and copyright statements in a single file.

    Usage


    classifier = LicenseClassifier()
    res = classifier.scan_file('PATH_TO_FILE')
    

    Optional Parameters


    • max_size

      Maximum file size in MB. The default is 10 MB. Set max_size < 0 to ignore the size limit.

    • use_buffer

      (Experimental) Set to True to use buffered file scanning. max_size is then used as the buffer size.
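
Since both methods take the same optional parameters, a small wrapper can choose between them based on the path. This is a sketch under assumptions: the helper name scan_path is hypothetical, and it assumes that max_size and use_buffer can be passed as keyword arguments under the names listed above.

```python
import os


def scan_path(classifier, path, max_size=10, use_buffer=False):
    """Scan a directory or a single file with the same optional parameters.

    `classifier` is expected to be a LicenseClassifier instance (or any
    object exposing scan_directory and scan_file).
    """
    if os.path.isdir(path):
        # Directories are walked recursively by scan_directory.
        return classifier.scan_directory(path, max_size=max_size, use_buffer=use_buffer)
    # Anything else is handed to scan_file as a single file.
    return classifier.scan_file(path, max_size=max_size, use_buffer=use_buffer)


# Typical use, assuming the package is installed:
# from LicenseClassifier.classifier import LicenseClassifier
# result = scan_path(LicenseClassifier(), "PATH_TO_DIR", max_size=-1)  # no size limit
```

Passing max_size=-1 follows the documented convention of a negative value disabling the size limit.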

Setting Custom Scanning Threshold

You can set a custom scanning threshold that best suits your needs by passing the threshold parameter when creating the object:

classifier = LicenseClassifier(threshold=0.9)
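
The README does not spell out what the threshold means. A common reading, assumed here, is that it is a minimum match confidence between 0 and 1, so that weaker matches are discarded. The sketch below only illustrates that idea with made-up data; the function keep_confident and the (name, confidence) pairs are hypothetical and are not the package's output format.

```python
def keep_confident(matches, threshold=0.9):
    """Keep only (license_name, confidence) pairs at or above the threshold."""
    return [(name, score) for name, score in matches if score >= threshold]


# Hypothetical candidate matches, for illustration only.
candidates = [("MIT", 0.97), ("Apache-2.0", 0.85), ("BSD-3-Clause", 0.90)]
confident = keep_confident(candidates, threshold=0.9)
```

Under this reading, raising the threshold trades fewer false positives for a higher chance of missing modified license texts.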
