
ScanCode is a tool to scan code for licenses, copyrights, packages and their documented dependencies, and other interesting facts.


Build and tests status

[Status badges for the Master and Develop branches: Linux test coverage, and tests status on Linux (Travis), MacOSX (Travis) and Windows (AppVeyor).]

ScanCode is a suite of utilities used to scan a codebase for license, copyright, package dependencies and other interesting information that can be discovered in source and binary code files.

A typical software project often reuses hundreds of third-party components. License and origin information is often scattered and not easy to find: ScanCode discovers this data for you.

ScanCode provides accurate scan results and the line position where each result is found. The results can be formatted as JSON or HTML. ScanCode provides a simple HTML app for quick visualization of scan results (see screenshot below), but you will have more robust analysis options if you use AboutCode Manager to view a scan. AboutCode Manager is a desktop application available for Linux, OSX or Windows. Go to https://github.com/nexB/aboutcode-manager to learn more or to download AboutCode Manager.

We are continuously working on new features, such as analysis of dependencies or improving performance for scanning of larger codebases.

See the roadmap for upcoming features: https://github.com/nexB/scancode-toolkit/wiki/Roadmap

samples/screenshot.png

Quick Start

For Windows, please go to the Comprehensive Installation section instead.

Make sure you have Python 2.7 installed:

On Linux install Python 2.7 “devel” and a few extra packages:

  • sudo apt-get install python-dev bzip2 xz-utils zlib1g libxml2-dev libxslt1-dev for Ubuntu 12.04, 14.04 and 16.04

  • sudo apt-get install python-dev libbz2-1.0 xz-utils zlib1g libxml2-dev libxslt1-dev for Debian and Debian-based distros

  • sudo yum install python-devel zlib bzip2-libs xz-libs libxml2-devel libxslt-devel for RPM distros

  • sudo dnf install python-devel zlib bzip2-libs xz-libs libxml2-devel libxslt-devel for Fedora 22 and later

  • See the Comprehensive Installation for additional details and other Linux installations: https://github.com/nexB/scancode-toolkit/wiki/Comprehensive-Installation

Next, download and extract the latest ScanCode release from:

https://github.com/nexB/scancode-toolkit/releases/

Open a terminal, extract the downloaded release archive, then cd to the extracted directory and run this command to display the command help. ScanCode will self-configure if needed:

./scancode --help

Run a sample scan saved to the samples.html file:

./scancode --format html-app samples samples.html

Then open samples.html in your web browser to view the scan results.

See more command examples:

./scancode --examples

Support

If you have a problem, a suggestion or found a bug, please enter a ticket at: https://github.com/nexB/scancode-toolkit/issues

For other questions, discussions, and chats, we have:

About archives

All code must be extracted before running ScanCode since ScanCode will not extract files from tarballs, zip files, etc. However, the ScanCode Toolkit comes with a utility called extractcode that does recursive archive extraction. For example, this command will recursively extract the mytar.tar.bz2 tarball in the mytar.tar.bz2-extract directory:

./extractcode mytar.tar.bz2
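To make "recursive archive extraction" concrete, here is a minimal Python sketch of the idea. It is not how extractcode is implemented: it swaps in the standard library's shutil.unpack_archive, handles only a few common suffixes, and is not hardened against malicious archives. The function name extract_recursively and the ARCHIVE_SUFFIXES tuple are hypothetical names chosen for this illustration; only the "-extract" directory naming comes from the description above.

```python
import os
import shutil

# Assumption: a small, illustrative set of suffixes that the standard
# library can unpack. extractcode itself supports many more formats.
ARCHIVE_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".zip")


def extract_recursively(archive_path):
    """Extract archive_path into "<archive_path>-extract", then do the same
    for every archive found inside. Return the list of created directories.
    """
    target_dir = archive_path + "-extract"
    shutil.unpack_archive(archive_path, target_dir)
    created = [target_dir]

    # Walk the freshly extracted tree and recurse into nested archives.
    # Directories created by the recursive calls are handled by those
    # calls, not by this walk.
    for root, _dirs, files in os.walk(target_dir):
        for name in files:
            if name.endswith(ARCHIVE_SUFFIXES):
                nested = os.path.join(root, name)
                created.extend(extract_recursively(nested))
    return created
```

Calling extract_recursively("mytar.tar.bz2") would create the mytar.tar.bz2-extract directory next to the tarball, with one additional "-extract" directory beside each nested archive.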

Source code

License

  • Apache-2.0 with an acknowledgement required to accompany the scan output.

  • Public domain CC-0 for reference datasets.

  • Multiple licenses (GPL2/3, LGPL, MIT, BSD, etc.) for third-party components.

See the NOTICE file for more details.

Documentation & FAQ

https://github.com/nexB/scancode-toolkit/wiki

Basic Usage

Run this command for a list of options (On Windows use scancode instead of ./scancode):

./scancode --help

Run this command for a list of command line examples:

./scancode --examples

To run a scan on sample data, run this:

./scancode --format html-app samples samples.html

Then open samples.html in your web browser to see the results.
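For scripted use, the same command can write JSON instead of HTML, and the result can be summarized with a few lines of Python. The sketch below is an illustration under stated assumptions, not a documented API: it reuses the command shape shown above with the json-pp format named in the changelog, and it assumes the JSON output has a top-level "files" list whose entries carry a "path" and a "licenses" list with "key", "start_line" and "end_line" fields. Check an actual scan output before relying on these names. The helper names build_scan_command and summarize_licenses are hypothetical.

```python
import json


def build_scan_command(input_path, output_path, output_format="json-pp"):
    """Return the argument list for a scan, mirroring the sample command."""
    return ["./scancode", "--format", output_format, input_path, output_path]


def summarize_licenses(scan):
    """Yield (path, license key, start line, end line) for each license match.

    Assumed layout (not confirmed by this document):
    {"files": [{"path": ..., "licenses": [{"key": ..., "start_line": ...,
    "end_line": ...}]}]}. Missing fields are reported as None.
    """
    for scanned_file in scan.get("files", []):
        for match in scanned_file.get("licenses") or []:
            yield (
                scanned_file.get("path"),
                match.get("key"),
                match.get("start_line"),
                match.get("end_line"),
            )


# Hypothetical, hand-written data shaped like the assumed layout.
# A real scan would be loaded with json.load() from the output file.
sample_scan = json.loads("""
{
  "files": [
    {"path": "samples/example.c",
     "licenses": [{"key": "mit", "start_line": 1, "end_line": 20}]},
    {"path": "samples/README", "licenses": []}
  ]
}
""")

for row in summarize_licenses(sample_scan):
    print("%s: %s (lines %s-%s)" % row)
```

The list returned by build_scan_command("samples", "samples.json") can be passed to subprocess.call from the extracted ScanCode directory; on Windows, replace ./scancode with scancode as noted above.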

Changelog

(NEXT)

2.1.0 (2017-09-22)

This is a minor release with several new and improved features and bug fixes but no significant API changes.

  • New plugin architecture by @yashdsaraf

    • we can now have pre-scan, post-scan and output format plugins

    • there is a new CSV output format and some example, experimental plugins

    • the CLI UI has changed to better support these plugins

  • New and improved licenses and license detection rules including support for EPL-2.0 and OpenJDK-related licensing and synchronization with the latest SPDX license list

  • Multiple bug fixes such as:

    • Ensure that authors are reported even if there is no copyright #669

    • Fix Maven package POM parsing infinite loop #721

    • Improve handling of weird non-unicode byte paths #688 and #706

    • Improve PDF parsing to avoid some crash #723

Credits: Many thanks to everyone that contributed to this release with code and bug reports (and this list is likely missing some)

  • @abuhman

  • @chinyeungli

  • @jimjag

  • @JonoYang

  • @jpopelka

  • @majurg

  • @mjherzog

  • @pgier

  • @pkajaba

  • @pombredanne

  • @scottctr

  • @sschuberth

  • @yahalom5776

  • @yashdsaraf

2.0.1 (2017-07-03)

This is a minor release with minor new and improved features and bug fixes.

  • New and improved license detection, including refined match scoring for #534

  • Bug fixed in License detection leading to a very long scan time for some rare JavaScript files. Reported by @jarnugirdhar

  • New “base_name” attribute returned with file information. Reported by @chinyeungli

  • Bug fixed in Maven POM package detection. Reported by @kalagp

2.0.0 (2017-06-23)

This is a major release with several new and improved features and bug fixes.

Some of the key highlights include:

  • License:

    • Brand new, faster and more accurate detection engine that uses multiple techniques, eventually doing multiple exhaustive comparisons of a scanned file's content against all the license and rule texts.

    • Several new licenses and over 2,500 new and improved license detection rules have been added, making the detection significantly better (and, weirdly enough, faster too as a side effect of the new detection engine)

    • the matched license text can be optionally returned with the --license-text option

    • The detection accuracy has been benchmarked against other detection engines and ScanCode has been shown to be more accurate and comprehensive than all the other engines reviewed.

    • improved scoring of license matches

  • Package and dependencies:

    • new and improved detection of multiple package formats: NPM, Maven, NuGet, PHP Composer, Python PyPI and RPM. In most cases direct, declared dependencies are also reported.

    • several additional package formats will be reported in future versions.

    • note: the structure of Packages data is evolving and should not be considered API at this stage

  • Scan outputs:

    • New SPDX tag/values and RDF outputs.

    • new compact JSON format (the pretty printed format is still available with the json-pp format). The JSON format has been changed significantly and is closer to a documented, standard format that we call the ABC data format.

    • Minor refinements on the html and html-app format. Note that the html-app format will be deprecated and replaced by the new AboutCode Manager desktop app (electron-based) in future versions.

  • Copyright: Improved copyright detection: several false positives are no longer returned and copyrights are more accurate

  • Archive: support for shallow extraction and support for new archive types (such as Spring boot shell archives)

  • Performance:

    • Everything is generally faster, and license detection performance has been significantly improved.

    • Scans can run on multiple processes in parallel with the new --processes option, speeding things up even further. A scan of a full Debian pool of source packages was reported to complete in about 11 hours (on a rather beefy 144-core, 256GB machine)

    • Reduced memory usage with the use of caching

  • Other notes:

    • This is the last release with Linux 32 bits architecture support

    • The scan of a file can be interrupted after a timeout, with a default of 120 seconds

    • ScanCode is now available on the PyPI Python package index for use as a library. The documentation for the library usage will follow in future versions

    • New --ignore option: You can optionally ignore certain files and paths during a scan

    • New --diag option: display additional debug and diagnostic data

    • The scanned file paths can now be reported as relative, rooted or absolute with new command line options, with a default to a rooted path.
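The three path styles in the last note can be illustrated with a short Python sketch. This reflects one reading of the terms, stated here as an assumption: "rooted" keeps the last segment of the scanned directory as a common root, "relative" strips that segment, and "absolute" reports the full path. The function report_path and its style argument are hypothetical and are not part of ScanCode; POSIX-style paths are used so the example behaves the same on every platform.

```python
import posixpath


def report_path(scan_root, file_path, style="rooted"):
    """Return file_path as it might be reported for a scan of scan_root.

    Both arguments are expected to be absolute POSIX paths.
    """
    scan_root = posixpath.normpath(scan_root)
    file_path = posixpath.normpath(file_path)
    # Path of the file with the scanned directory stripped off.
    relative = posixpath.relpath(file_path, scan_root)

    if style == "absolute":
        return file_path
    if style == "relative":
        return relative
    if style == "rooted":
        # Keep the scanned directory's own name as a common root segment.
        return posixpath.join(posixpath.basename(scan_root), relative)
    raise ValueError("unknown style: %r" % (style,))


for style in ("rooted", "relative", "absolute"):
    print(style, report_path("/home/me/project", "/home/me/project/src/main.c", style))
```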

Thank you to all contributors to this release and the 200+ stars and 60+ forks on GitHub!

  • Credits in alphabetic order:

Alexander Lisianoi Avi Aryan Benedikt Spranger Chin Yeung Dennis Clark Hugo Jacob Jakub Wilk Jericho @attritionorg Jillian Daguil Jiri Popelka John M. Horan Jonathan “Jono” Yang Li Ha Michael Herzog Michael Rupprecht Nusrat Sultana Paul Kunz Philippe Ombredanne Rakesh Balusa Ranvir Singh Richard Fontana Sebastian Schuberth Steven Esser Thomas Gleixner Tisoga @forrestchang Yash D. Saraf Yash Sharma

1.6.0 (2016-01-29)

  • New features

  • The HTML app now displays a copyright holder summary graphic

  • HTML app ui enhancements

  • File extraction fixes

  • New and improved license and detection rules

  • Other minor improvements and minor bug fixes

1.5.0 (2015-12-15)

  • New features

  • The HTML app now displays a license summary graphic

  • Copyright holders and Authors are now collected together with copyrights

  • New email and url scan options: scan for URLs and emails

  • New and improved license and detection rules

These scans are for now only available in the JSON output

1.4.3 (2015-12-03)

  • Minor bug fix

  • In the HTML app, the scanned path was hardcoded as scancode-toolkit2/scancode-toolkit/samples instead of displaying the path that was scanned.

1.4.2 (2015-12-03)

  • Minor features and bug fixes

  • The release archives were missing some code (packagedcode)

  • Improved --quiet option for command line operations

  • New support for custom Jinja templates for the HTML output. The template also has access to the whole License object to output full license texts or other data. Thanks to @ened Sebastian Roth for this.

1.4.0 (2015-11-24)

  • New features and bug fixes

1.3.1 (2015-07-27)

  • Minor bug fixes.

1.3.0 (2015-07-24)

  • New features and bug fixes

  • scancode now ignores version control directories by default (.svn, .git, etc)

  • Improved copyright and license detections (new rules, etc.)

  • other minor improvements and minor bug fixes.

  • experimental and unsupported inclusion of Linux-32 bits pre-built binaries

1.2.4 (2015-07-22)

  • Minor bug fixes.

  • Improved copyright detections.

  • can scan a single file located in the installation directory

  • other minor improvements and minor bug fixes.

1.2.3 (2015-07-16)

  • Major bug fixes on Windows.

  • This is a major bug fix release for Windows. The --extract option was not working on Windows in previous 1.2.x pre-releases

1.2.2 (2015-07-14)

  • Minor bug fixes.

  • Support relative path when doing extract.

1.2.1 (2015-07-13)

  • Minor bug fixes.

  • Improper extract warning handling

1.2.0 (2015-07-13)

  • Major bug fixes.

  • Fixed issue #26: Slow --extract

  • Added support for progress during extraction (#27)

1.1.0 (2015-07-06)

  • Minor bug fixes.

  • Enforced exclusivity of --extract option

  • Improved command line help.

  • Added continuous testing with Travis and Appveyor and fixed tests

1.0.0 (2015-06-30)

  • Initial release.

  • support for scanning licenses and copyrights

  • simple command line with html, html-app and JSON formats output


Built Distribution

scancode_toolkit-2.1.0-py2-none-any.whl (18.6 MB)

Uploaded Python 2
