ScanCode is a tool to scan code for licenses, copyrights, packages and their documented dependencies, and other interesting facts.
ScanCode is a suite of utilities used to scan a codebase for license, copyright, package dependencies and other interesting information that can be discovered in source and binary code files.
A typical software project often reuses hundreds of third-party components. License and origin information is often scattered and not easy to find: ScanCode discovers this data for you.
ScanCode provides accurate scan results and the line position where each result is found. The results can be formatted as JSON or HTML. ScanCode provides a simple HTML app for quick visualization of scan results (see screenshot below), but you will have more robust analysis options if you use AboutCode Manager to view a scan. AboutCode Manager is a desktop application available for Linux, OSX or Windows. Go to https://github.com/nexB/aboutcode-manager to learn more or to download AboutCode Manager.
We are continuously working on new features, such as analysis of dependencies or improving performance for scanning of larger codebases.
See the roadmap for upcoming features: https://github.com/nexB/scancode-toolkit/wiki/Roadmap
Quick Start
For Windows, please go to the Comprehensive Installation section instead.
- Make sure you have Python 2.7 installed:
Download and install Python 2.7 32 bits for Windows https://www.python.org/ftp/python/2.7.13/python-2.7.13.msi
Download and install Python 2.7 for Mac https://www.python.org/ftp/python/2.7.13/python-2.7.13-macosx10.6.pkg
On Linux, install the Python 2.7 “devel” package and a few extra packages:
For Ubuntu 12.04, 14.04 and 16.04:
sudo apt-get install python-dev bzip2 xz-utils zlib1g libxml2-dev libxslt1-dev
For Debian and Debian-based distros:
sudo apt-get install python-dev libbz2-1.0 xz-utils zlib1g libxml2-dev libxslt1-dev
For RPM distros:
sudo yum install python-devel zlib bzip2-libs xz-libs libxml2-devel libxslt-devel
For Fedora 22 and later:
sudo dnf install python-devel zlib bzip2-libs xz-libs libxml2-devel libxslt-devel
See the Comprehensive Installation for additional details and other Linux installations: https://github.com/nexB/scancode-toolkit/wiki/Comprehensive-Installation
Next, download and extract the latest ScanCode release from:
https://github.com/nexB/scancode-toolkit/releases/
Open a terminal, cd to the directory where you extracted the release archive, and run this command to display the command help. ScanCode will self-configure if needed:
./scancode --help
Run a sample scan saved to the samples.html file:
./scancode --format html-app samples samples.html
Then open samples.html in your web browser to view the scan results.
See more command examples:
./scancode --examples
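For scripted use, the sample scan command above can also be assembled and launched from Python. The following is a minimal sketch, assuming it is run from the extracted ScanCode directory. The helper names `build_scan_command` and `run_scan` are hypothetical and not part of ScanCode; the sketch only uses the `--format` option and the input and output arguments shown above.

```python
import subprocess


def build_scan_command(input_path, output_file, output_format="html-app",
                       launcher="./scancode"):
    """Return the argument list for a scan, mirroring the command above.

    Hypothetical helper: it only arranges the arguments shown in this
    README (--format, then the input path, then the output file).
    """
    return [launcher, "--format", output_format, input_path, output_file]


def run_scan(input_path, output_file, output_format="html-app"):
    """Run the scan and raise CalledProcessError if the command fails."""
    command = build_scan_command(input_path, output_file, output_format)
    subprocess.check_call(command)
    return output_file
```

Calling `run_scan("samples", "samples.html")` would launch the same command as the sample scan above. On Windows, the `launcher` argument would likely need to be `scancode` instead of `./scancode`.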
Support
If you have a problem, a suggestion or found a bug, please enter a ticket at: https://github.com/nexB/scancode-toolkit/issues
For other questions, discussions, and chats, we have:
a mailing list at https://lists.sourceforge.net/lists/listinfo/aboutcode-discuss
an official Gitter channel at https://gitter.im/aboutcode-org/discuss. Gitter also has an IRC bridge at https://irc.gitter.im/
an official #aboutcode IRC channel on freenode (server chat.freenode.net) for scancode and other related tools. Note that this receives notifications from repos so it can be a tad noisy. You can use your favorite IRC client or use the web chat at https://webchat.freenode.net/
About archives
All code must be extracted before running ScanCode, since ScanCode will not extract files from tarballs, zip files, etc. However, the ScanCode Toolkit comes with a utility called extractcode that does recursive archive extraction. For example, this command will recursively extract the mytar.tar.bz2 tarball into the mytar.tar.bz2-extract directory:
./extractcode mytar.tar.bz2
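The extract-then-scan workflow described above can be chained in a short script. This is only a sketch under stated assumptions: it is run from the ScanCode directory, `extraction_dir` and `extract_then_scan` are hypothetical helper names, and the target directory name simply follows the `-extract` suffix convention from the example above.

```python
import subprocess


def extraction_dir(archive_path):
    """Return the directory that extractcode is described as creating.

    Based on the naming in this README: "mytar.tar.bz2" is extracted
    into "mytar.tar.bz2-extract".
    """
    return archive_path + "-extract"


def extract_then_scan(archive_path, output_file, output_format="html-app"):
    """Extract an archive with extractcode, then scan the extracted tree."""
    # Step 1: recursive extraction, as in "./extractcode mytar.tar.bz2".
    subprocess.check_call(["./extractcode", archive_path])
    # Step 2: scan the directory that the extraction produced.
    target = extraction_dir(archive_path)
    subprocess.check_call(
        ["./scancode", "--format", output_format, target, output_file])
    return target
```

For instance, `extract_then_scan("mytar.tar.bz2", "mytar.html")` would extract the tarball and then scan the mytar.tar.bz2-extract directory.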
Source code
License
Apache-2.0 with an acknowledgement required to accompany the scan output.
Public domain CC-0 for reference datasets.
Multiple licenses (GPL2/3, LGPL, MIT, BSD, etc.) for third-party components.
See the NOTICE file for more details.
Documentation & FAQ
Basic Usage
Run this command for a list of options (on Windows, use scancode instead of ./scancode):
./scancode --help
Run this command for a list of command line examples:
./scancode --examples
To run a scan on sample data, first run this:
./scancode --format html-app samples samples.html
Then open samples.html in your web browser to see the results.
Changelog
(NEXT)
2.1.0 (2017-09-22)
This is a minor release with several new and improved features and bug fixes but no significant API changes.
New plugin architecture by @yashdsaraf
we can now have pre-scan, post-scan and output format plugins
there is a new CSV output format and some example, experimental plugins
the CLI UI has changed to better support these plugins
New and improved licenses and license detection rules including support for EPL-2.0 and OpenJDK-related licensing and synchronization with the latest SPDX license list
Multiple bug fixes such as:
Ensure that authors are reported even if there is no copyright #669
Fix Maven package POM parsing infinite loop #721
Improve handling of weird non-unicode byte paths #688 and #706
Improve PDF parsing to avoid some crashes #723
Credits: Many thanks to everyone that contributed to this release with code and bug reports (and this list is likely missing some)
@abuhman
@chinyeungli
@jimjag
@JonoYang
@jpopelka
@majurg
@mjherzog
@pgier
@pkajaba
@pombredanne
@scottctr
@sschuberth
@yahalom5776
@yashdsaraf
2.0.1 (2017-07-03)
This is a minor release with minor new and improved features and bug fixes.
New and improved license detection, including refined match scoring for #534
Bug fixed in License detection leading to a very long scan time for some rare JavaScript files. Reported by @jarnugirdhar
New “base_name” attribute returned with file information. Reported by @chinyeungli
Bug fixed in Maven POM package detection. Reported by @kalagp
2.0.0 (2017-06-23)
This is a major release with several new and improved features and bug fixes.
Some of the key highlights include:
License:
Brand new, faster and more accurate detection engine using multiple techniques eventually doing multiple exhaustive comparisons of a scanned file content against all the license and rule texts.
Several new licenses and over 2,500 new and improved license detection rules have been added, making the detection significantly better (and weirdly enough faster too, as a side effect of the new detection engine)
The matched license text can optionally be returned with the --license-text option
The detection accuracy has been benchmarked against other detection engines, and ScanCode has been shown to be more accurate and comprehensive than all the other engines reviewed.
Improved scoring of license matches
Package and dependencies:
new and improved detection of multiple package formats: NPM, Maven, NuGet, PHP Composer, Python PyPI and RPM. In most cases direct, declared dependencies are also reported.
several additional package formats will be reported in future versions.
note: the structure of Packages data is evolving and should not be considered API at this stage
Scan outputs:
New SPDX tag/values and RDF outputs.
new compact JSON format (the pretty printed format is still available with the json-pp format). The JSON format has been changed significantly and is closer to a documented, standard format that we call the ABC data format.
Minor refinements on the html and html-app format. Note that the html-app format will be deprecated and replaced by the new AboutCode Manager desktop app (electron-based) in future versions.
Copyright: Improved copyright detection: several false positives are no longer returned and copyrights are more accurate
Archive: support for shallow extraction and support for new archive types (such as Spring boot shell archives)
Performance:
Everything is generally faster, and license detection performance has been significantly improved.
Scans can run on multiple processes in parallel with the new --processes option, speeding things up even further. A scan of a full Debian pool of source packages was reported to complete in about 11 hours (on a rather beefy 144-core, 256GB machine)
Reduced memory usage with the use of caching
Other notes:
This is the last release with Linux 32 bits architecture support
The scan of a file can be interrupted after a timeout, with a default of 120 seconds
ScanCode is now available on the PyPI Python package index for use as a library. The documentation for the library usage will follow in future versions
New --ignore option: You can optionally ignore certain files and paths during a scan
New --diag option: display additional debug and diagnostic data
The scanned file paths can now be reported as relative, rooted or absolute with new command line options, with a default to a rooted path.
Thank you to all contributors to this release and the 200+ stars and 60+ forks on GitHub!
Credits in alphabetic order:
Alexander Lisianoi Avi Aryan Benedikt Spranger Chin Yeung Dennis Clark Hugo Jacob Jakub Wilk Jericho @attritionorg Jillian Daguil Jiri Popelka John M. Horan Jonathan “Jono” Yang Li Ha Michael Herzog Michael Rupprecht Nusrat Sultana Paul Kunz Philippe Ombredanne Rakesh Balusa Ranvir Singh Richard Fontana Sebastian Schuberth Steven Esser Thomas Gleixner Tisoga @forrestchang Yash D. Saraf Yash Sharma
1.6.0 (2016-01-29)
New features
The HTML app now displays a copyright holder summary graphic
HTML app UI enhancements
File extraction fixes
New and improved license and detection rules
Other minor improvements and minor bug fixes
1.5.0 (2015-12-15)
New features
The HTML app now displays a license summary graphic
Copyright holders and Authors are now collected together with copyrights
New email and url scan options: scan for URLs and emails
New and improved license and detection rules
The email and URL scans are for now only available in the JSON output
1.4.3 (2015-12-03)
Minor bug fix
In the HTML app, the scanned path was hardcoded as scancode-toolkit2/scancode-toolkit/samples instead of displaying the path that was scanned.
1.4.2 (2015-12-03)
Minor features and bug fixes
The release archives were missing some code (packagedcode)
Improved --quiet option for command line operations
New support for custom Jinja templates for the HTML output. The template also has access to the whole License object to output full license texts or other data. Thanks to @ened Sebastian Roth for this.
1.4.0 (2015-11-24)
New features and bug fixes
Separated JSON data into a separate file for the html app. https://github.com/nexB/scancode-toolkit/issues/38
Added support for scanning package and file information.
Added file and package information to the html-app and html output. https://github.com/nexB/scancode-toolkit/issues/76
improved CSS for html format output https://github.com/nexB/scancode-toolkit/issues/12
New and improved licenses rules and licenses.
Added support for nuget .nupkg as archives.
Created new extractcode standalone command for https://github.com/nexB/scancode-toolkit/issues/52. Extracting archives is no longer part of the scancode command.
Scancode can now be called from anywhere. https://github.com/nexB/scancode-toolkit/issues/55
Various minor improvements for copyright detection.
1.3.1 (2015-07-27)
Minor bug fixes.
fixed --verbose option https://github.com/nexB/scancode-toolkit/issues/37
Improved copyright and license detections (new rules, etc.)
other minor improvements and minor bug fixes: tentative fix for https://github.com/nexB/scancode-toolkit/issues/4
fix for unsupported inclusion of Linux-32 bits pre-built binaries https://github.com/nexB/scancode-toolkit/issues/33
1.3.0 (2015-07-24)
New features and bug fixes
scancode now ignores version control directories by default (.svn, .git, etc)
Improved copyright and license detections (new rules, etc.)
other minor improvements and minor bug fixes.
experimental and unsupported inclusion of Linux-32 bits pre-built binaries
1.2.4 (2015-07-22)
Minor bug fixes.
Improved copyright detections.
can scan a single file located in the installation directory
other minor improvements and minor bug fixes.
1.2.3 (2015-07-16)
Major bug fixes on Windows.
This is a major bug fix release for Windows. The --extract option was not working on Windows in previous 1.2.x pre-releases
1.2.2 (2015-07-14)
Minor bug fixes.
Support relative path when doing extract.
1.2.1 (2015-07-13)
Minor bug fixes.
Improper extract warning handling
1.2.0 (2015-07-13)
Major bug fixes.
Fixed issue #26: Slow --extract
Added support for progress during extraction (#27)
1.1.0 (2015-07-06)
Minor bug fixes.
Enforced exclusivity of --extract option
Improved command line help.
Added continuous testing with Travis and Appveyor and fixed tests
1.0.0 (2015-06-30)
Initial release.
support for scanning licenses and copyrights
simple command line with html, html-app and JSON formats output