Fast and syntax-aware semantic code pattern search for many languages: like grep but for code

Semgrep

semgrep is a tool for easily detecting and preventing bugs and anti-patterns in your codebase. It combines the convenience of grep with the correctness of syntactical and semantic search. Developers, DevOps engineers, and security engineers use semgrep to write code with confidence.

Try it now: https://semgrep.live

Overview

Language support:

Supported: Python, JavaScript, Go, Java, C
Coming: TypeScript, PHP

Example patterns:

Pattern                                 Matches
$X == $X                                if (node.id == node.id): ...
requests.get(..., verify=False, ...)    requests.get(url, timeout=3, verify=False)
os.system(...)                          from os import system; system('echo semgrep')
$ELEMENT.innerHTML                      el.innerHTML = "<img src='x' onerror='alert(`XSS`)'>";
$TOKEN.SignedString([]byte("..."))      ss, err := token.SignedString([]byte("HARDCODED KEY"))

See more example patterns in the live registry viewer.

Installation

On macOS, binaries are available via Homebrew:

brew install returntocorp/semgrep/semgrep

On Ubuntu, an install script is available with each release:

./semgrep-v0.8.1-ubuntu-generic.sh

To try semgrep without installation, you can also run it via Docker:

docker run --rm -v "${PWD}:/home/repo" returntocorp/semgrep --help

Usage

Example Usage

Here is a simple Python example, test.py. We want to retrieve an object by ID:

def get_node(node_id, nodes):
    for node in nodes:
        if node.id == node.id:  # Oops, supposed to be 'node_id'
            return node
    return None

This is a bug. Let's use semgrep to find bugs like it, using a simple search pattern: $X == $X. It will find all places in our code where the left- and right-hand sides of a comparison are the same expression:

$ semgrep --lang python --pattern '$X == $X' test.py
test.py
3:        if node.id == node.id:  # Oops, supposed to be 'node_id'
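
To make the idea behind $X == $X concrete, here is a rough sketch that uses Python's standard ast module to flag comparisons whose two sides are the same expression. It only illustrates the concept for Python source and is not how semgrep is implemented; the function name find_self_comparisons is hypothetical.

```python
import ast
from typing import List


def find_self_comparisons(source: str) -> List[int]:
    """Return line numbers where both sides of `==` are the same expression."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only consider simple comparisons of the form `left == right`.
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
        ):
            left, right = node.left, node.comparators[0]
            # ast.dump omits line/column info by default, so structurally
            # identical expressions produce identical strings.
            if ast.dump(left) == ast.dump(right):
                hits.append(node.lineno)
    return sorted(hits)


SOURCE = '''\
def get_node(node_id, nodes):
    for node in nodes:
        if node.id == node.id:  # Oops, supposed to be 'node_id'
            return node
    return None
'''

print(find_self_comparisons(SOURCE))  # [3]
```

The semgrep pattern expresses the same check in one line and is not tied to a single language's AST, which is the convenience the tool aims for.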

Configuration

For simple patterns use the --lang and --pattern flags. This mode of operation is useful for quickly iterating on a pattern on a single file or folder:

semgrep --lang javascript --pattern 'eval(...)' path/to/file.js

Configuration Files

For advanced configuration use the --config flag. This flag automagically handles a multitude of input configuration types:

  • --config <file|folder|yaml_url|tarball_url|registry_name>

In the absence of this flag, a default configuration is loaded from .semgrep.yml or multiple files matching .semgrep/**/*.yml.
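
The default lookup described above can be sketched in a few lines of Python. This is an illustration only, not semgrep's own code: the function name find_default_configs is hypothetical, and the precedence shown (the single .semgrep.yml file wins over the .semgrep/ folder) is an assumption that the text above does not confirm.

```python
from pathlib import Path
from typing import List


def find_default_configs(root: Path) -> List[Path]:
    """Locate default config files under `root` (assumed lookup order)."""
    single = root / ".semgrep.yml"
    if single.is_file():
        # Assumption: a top-level .semgrep.yml takes precedence.
        return [single]
    config_dir = root / ".semgrep"
    if not config_dir.is_dir():
        return []
    # "**/*.yml" matches YAML files directly in .semgrep/ and in any subfolder.
    return sorted(p for p in config_dir.glob("**/*.yml") if p.is_file())
```

Calling find_default_configs(Path(".")) in a project root returns the files that such a lookup would consider, or an empty list if none exist.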

Pattern Features

semgrep patterns make use of two primary features:

  • Metavariables like $X, $WIDGET, or $USERS_2. Metavariable names may contain only uppercase characters, digits, and _, and must start with an uppercase character or _; names like $x or $some_value are invalid. Metavariables are used to track a variable across a specific code scope.
  • The ... (ellipsis) operator. The ellipsis operator abstracts away sequences so you don't have to sweat the details of a particular code pattern.

For example,

$FILE = open(...)

will find all occurrences in your code where the result of an open() call is assigned to a variable.
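
The metavariable naming rule given above can be checked with a short regular expression. This is a minimal sketch that assumes "uppercase characters" means ASCII A-Z; the helper name is_valid_metavariable is hypothetical and is not part of semgrep.

```python
import re

# A "$", then an uppercase letter or "_", then any mix of
# uppercase letters, digits, and "_".
METAVARIABLE = re.compile(r"\$[A-Z_][A-Z_0-9]*")


def is_valid_metavariable(name: str) -> bool:
    """Return True if `name` follows the naming rule described above."""
    return METAVARIABLE.fullmatch(name) is not None


for name in ["$X", "$WIDGET", "$USERS_2", "$x", "$some_value"]:
    print(name, is_valid_metavariable(name))
```

The first three names are accepted and the last two are rejected, matching the examples in the list above.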

Composing Patterns

You can also construct rules by composing multiple patterns together.

Let's consider an example:

rules:
  - id: open-never-closed
    patterns:
      - pattern: $FILE = open(...)
      - pattern-not-inside: |
          $FILE = open(...)
          ...
          $FILE.close()
    message: "file object opened without corresponding close"
    languages: [python]
    severity: ERROR

This rule looks for files that are opened but never closed. It accomplishes this by looking for the open(...) pattern that is not followed by a close() call. The $FILE metavariable ensures that the same variable name is used in the open and close calls. The ellipsis operator allows any arguments to be passed to open and any sequence of statements between the open and close calls. We don't care how open is called or what happens before the close call; we just need to make sure close is called.
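
As an illustration, here is a small Python file to try the rule on. The function names are hypothetical, and the comments state what we would expect the rule to report based on the description above, not output captured from a real run.

```python
def read_leaky(path):
    f = open(path)   # expected to be reported: no f.close() follows in this block
    data = f.read()
    return data


def read_closed(path):
    f = open(path)   # expected not to be reported: f.close() follows below
    data = f.read()
    f.close()
    return data
```

Both functions return the same data; only the first leaves closing the file to the interpreter, which is the situation the rule is written to catch.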

For more information on rule fields like patterns and pattern-not-inside see the configuration documentation.

Equivalences

Equivalences are another key concept in semgrep. semgrep automatically searches for code that is semantically equivalent to the pattern. For example, the following two snippets are semantically equivalent, and the pattern subprocess.Popen(...) will fire on both.

subprocess.Popen("ls")

from subprocess import Popen as sub_popen

result = sub_popen("ls")
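
To see why an aliased import is equivalent, the sketch below uses Python's ast module to map each locally imported name back to its fully qualified name. It is a simplified illustration that handles only "from ... import ..." statements; the helper import_aliases is hypothetical and does not reflect semgrep's internals.

```python
import ast
import subprocess
from subprocess import Popen as sub_popen
from typing import Dict


def import_aliases(source: str) -> Dict[str, str]:
    """Map names bound by `from m import n [as a]` to `m.n`."""
    aliases = {}
    for node in ast.walk(ast.parse(source)):
        # Relative imports without a module name are skipped in this sketch.
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                local_name = alias.asname or alias.name
                aliases[local_name] = f"{node.module}.{alias.name}"
    return aliases


print(import_aliases("from subprocess import Popen as sub_popen"))
# {'sub_popen': 'subprocess.Popen'}

# At runtime both names refer to the very same class object.
print(sub_popen is subprocess.Popen)  # True
```

Once sub_popen is resolved to subprocess.Popen, a call through the alias can be treated like a call through the full name.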

For a full list of semgrep feature support by language see the language matrix.

Registry

As mentioned above, you may also specify a registry name as configuration. r2c provides a registry of configuration files. These rules have been tuned on thousands of repositories using our analysis platform.

semgrep --config r2c

Programmatic Usage

To integrate semgrep's results with other tools, you can get results in machine-readable JSON format with the --json option, or formatted according to the SARIF standard with the --sarif flag.

See our output documentation for details.
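
As a rough sketch of such an integration, the Python below builds a semgrep command line from the flags mentioned in this document and summarizes a JSON report. The field names results, check_id, path, and start.line are assumptions about the report layout and should be checked against the output documentation; SAMPLE is a hypothetical stand-in for real output, and both function names are illustrative.

```python
import json
from typing import List


def build_command(config: str, target: str) -> List[str]:
    """Build an argument list using only the --config and --json flags."""
    return ["semgrep", "--config", config, "--json", target]


def summarize(json_text: str) -> List[str]:
    """Turn a JSON report into one 'path:line: rule-id' string per finding."""
    report = json.loads(json_text)
    lines = []
    # Assumed layout: a top-level "results" list of finding objects.
    for finding in report.get("results", []):
        lines.append(
            f'{finding["path"]}:{finding["start"]["line"]}: {finding["check_id"]}'
        )
    return lines


# Hypothetical sample standing in for real semgrep output.
SAMPLE = json.dumps(
    {"results": [{"check_id": "open-never-closed", "path": "test.py", "start": {"line": 3}}]}
)

print(build_command("r2c", "."))
print(summarize(SAMPLE))  # ['test.py:3: open-never-closed']
```

In a real script, the list from build_command could be passed to subprocess.run with capture_output=True and text=True, and the captured stdout handed to summarize.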

Resources

Contribution

semgrep is LGPL-licensed. Feel free to help out: see CONTRIBUTING.

semgrep is a frontend to a larger program analysis library named pfff. pfff began and was open-sourced at Facebook but is now archived. The primary maintainer now works at r2c. semgrep was originally named sgrep and was renamed to avoid collisions with existing projects.

Commercial Support

semgrep is proudly supported by r2c. We're hiring!

Interested in a fully-supported, hosted version of semgrep? Drop your email and we'll ping you!
