Fast and syntax-aware semantic code pattern search for many languages: like grep but for code

Lightweight static analysis for many languages.
Find and block bug variants with rules that look like source code.

Installation · Motivation · Overview · Usage
Resources · Contributing · Commercial Support

Semgrep is a command-line tool for offline static analysis. Use pre-built or custom rules to enforce code and security standards in your codebase. You can try it now with our interactive live editor.

Semgrep combines the convenient and iterative style of grep with the powerful features of an Abstract Syntax Tree (AST) matcher and limited dataflow. Easily find function calls, class or method definitions, and more without having to understand ASTs or wrestle with regexes.

Visit Installation and Usage to get started.

Installation

Want to skip installation? You can run Semgrep via our interactive live editor at semgrep.live.

On macOS, binaries are available via Homebrew:

$ brew install semgrep

On Ubuntu, WSL, or other Linux distributions, we recommend installing via pip:

$ pip3 install semgrep

An install script is also available with each release if you want a native binary.

$ ./semgrep-v0.16.0-ubuntu-generic.sh

To try Semgrep without installation, you can also run it via Docker:

$ docker run --rm -v "${PWD}:/src" returntocorp/semgrep --help

See Usage to learn about running pre-built rules and writing custom ones.

Motivation

Semgrep exists because:

  1. Insecure code is easy to write
  2. The future of security involves automatically guiding developers towards a “paved road” made of default-safe frameworks (e.g. React or object-relational mappers)
  3. grep isn’t expressive enough and traditional static analysis tools (SAST) are too complicated/slow for paved road automation

The AppSec, Developer, and DevOps communities deserve a static analysis tool that is fast, easy to use, code-aware, multi-lingual, and open source!

Overview

Semgrep is optimized for:

  • Speed: Fast enough to run on every build, commit, or file save
  • Finding bugs that matter: Run your own specialized rules or choose OWASP Top 10 checks from the Semgrep Registry. Rules match source code at the Abstract Syntax Tree (AST) level, unlike regexes, which match strings and aren't semantically aware.
  • Ease of customization: Rules look like the code you’re searching for; no static analysis PhD required. They don't require compiled code, only source, which reduces iteration time.
  • Ease of integration: Semgrep is highly portable, and many CI and git-hook integrations already exist. Output results with --json and pipe them into your existing systems.
  • Polyglot environments: Don't learn and maintain multiple tools for your polyglot environment (e.g. ESLint, find-sec-bugs, RuboCop, Gosec). Use the same syntax and concepts independent of language.

Language Support

Supported: Python, JavaScript, Go, Java, C, JSON
In progress (🚧): Ruby, OCaml
Coming: TypeScript, PHP

Missing support for a language? Let us know by filing a ticket, joining our Slack, or emailing support@r2c.dev.

Pattern Syntax Teaser

One of the most distinctive and useful things about Semgrep is how easy it is to write and iterate on queries.

The goal is to make it as easy as possible to go from an idea in your head to finding the code patterns you have in mind.

Example: Say you want to find all calls to a function named exec, and you don't care about the arguments. With Semgrep, you could simply supply the pattern exec(...) and you'd match:

# Simple cases grep finds
exec("ls")
exec(some_var)

# But you don't have to worry about whitespace
exec (foo)

# Or calls across multiple lines
exec (
    bar
)

Importantly, Semgrep would not match the following:

# grep would match this, but Semgrep ignores it because
# it doesn't have the right function name
other_exec(bar)

# Semgrep ignores commented out lines
# exec(foo)

# and hard-coded strings
print("exec(bar)")

Semgrep will even match aliased imports:

# Semgrep knows that safe_function refers to exec so it
# will still match!
#   Oof, try finding this with grep
import exec as safe_function
safe_function(tricksy)

Play with this example in your browser here, or copy the above code into a file locally (exec.py) and run:

$ semgrep -l python -e "exec(...)" /path/to/exec.py
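To see why a match at the AST level behaves differently from grep in the cases above, here is a minimal Python sketch using the standard-library ast module. It is not how Semgrep is implemented, and it does not handle aliased imports; the helper name find_calls and the SOURCE sample are assumptions made for illustration.

```python
import ast
from typing import List

# Sample input reusing the cases shown above (illustrative only).
SOURCE = '''
exec("ls")
exec (foo)
exec (
    bar
)
other_exec(bar)
# exec(foo)
print("exec(bar)")
'''


def find_calls(source: str, name: str) -> List[int]:
    """Return the line numbers of calls to a bare function called `name`."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        # Only real call expressions whose callee is exactly `name` count.
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    )


if __name__ == "__main__":
    print(find_calls(SOURCE, "exec"))
```

Because the comment and the string literal never become call nodes in the tree, they are skipped without any special handling, and whitespace or line breaks inside the call make no difference.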

More example patterns:

Pattern                                 Matches
$X == $X                                if (node.id == node.id): ...
requests.get(..., verify=False, ...)    requests.get(url, timeout=3, verify=False)
os.system(...)                          from os import system; system('echo semgrep')
$ELEMENT.innerHTML                      el.innerHTML = "<img src='x' onerror='alert(`XSS`)'>";
$TOKEN.SignedString([]byte("..."))      ss, err := token.SignedString([]byte("HARDCODED KEY"))

See more example patterns in the Semgrep Registry.

For more info on what you can do in patterns, see the pattern features docs.
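As a rough illustration of what a pattern like $X == $X asks for (the same expression on both sides of the comparison), the following Python sketch compares the two operand subtrees with the standard-library ast module. It only approximates the idea for Python source; the function name self_comparisons is hypothetical and this is not Semgrep's matching algorithm.

```python
import ast
from typing import List


def self_comparisons(source: str) -> List[int]:
    """Return line numbers of `a == a` style comparisons in Python source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
            # ast.dump omits line/column info by default, so two operands
            # dump to the same string when they are structurally identical.
            and ast.dump(node.left) == ast.dump(node.comparators[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    sample = "if node.id == node.id:\n    pass\nif a == b:\n    pass\n"
    print(self_comparisons(sample))
```

Requiring both operands to be structurally equal plays the role of the repeated metavariable: whatever $X binds to on the left must appear again on the right.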

Usage

Semgrep supports three primary workflows:

  • Run pre-built rules
  • Write custom rules
  • Run Semgrep continuously in CI

The following sections cover each in more detail.

Run Pre-Built Rules

The easiest way to get started with Semgrep (other than semgrep.live) is to scan your code with pre-built rules.

The Semgrep Registry contains rules for many programming errors, including security issues and correctness bugs. Security rules are annotated with CWE and OWASP metadata when applicable. OWASP rule coverage per language is displayed below.

You can use pre-built Rule Packs that contain sets of rules grouped by language and/or framework:

$ semgrep --config=https://semgrep.live/c/p/java
$ semgrep --config=https://semgrep.live/c/p/python
$ semgrep --config=https://semgrep.live/c/p/golang
$ semgrep --config=https://semgrep.live/c/p/javascript
...

Or you can run all of Semgrep's default rules across all languages. Each rule declares which language it targets, so Semgrep won't try to run a Python rule on Java code:

$ semgrep --config=r2c

You can also run a specific rule or group of rules:

# Run a specific rule
$ semgrep --config=https://semgrep.live/c/r/java.spring.security.audit.cookie-missing-samesite

# Run a set of rules
$ semgrep --config=https://semgrep.live/c/r/java.spring.security

All public Semgrep rules can be viewed on the Registry, which pulls the rules from YAML files defined in the semgrep-rules GitHub repo.

Here are some sample vulnerable repos to test on:

Writing Custom Rules

One of the strengths of Semgrep is how easy it is to write rules.

This makes it possible to:

  • Quickly port rules from other tools.
  • Think of an interesting code pattern, and then find instances of it in your code.
  • Find code base or org-specific bugs and antipatterns - things that built-in checks for existing tools won't find because they're unique to you.
  • and more!

Simple Rules

For iterating on simple patterns, you can use the --lang and --pattern flags.

$ semgrep --lang javascript --pattern 'eval(...)' path/to/file.js

The --lang flag tells Semgrep which language you're targeting and --pattern is the code pattern to search for.

Advanced Rules

Some rules need more than a single-line pattern. You may want to express conditions such as: X must be true AND Y must be true, X but NOT Y, or X must occur inside a block of code that Y matches.

For these cases, Semgrep has a more powerful and flexible YAML syntax.

You can run a single rule or directory of rules specified in YAML by:

$ semgrep --config my_rule.yml path/to/dir_or_file

$ semgrep --config yaml_dir/ path/to/dir_or_file

Example Advanced Rule

Say you are building a financial trading application in which every Transaction object must first be passed to verify_transaction() before being passed to make_transaction(), or it's a business logic bug.

You can express this behavior with the following Semgrep YAML pattern:

rules:
- id: find-unverified-transactions
  patterns:
    - pattern: |
        public $RETURN $METHOD(...){
            ...
            make_transaction($T);
            ...
        }
    - pattern-not: |
        public $RETURN $METHOD(...){
            ...
            verify_transaction($T);
            ...
            make_transaction($T);
            ...
        }
  message: |
    In $METHOD, there's a call to make_transaction() without first calling verify_transaction() on the Transaction object.
  # languages and severity are required rule fields; the values below are assumed for this Java example
  languages: [java]
  severity: ERROR

  • $RETURN, $METHOD, and $T are metavariables, an abstraction that Semgrep provides when you want to match something but you don't know exactly what it is ahead of time.
    • You can think of metavariables like a capture group in regular expressions.
  • The pattern clause defines what we're looking for: any method that calls make_transaction().
  • The pattern-not clause filters out matches we don't want; in this case, methods where a transaction ($T) is passed to verify_transaction() before make_transaction().
  • The message is what's returned in Semgrep output, either to STDOUT or as a comment on the pull request on GitHub or other systems.
    • Note that metavariables can be used to customize messages and make them contextually relevant. Here we're helpfully telling the user the method where we've identified the bug.

You can play with this transaction example here: https://semgrep.live/4b4g.
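The pattern plus pattern-not combination above can be read as "everything the first pattern matches, minus what the second one matches". As a rough analogue of that logic, here is a Python sketch that applies the same check to Python source instead of Java. It orders calls by source position rather than by real control flow, the function name unverified_transactions is hypothetical, and it is not a description of how Semgrep evaluates the rule.

```python
import ast
from typing import List


def unverified_transactions(source: str) -> List[str]:
    """Names of functions that call make_transaction(x) without an
    earlier verify_transaction(x) on the same argument expression."""
    flagged = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Collect simple calls with at least one argument, in source order.
        calls = [
            n for n in ast.walk(func)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name) and n.args
        ]
        calls.sort(key=lambda n: (n.lineno, n.col_offset))
        verified = set()
        for call in calls:
            arg = ast.dump(call.args[0])  # stands in for the metavariable $T
            if call.func.id == "verify_transaction":
                verified.add(arg)
            elif call.func.id == "make_transaction" and arg not in verified:
                flagged.append(func.name)
                break
    return flagged


SAMPLE = '''
def good(t):
    verify_transaction(t)
    make_transaction(t)

def bad(t):
    make_transaction(t)
'''

if __name__ == "__main__":
    print(unverified_transactions(SAMPLE))
```

The YAML rule expresses the same intent declaratively in a few lines, which is the main reason to prefer it over hand-written checks like this one.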

Learn More

Run Semgrep Continuously in CI

Semgrep can be run via CLI or Docker and output results as JSON (via the --json flag), so it can be inserted into any CI pipeline and have its results processed by whatever tools you're using.

Semgrep is aware of diffs, so it can report only findings that occur in newly added code, for example, in a commit or pull request.

Currently, the easiest way to integrate Semgrep into CI is via the GitHub Action we've built. See the integrations docs for more details.

Semgrep can also output results in the standardized Static Analysis Results Interchange Format (SARIF) with the --sarif flag, if you use tools that accept this format.
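For the --json route mentioned above, a CI step might look roughly like the following Python sketch. It assumes semgrep is on the PATH and that the JSON output has a top-level "results" list whose entries carry "check_id", "path", and "start.line"; those field names, the helper names, and the SAMPLE data are assumptions for illustration, so check them against the output of your installed version.

```python
import json
import subprocess
from typing import List


def run_semgrep(config: str, target: str) -> str:
    """Run Semgrep with --json and return its stdout (needs semgrep installed)."""
    completed = subprocess.run(
        ["semgrep", "--config", config, "--json", target],
        capture_output=True,
        text=True,
        check=False,
    )
    return completed.stdout


def summarize(json_text: str) -> List[str]:
    """Turn Semgrep JSON output into one 'path:line: rule-id' line per finding.

    Assumed (not verified here) schema: {"results": [{"check_id", "path", "start": {"line"}}]}.
    """
    report = json.loads(json_text)
    lines = []
    for finding in report.get("results", []):
        line = finding.get("start", {}).get("line", "?")
        lines.append(f'{finding.get("path", "?")}:{line}: {finding.get("check_id", "?")}')
    return lines


# Hypothetical output, used so the sketch can be tried without running Semgrep.
SAMPLE = json.dumps({
    "results": [
        {
            "check_id": "find-unverified-transactions",
            "path": "src/Trade.java",
            "start": {"line": 12, "col": 5},
        }
    ]
})

if __name__ == "__main__":
    findings = summarize(SAMPLE)
    for entry in findings:
        print(entry)
    # A CI job could exit non-zero here when findings is non-empty.
```

In a real pipeline, summarize(run_semgrep("my_rule.yml", "path/to/dir_or_file")) would replace the sample data.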

Upgrading

How you upgrade Semgrep will depend on how you installed it.

From Homebrew:

$ brew upgrade semgrep

From PyPI:

$ python -m pip install --upgrade semgrep

From Docker:

$ docker pull returntocorp/semgrep:latest

Resources

Learn more:

Get in touch:

Contributing

Semgrep is LGPL-licensed. Feel free to help out; see CONTRIBUTING.

Semgrep is a frontend to a larger program analysis library named pfff. pfff began and was open-sourced at Facebook but is now archived. The primary maintainer now works at r2c. Semgrep was originally named sgrep and was renamed to avoid collisions with existing projects.

Commercial Support

Semgrep is proudly supported by r2c. We're hiring!

Interested in a fully supported, hosted version of Semgrep? Drop your email and we'll ping you!
