Fast and syntax-aware semantic code pattern search for many languages: like grep but for code

Lightweight static analysis for many languages.
Find and block bug variants with rules that look like source code.

Installation · Motivation · Overview · Usage
Resources · Contributing · Commercial Support

Semgrep is a command-line tool for offline static analysis. Use pre-built or custom rules to enforce code and security standards in your codebase. You can try it now with our interactive live editor.

Semgrep combines the convenient and iterative style of grep with the powerful features of an Abstract Syntax Tree (AST) matcher and limited dataflow. Easily find function calls, class or method definitions, and more without having to understand ASTs or wrestle with regexes.

Visit Installation and Usage to get started.

Installation

Want to skip installation? You can run Semgrep via our interactive live editor at semgrep.live.

On macOS, binaries are available via Homebrew:

$ brew install semgrep

On Ubuntu, an install script is available with each release:

$ ./semgrep-v0.12.0-ubuntu-generic.sh

To try Semgrep without installation, you can also run it via Docker:

$ docker run --rm -v "${PWD}:/home/repo" returntocorp/semgrep --help

See Usage to learn about running pre-built rules and writing custom ones.

Motivation

Semgrep exists because:

  1. Insecure code is easy to write
  2. The future of security involves automatically guiding developers towards a “paved road” made of default-safe frameworks (e.g. React or object-relational mappers)
  3. grep isn’t expressive enough and traditional static analysis tools (SAST) are too complicated/slow for paved road automation

The AppSec, Developer, and DevOps communities deserve a static analysis tool that is fast, easy to use, code-aware, multi-lingual, and open source!

Overview

Semgrep is optimized for:

  • Speed: Fast enough to run on every build, commit, or file save
  • Finding bugs that matter: Run your own specialized rules or choose OWASP Top 10 checks from the Semgrep Registry. Rules match source code at the Abstract Syntax Tree (AST) level, unlike regexes, which match strings and aren't semantically aware.
  • Ease of customization: Rules look like the code you’re searching, no static analysis PhD required. They don't require compiled code, only source, reducing iteration time.
  • Ease of integration: Semgrep is highly portable, and many CI and git-hook integrations already exist. Output --json and pipe results into your existing systems.
  • Polyglot environments: Don't learn and maintain multiple tools for your polyglot environment (e.g. ESLint, find-sec-bugs, RuboCop, Gosec). Use the same syntax and concepts independent of language.

Language Support

Python, JavaScript, Go, Java, C, JSON
Ruby 🚧, OCaml 🚧
TypeScript (coming), PHP (coming)

Missing support for a language? Let us know by filing a ticket, joining our Slack, or emailing support@r2c.dev.

Pattern Syntax Teaser

One of the most distinctive and useful things about Semgrep is how easy it is to write and iterate on queries.

The goal is to make it as easy as possible to go from an idea in your head to finding the code patterns you have in mind.

Example: Say you want to find all calls to a function named exec, and you don't care about the arguments. With Semgrep, you could simply supply the pattern exec(...) and you'd match:

# Simple cases grep finds
exec("ls")
exec(some_var)

# But you don't have to worry about whitespace
exec (foo)

# Or calls across multiple lines
exec (
    bar
)

Importantly, Semgrep would not match the following:

# grep would match this, but Semgrep ignores it because
# it doesn't have the right function name
other_exec(bar)

# Semgrep ignores commented out lines
# exec(foo)

# and hard-coded strings
print("exec(bar)")

Semgrep will even match aliased imports:

# Semgrep knows that safe_function refers to exec so it
# will still match!
#   Oof, try finding this with grep
import exec as safe_function
safe_function(tricksy)

Play with this example in your browser here, or copy the above code into a file locally (exec.py) and run:

$ semgrep -l python -e "exec(...)" /path/to/exec.py
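
To make the difference between text matching and syntax-aware matching concrete, here is a minimal sketch that uses Python's standard ast module to find calls to a function named exec. It is not how Semgrep is implemented, and it does not follow aliased imports; the helper name find_calls and the SOURCE sample are hypothetical and exist only for illustration.

```python
import ast
from typing import List

# Sample input mirroring the cases discussed above (hypothetical test data).
SOURCE = '''
exec("ls")
exec (foo)
exec (
    bar
)
other_exec(bar)
# exec(foo)
print("exec(bar)")
'''


def find_calls(source: str, name: str) -> List[int]:
    """Return the line numbers of calls to a bare function called `name`."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    )


# Comments, string literals, and other_exec() are not reported.
print(find_calls(SOURCE, "exec"))  # prints [2, 3, 4]
```

Because the search runs over the parsed tree rather than the raw text, whitespace and line breaks inside the call do not matter, and text inside comments or strings never becomes a call node.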

More example patterns:

Pattern                                 Matches
$X == $X                                if (node.id == node.id): ...
requests.get(..., verify=False, ...)    requests.get(url, timeout=3, verify=False)
os.system(...)                          from os import system; system('echo semgrep')
$ELEMENT.innerHTML                      el.innerHTML = "<img src='x' onerror='alert(`XSS`)'>";
$TOKEN.SignedString([]byte("..."))      ss, err := token.SignedString([]byte("HARDCODED KEY"))

See more example patterns in the Semgrep Registry.

For more info on what you can do in patterns, see the pattern features docs.
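
As a rough illustration of what a metavariable pattern such as $X == $X asks for, the sketch below uses Python's ast module to flag equality comparisons whose two sides are the same expression. This only approximates the idea for Python source; the function name find_self_comparisons is hypothetical, and Semgrep's own matching is more general.

```python
import ast
from typing import List


def find_self_comparisons(source: str) -> List[int]:
    """Return line numbers of `a == a` style comparisons in Python source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
            # ast.dump omits line/column attributes by default, so two
            # structurally identical expressions produce the same string.
            and ast.dump(node.left) == ast.dump(node.comparators[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)


sample = "if node.id == node.id:\n    pass\nif a == b:\n    pass\n"
print(find_self_comparisons(sample))  # prints [1]
```

Requiring both sides to dump to the same string plays the role of binding $X once and insisting that the second occurrence matches the same thing.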

Usage

Semgrep supports three primary workflows:

  • Run pre-built rules
  • Writing custom rules
  • Run Semgrep continuously in CI

The following sections cover each in more detail.

Run Pre-Built Rules

The easiest way to get started with Semgrep (other than semgrep.live) is to scan your code with pre-built rules.

The Semgrep Registry contains rules for many programming errors, including security issues and correctness bugs. Security rules are annotated with CWE and OWASP metadata when applicable. OWASP rule coverage per language is displayed below.

You can use pre-built Rule Packs, which contain sets of rules grouped by language and/or framework:

$ semgrep --config=https://semgrep.live/c/p/java
$ semgrep --config=https://semgrep.live/c/p/python
$ semgrep --config=https://semgrep.live/c/p/golang
$ semgrep --config=https://semgrep.live/c/p/javascript
...

Or you can run all of Semgrep's default rules for all languages as appropriate (note: each rule says what language it's for, so Semgrep won't try to run a Python rule on Java code).

$ semgrep --config=r2c

You can also run a specific rule or group of rules:

# Run a specific rule
$ semgrep --config=https://semgrep.live/c/r/java.spring.security.audit.cookie-missing-samesite

# Run a set of rules
$ semgrep --config=https://semgrep.live/c/r/java.spring.security

All public Semgrep rules can be viewed on the Registry, which pulls the rules from YAML files defined in the semgrep-rules GitHub repo.

Here are some sample vulnerable repos to test on:

Writing Custom Rules

One of the strengths of Semgrep is how easy it is to write rules.

This makes it possible to:

  • Quickly port rules from other tools.
  • Think of an interesting code pattern, and then find instances of it in your code.
  • Find code base or org-specific bugs and antipatterns - things that built-in checks for existing tools won't find because they're unique to you.
  • and more!

Simple Rules

For iterating on simple patterns, you can use the --lang and --pattern flags.

$ semgrep --lang javascript --pattern 'eval(...)' path/to/file.js

The --lang flag tells Semgrep which language you're targeting and --pattern is the code pattern to search for.

Advanced Rules

Some rules cannot be expressed with a single pattern. Sometimes you want to combine code patterns, for example: X must be true AND Y must be true, or X but NOT Y, or X must occur inside a block of code that Y matches.

For these cases, Semgrep has a more powerful and flexible YAML syntax.

You can run a single rule or directory of rules specified in YAML by:

$ semgrep --config my_rule.yml path/to/dir_or_file

$ semgrep --config yaml_dir/ path/to/dir_or_file

Example Advanced Rule

Say you are building a financial trading application in which every Transaction object must first be passed to verify_transaction() before being passed to make_transaction(), or it's a business logic bug.

You can express this behavior with the following Semgrep YAML pattern:

rules:
- id: find-unverified-transactions
  # languages and severity are required rule fields;
  # the values below are assumed for this example.
  languages: [java]
  severity: WARNING
  patterns:
    - pattern: |
        public $RETURN $METHOD(...){
            ...
            make_transaction($T);
            ...
        }
    - pattern-not: |
        public $RETURN $METHOD(...){
            ...
            verify_transaction($T);
            ...
            make_transaction($T);
            ...
        }
  message: |
    In $METHOD, there's a call to make_transaction() without first calling verify_transaction() on the Transaction object.

  • $RETURN, $METHOD, and $T are metavariables, an abstraction that Semgrep provides when you want to match something but you don't know exactly what it is ahead of time.
    • You can think of metavariables like a capture group in regular expressions.
  • The pattern clause defines what we're looking for: any method that calls make_transaction().
  • The pattern-not clause filters out matches we don't want; in this case, methods where a transaction ($T) is passed to verify_transaction() before make_transaction().
  • The message is what's returned in Semgrep output, either to STDOUT or as a comment on the pull request on GitHub or other systems.
    • Note that metavariables can be used to customize messages and make them contextually relevant. Here we're helpfully telling the user the method where we've identified the bug.

You can play with this transaction example here: https://semgrep.live/4b4g.
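
The "match X but not Y" logic that the rule expresses can be sketched in plain Python. The example below is an illustration under assumptions, not Semgrep's implementation: it analyzes Python functions rather than Java methods, it reports the call site rather than the whole method, and the name unverified_transactions and the SOURCE sample are hypothetical.

```python
import ast
from typing import List, Tuple

# Hypothetical sample: good() verifies first, bad() does not.
SOURCE = '''
def good(t):
    verify_transaction(t)
    make_transaction(t)

def bad(t):
    make_transaction(t)
'''


def _is_simple_call(node: ast.AST) -> bool:
    """True for calls like f(x, ...) where f and x are plain names."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and bool(node.args)
        and isinstance(node.args[0], ast.Name)
    )


def unverified_transactions(source: str) -> List[Tuple[str, int]]:
    """Return (function name, line) for make_transaction(x) calls that are
    not preceded by verify_transaction(x) in the same function."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Visit calls in source order so "before" means "earlier in the text".
        calls = sorted(
            (n for n in ast.walk(func) if _is_simple_call(n)),
            key=lambda n: (n.lineno, n.col_offset),
        )
        verified = set()
        for call in calls:
            arg = call.args[0].id
            if call.func.id == "verify_transaction":
                verified.add(arg)
            elif call.func.id == "make_transaction" and arg not in verified:
                findings.append((func.name, call.lineno))
    return findings


print(unverified_transactions(SOURCE))  # prints [('bad', 7)]
```

Tracking the argument name in the verified set corresponds to reusing the same metavariable $T in both clauses: verifying one transaction does not excuse an unverified call on another.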

Learn More

Run Semgrep Continuously in CI

Semgrep can be run via CLI or Docker and output results as JSON (via the --json flag), so it can be inserted into any CI pipeline and have its results processed by whatever tools you're using.
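
As a sketch of how a CI step might consume the --json output, the snippet below turns a report into one line per finding. The field names used here (results, check_id, path, start.line, extra.message) are assumptions about the shape of the JSON report and may differ between Semgrep versions; the SAMPLE report and the summarize helper are hypothetical.

```python
import json
from typing import List

# Hypothetical report, shaped the way this sketch assumes --json output looks.
SAMPLE = """
{
  "results": [
    {
      "check_id": "find-unverified-transactions",
      "path": "src/Trade.java",
      "start": {"line": 12, "col": 5},
      "end": {"line": 18, "col": 6},
      "extra": {"message": "make_transaction() without verify_transaction()\\n"}
    }
  ],
  "errors": []
}
"""


def summarize(raw: str) -> List[str]:
    """Format each finding as 'path:line: rule-id: message'."""
    report = json.loads(raw)
    return [
        "{}:{}: {}: {}".format(
            r["path"],
            r["start"]["line"],
            r["check_id"],
            r["extra"]["message"].strip(),
        )
        for r in report.get("results", [])
    ]


for line in summarize(SAMPLE):
    print(line)
```

In a pipeline, the raw string would come from the standard output of a semgrep --json run instead of the embedded sample, and a non-empty result list could be used to fail the build.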

Semgrep is aware of diffs, so it can report only findings that occur in newly added code, for example, in a commit or pull request.
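
The idea of reporting only findings in newly added code can be sketched as a filter over findings and changed line numbers. This is not Semgrep's own diff handling, which is not described here; the function only_new_findings, the finding fields, and the changed-lines mapping are all hypothetical.

```python
from typing import Dict, List, Set


def only_new_findings(
    findings: List[dict], changed: Dict[str, Set[int]]
) -> List[dict]:
    """Keep findings whose start line is among the changed lines of its file.

    `findings` entries are assumed to carry "path" and "start" -> "line";
    `changed` maps a file path to the set of line numbers added in the diff.
    """
    return [
        f
        for f in findings
        if f["start"]["line"] in changed.get(f["path"], set())
    ]


findings = [
    {"path": "app.py", "start": {"line": 10}},
    {"path": "app.py", "start": {"line": 42}},
    {"path": "lib.py", "start": {"line": 3}},
]
changed = {"app.py": {40, 41, 42}}

print(only_new_findings(findings, changed))
# prints [{'path': 'app.py', 'start': {'line': 42}}]
```

Filtering this way keeps pre-existing issues out of a pull request's report, so reviewers only see findings introduced by the change under review.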

Currently, the easiest way to integrate Semgrep into CI is via a GitHub action we've built. See the integrations docs for more details.

Semgrep can also output results in the standardized Static Analysis Results Interchange Format (SARIF) with the --sarif flag, if you use tools that accept this format.

Upgrading

How you upgrade Semgrep will depend on how you installed it.

From Homebrew:

$ brew upgrade semgrep

From PyPI:

$ python -m pip install --upgrade semgrep

From Docker:

$ docker pull returntocorp/semgrep:latest

Resources

Learn more:

Get in touch:

Contributing

Semgrep is LGPL-licensed; feel free to help out. See CONTRIBUTING.

Semgrep is a frontend to a larger program analysis library named pfff. pfff began and was open-sourced at Facebook but is now archived. The primary maintainer now works at r2c. Semgrep was originally named sgrep and was renamed to avoid collisions with existing projects.

Commercial Support

Semgrep is proudly supported by r2c. We're hiring!

Interested in a fully supported, hosted version of Semgrep? Drop your email and we'll ping you!
