Fast and syntax-aware semantic code pattern search for many languages: like grep but for code
Lightweight static analysis for many languages.
Find and block bug variants with rules that look like source code.
Semgrep is a command-line tool for offline static analysis. Use pre-built or custom rules to enforce code and security standards in your codebase. You can try it now with our interactive live editor.
Semgrep combines the convenient and iterative style of grep with the powerful features of an Abstract Syntax Tree (AST) matcher and limited dataflow. Easily find function calls, class or method definitions, and more without having to understand ASTs or wrestle with regexes.
Visit Installation and Usage to get started.
Installation
Want to skip installation? You can run Semgrep via our interactive live editor at semgrep.live.
On macOS, binaries are available via Homebrew:
$ brew install semgrep
On Ubuntu, an install script is available with each release:
$ ./semgrep-v0.12.0-ubuntu-generic.sh
To try Semgrep without installation, you can also run it via Docker:
$ docker run --rm -v "${PWD}:/home/repo" returntocorp/semgrep --help
See Usage to learn about running pre-built rules and writing custom ones.
Motivation
Semgrep exists because:
- Insecure code is easy to write
- The future of security involves automatically guiding developers towards a “paved road” made of default-safe frameworks (e.g. React or Object-relational Mappers)
- grep isn’t expressive enough, and traditional static analysis tools (SAST) are too complicated or slow for paved road automation
The AppSec, Developer, and DevOps communities deserve a static analysis tool that is fast, easy to use, code-aware, multi-lingual, and open source!
Overview
Semgrep is optimized for:
- Speed: Fast enough to run on every build, commit, or file save
- Finding bugs that matter: Run your own specialized rules or choose OWASP Top 10 checks from the Semgrep Registry. Rules match source code at the Abstract Syntax Tree (AST) level, unlike regexes, which match strings and aren't semantically aware.
- Ease of customization: Rules look like the code you’re searching, no static analysis PhD required. They don't require compiled code, only source, reducing iteration time.
- Ease of integration: Highly portable, and many CI and git-hook integrations already exist. Output --json and pipe results into your existing systems.
- Polyglot environments: Don't learn and maintain multiple tools for your polyglot environment (e.g. ESLint, find-sec-bugs, RuboCop, Gosec). Use the same syntax and concepts independent of language.
Language Support
Python | JavaScript | Go | Java | C | JSON | Ruby | OCaml | TypeScript | PHP |
---|---|---|---|---|---|---|---|---|---|
✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 | 🚧 | Coming... | Coming... |
Missing support for a language? Let us know by filing a ticket, joining our Slack, or emailing support@r2c.dev.
Pattern Syntax Teaser
One of the most distinctive and useful things about Semgrep is how easy it is to write and iterate on queries.
The goal is to make it as easy as possible to go from an idea in your head to finding the code patterns you have in mind.
Example: Say you want to find all calls to a function named exec, and you don't care about the arguments. With Semgrep, you could simply supply the pattern exec(...) and you'd match:
# Simple cases grep finds
exec("ls")
exec(some_var)
# But you don't have to worry about whitespace
exec (foo)
# Or calls across multiple lines
exec (
bar
)
Importantly, Semgrep would not match the following:
# grep would match this, but Semgrep ignores it because
# it doesn't have the right function name
other_exec(bar)
# Semgrep ignores commented out lines
# exec(foo)
# and hard-coded strings
print("exec(bar)")
Semgrep will even match aliased imports:
# Semgrep knows that safe_function refers to exec so it
# will still match!
# Oof, try finding this with grep
import exec as safe_function
safe_function(tricksy)
Play with this example in your browser here, or copy the above code into a file locally (exec.py) and run:
$ semgrep -l python -e "exec(...)" /path/to/exec.py
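To make the difference between text matching and syntax-aware matching concrete, here is a minimal Python sketch. It uses Python's standard ast module as a stand-in for an AST matcher. It is not how Semgrep is implemented, it does not handle aliased imports, and the helper names find_calls and grep_lines are made up for this illustration.

```python
import ast
from typing import List

# The sample lines from the examples above, one string per source line.
SAMPLE = "\n".join([
    'exec("ls")',          # line 1
    'exec(some_var)',      # line 2
    'exec (foo)',          # line 3
    'exec (',              # line 4
    '    bar',             # line 5
    ')',                   # line 6
    'other_exec(bar)',     # line 7
    '# exec(foo)',         # line 8
    'print("exec(bar)")',  # line 9
])


def find_calls(source: str, name: str) -> List[int]:
    """Return the line numbers of calls to a bare function called `name`."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    )


def grep_lines(source: str, text: str) -> List[int]:
    """Return the line numbers that contain `text` as a plain substring."""
    return [i for i, line in enumerate(source.splitlines(), 1) if text in line]


print("AST match: ", find_calls(SAMPLE, "exec"))   # [1, 2, 3, 4]
print("text match:", grep_lines(SAMPLE, "exec("))  # [1, 2, 7, 8, 9]
```

The syntax-aware search finds the calls with extra whitespace and line breaks and skips the comment, the string literal, and other_exec, while the substring search does the opposite.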
More example patterns:
Pattern | Matches |
---|---|
$X == $X | if (node.id == node.id): ... |
requests.get(..., verify=False, ...) | requests.get(url, timeout=3, verify=False) |
os.system(...) | from os import system; system('echo semgrep') |
$ELEMENT.innerHTML | el.innerHTML = "<img src='x' onerror='alert(`XSS`)'>"; |
$TOKEN.SignedString([]byte("...")) | ss, err := token.SignedString([]byte("HARDCODED KEY")) |
→ see more example patterns in the Semgrep Registry.
For more info on what you can do in patterns, see the pattern features docs.
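As a rough illustration of what a metavariable pattern such as $X == $X expresses, the sketch below looks for equality comparisons whose two sides are the same expression. It again relies on Python's standard ast module rather than Semgrep itself, and the function name self_comparisons is an assumption made for this example.

```python
import ast
from typing import List


def self_comparisons(source: str) -> List[int]:
    """Return line numbers of `a == a` style comparisons in Python source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
            # ast.dump omits line/column info by default, so two structurally
            # identical expressions produce the same string.
            and ast.dump(node.left) == ast.dump(node.comparators[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)


CODE = (
    "if node.id == node.id:\n"
    "    pass\n"
    "if a.id == b.id:\n"
    "    pass\n"
    "if x  ==  x:\n"
    "    pass\n"
)
print(self_comparisons(CODE))  # [1, 5]
```

Comparing the dumped subtrees plays the role of binding $X once and requiring the same binding on both sides.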
Usage
Semgrep supports three primary workflows:
- Run pre-built rules
- Writing custom rules
- Run Semgrep continuously in CI
The following sections cover each in more detail.
Run Pre-Built Rules
The easiest way to get started with Semgrep (other than semgrep.live) is to scan your code with pre-built rules.
The Semgrep Registry contains rules for many programming errors, including security issues and correctness bugs. Security rules are annotated with CWE and OWASP metadata when applicable. OWASP rule coverage per language is displayed below.
You can use pre-built Rule Packs that contain sets of rules grouped by language and/or framework:
$ semgrep --config=https://semgrep.live/c/p/java
$ semgrep --config=https://semgrep.live/c/p/python
$ semgrep --config=https://semgrep.live/c/p/golang
$ semgrep --config=https://semgrep.live/c/p/javascript
...
Or you can run all of Semgrep's default rules for all languages as appropriate (note: each rule says what language it's for, so Semgrep won't try to run a Python rule on Java code).
$ semgrep --config=r2c
You can also run a specific rule or group of rules:
# Run a specific rule
$ semgrep --config=https://semgrep.live/c/r/java.spring.security.audit.cookie-missing-samesite
# Run a set of rules
$ semgrep --config=https://semgrep.live/c/r/java.spring.security
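If you script these scans, a small helper can assemble the --config argument for a Rule Pack or a rule ID. The sketch below only builds the argument list and does not run anything. The URL layout is copied from the commands above, while the function names pack_config, rule_config, and scan_command are hypothetical.

```python
from typing import List, Optional

# Base URL as it appears in the commands above.
REGISTRY_BASE = "https://semgrep.live/c"


def pack_config(pack: str) -> str:
    """Config URL for a Rule Pack such as 'java' or 'python'."""
    return f"{REGISTRY_BASE}/p/{pack}"


def rule_config(rule_id: str) -> str:
    """Config URL for a rule or a dotted prefix naming a set of rules."""
    return f"{REGISTRY_BASE}/r/{rule_id}"


def scan_command(config: str, target: Optional[str] = None) -> List[str]:
    """Argument list for a scan; the target path is appended only if given."""
    cmd = ["semgrep", f"--config={config}"]
    if target is not None:
        cmd.append(target)
    return cmd


print(scan_command(pack_config("python")))
print(scan_command(rule_config("java.spring.security"), "path/to/dir_or_file"))
```

Passing the list to subprocess.run avoids shell quoting issues, assuming the semgrep executable is on your PATH.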
All public Semgrep rules can be viewed on the Registry, which pulls the rules from YAML files defined in the semgrep-rules GitHub repo.
Here are some sample vulnerable repos to test on:
- Django: lets-be-bad-guys, django.nV
- Flask: Vulnerable-Flask-App
- Java: WebGoat, OWASP Benchmark
- NodeJS: OWASP Juice Shop, DevSlop Pixi
- Golang: GoVWA
Writing Custom Rules
One of the strengths of Semgrep is how easy it is to write rules.
This makes it possible to:
- Quickly port rules from other tools.
- Think of an interesting code pattern, and then find instances of it in your code.
- Find code base or org-specific bugs and antipatterns - things that built-in checks for existing tools won't find because they're unique to you.
- and more!
Simple Rules
For iterating on simple patterns, you can use the --lang and --pattern flags.
$ semgrep --lang javascript --pattern 'eval(...)' path/to/file.js
The --lang flag tells Semgrep which language you're targeting and --pattern is the code pattern to search for.
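When iterating on many patterns from a script, it can help to build the command as an argument list so the pattern never needs manual shell quoting. The following is a minimal sketch that only constructs and prints the command; the helper names build_pattern_command and as_shell are assumptions for this example, and only the --lang and --pattern flags described above are used.

```python
import shlex
from typing import List


def build_pattern_command(lang: str, pattern: str, target: str) -> List[str]:
    """Argument list for a one-off pattern search."""
    return ["semgrep", "--lang", lang, "--pattern", pattern, target]


def as_shell(cmd: List[str]) -> str:
    """Render an argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in cmd)


cmd = build_pattern_command("javascript", "eval(...)", "path/to/file.js")
print(as_shell(cmd))
# semgrep --lang javascript --pattern 'eval(...)' path/to/file.js
```

To actually run the search, the same list can be handed to subprocess.run(cmd), assuming semgrep is installed and on your PATH.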
Advanced Rules
Some rules need more than one line of pattern to express. Sometimes you want to express code patterns like: X must be true AND Y must be too, or X but NOT Y, or X must occur inside a block of code that Y matches.
For these cases, Semgrep has a more powerful and flexible YAML syntax.
You can run a single rule or directory of rules specified in YAML by:
$ semgrep --config my_rule.yml path/to/dir_or_file
$ semgrep --config yaml_dir/ path/to/dir_or_file
Example Advanced Rule
Say you are building a financial trading application in which every Transaction object must first be passed to verify_transaction() before being passed to make_transaction(), or it's a business logic bug.
You can express this behavior with the following Semgrep YAML pattern:
rules:
- id: find-unverified-transactions
  patterns:
    - pattern: |
        public $RETURN $METHOD(...){
          ...
          make_transaction($T);
          ...
        }
    - pattern-not: |
        public $RETURN $METHOD(...){
          ...
          verify_transaction($T);
          ...
          make_transaction($T);
          ...
        }
  message: |
    In $METHOD, there's a call to make_transaction() without first calling verify_transaction() on the Transaction object.
  # Semgrep rules also need languages and severity keys; the values below
  # are assumed for this Java-style example.
  languages: [java]
  severity: WARNING
- $RETURN, $METHOD, and $T are metavariables, an abstraction that Semgrep provides when you want to match something but you don't know exactly what it is ahead of time.
  - You can think of metavariables like a capture group in regular expressions.
- The pattern clause defines what we're looking for: any method that calls make_transaction().
- The pattern-not clause filters out matches we don't want; in this case, methods where a transaction ($T) is passed to verify_transaction() before make_transaction().
- The message is what's returned in Semgrep output, either to STDOUT or as a comment on the pull request on GitHub or other systems.
  - Note that metavariables can be used to customize messages and make them contextually relevant. Here we're helpfully telling the user the method where we've identified the bug.
You can play with this transaction example here: https://semgrep.live/4b4g.
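The way pattern and pattern-not combine in the rule above can be modelled in a few lines of Python. This is only a simplified model of the rule's logic under stated assumptions, not Semgrep's matching engine: each method is reduced to an ordered list of (function, argument) calls, and all names here (Call, verified_before_make, unverified_transactions, the sample methods) are hypothetical.

```python
from typing import Dict, List, Tuple

Call = Tuple[str, str]  # (function name, argument name)


def verified_before_make(calls: List[Call], arg: str) -> bool:
    """True if verify_transaction(arg) occurs before some make_transaction(arg).

    This mirrors the pattern-not clause with $T bound to `arg`.
    """
    seen_verify = False
    for fn, a in calls:
        if a != arg:
            continue
        if fn == "verify_transaction":
            seen_verify = True
        elif fn == "make_transaction" and seen_verify:
            return True
    return False


def unverified_transactions(methods: Dict[str, List[Call]]) -> List[Tuple[str, str]]:
    """Return (method, argument) pairs the rule would report."""
    findings = []
    for method, calls in methods.items():
        # pattern: every distinct $T that is passed to make_transaction
        made: List[str] = []
        for fn, arg in calls:
            if fn == "make_transaction" and arg not in made:
                made.append(arg)
        # pattern-not: drop the $T values that were verified first
        for arg in made:
            if not verified_before_make(calls, arg):
                findings.append((method, arg))
    return findings


METHODS = {
    "pay": [("verify_transaction", "t"), ("make_transaction", "t")],
    "refund": [("make_transaction", "t")],
    "swap": [("verify_transaction", "a"), ("make_transaction", "b")],
}
print(unverified_transactions(METHODS))  # [('refund', 't'), ('swap', 'b')]
```

Note that swap is reported even though it calls verify_transaction(), because the verified object is not the one passed to make_transaction(). That is the effect of using the same metavariable $T in both clauses.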
Learn More
- See the pattern features docs for more info and examples on the flexibility and power of Semgrep patterns.
- See the YAML configuration file docs for details on all of the keys that can be used and how they work.
Run Semgrep Continuously in CI
Semgrep can be run via CLI or Docker and output results as JSON (via the --json flag), so it can be inserted into any CI pipeline and have its results processed by whatever tools you're using.
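As a sketch of what processing the results might look like, the snippet below turns a JSON report into one line per finding. The field names (results, check_id, path, start.line, extra.message, extra.severity) are an assumption about the JSON layout and should be checked against the output of the Semgrep version you run. The sample report is made up for illustration, not real tool output.

```python
import json
from typing import List

# Hypothetical report, shaped the way this sketch assumes --json output looks.
SAMPLE_OUTPUT = json.dumps({
    "results": [
        {
            "check_id": "example.rule-id",
            "path": "src/app.py",
            "start": {"line": 3, "col": 1},
            "end": {"line": 3, "col": 12},
            "extra": {"message": "Avoid exec", "severity": "WARNING"},
        }
    ],
    "errors": [],
})


def format_findings(raw: str) -> List[str]:
    """Turn a JSON report into 'path:line: rule: message (severity)' lines."""
    report = json.loads(raw)
    lines = []
    for result in report.get("results", []):
        extra = result.get("extra", {})
        lines.append(
            "{}:{}: {}: {} ({})".format(
                result["path"],
                result["start"]["line"],
                result["check_id"],
                extra.get("message", "").strip(),
                extra.get("severity", "UNKNOWN"),
            )
        )
    return lines


for line in format_findings(SAMPLE_OUTPUT):
    print(line)
```

In a CI job, a script like this could read the report from standard input and exit non-zero when the list of findings is not empty.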
Semgrep is aware of diffs, so it can report only findings that occur in newly added code, for example, in a commit or pull request.
Currently, the easiest way to integrate Semgrep into CI is via a GitHub action we've built. See the integrations docs for more details.
Semgrep can also output results in the standardized Static Analysis Results Interchange Format (SARIF) with the --sarif flag, if you use tools that accept this format.
Upgrading
How you upgrade Semgrep will depend on how you installed it.
From Homebrew:
$ brew upgrade semgrep
From PyPI:
$ python -m pip install --upgrade semgrep
From Docker:
$ docker pull returntocorp/semgrep:latest
Resources
Learn more:
- Semgrep presentation and slides from the Bay Area OWASP meetup.
- Check out the r2c YouTube channel for more videos.
- More detailed Semgrep docs
Get in touch:
- Submit a bug report
- Join our community Slack to say "hi" or ask questions
Contributing
Semgrep is LGPL-licensed. Feel free to help out: CONTRIBUTING.
Semgrep is a frontend to a larger program analysis library named pfff. pfff began and was open-sourced at Facebook but is now archived. The primary maintainer now works at r2c. Semgrep was originally named sgrep and was renamed to avoid collisions with existing projects.
Commercial Support
Semgrep is proudly supported by r2c. We're hiring!
Interested in a fully supported, hosted version of Semgrep? Drop your email and we'll ping you!