Skip to main content

extract anomalies from log files

Project description

Based on success logs, logreduce highlights useful text in failed logs. The goal is to save time in finding a failure’s root cause.

On average, learning run at 1000 lines per second, and testing run at 0.800k lines per seconds.

How it works

logreduce uses a model to learn successful logs and detect novelties in failed logs:

  • Random words are manually removed using regular expression

  • Then lines are converted to a matrix of token occurrences (using HashingVectorizer),

  • An unsupervised learner implements neighbor searches (using NearestNeighbors).

Caveats

This method doesn’t work when debug content is only included in failed logs. To successfully detect anomalies, failed and success logs needs to be similar, otherwise the extra informations in failed logs will be considered anomalous.

For example this happens with testr where success logs only contains ‘SUCCESS’.

Install

  • Fedora:

sudo dnf install -y python3-scikit-learn
git clone https://softwarefactory-project.io/r/logreduce
pushd logreduce
python3 setup.py develop --user
popd
  • Pip:

pip install --user logreduce

Usage

Logreduce needs a baseline for success log training, and a target for the log to reduce.

Logreduce prints anomalies on the console, the log files are not modified:

# "%(distance)f | %(log_path)s:%(line_number)d: %(log_line)s"

$ logreduce diff testr-nodepool-01/output.good testr-nodepool-01/output.fail
[...]
0.232 | testr-nodepool-01/output.fail:0677:     File "voluptuous/schema_builder.py", line 370, in validate_mapping
0.462 | testr-nodepool-01/output.fail:0678:       raise er.MultipleInvalid(errors)
0.650 | testr-nodepool-01/output.fail:0679:   voluptuous.error.MultipleInvalid: required key not provided @ data['providers'][2]['cloud']

Examples

  • Look for new events in log files:

$ logreduce diff /var/log/audit/audit.log.4 /var/log/audit/audit.log --context-length 0
0.276 | /var/log/audit/audit.log:0606: type=USER_AUTH msg=audit(1498373150.931:1661763): pid=20252 uid=0 auid=1000 ses=19490 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:authentication grantors=pam_rootok acct="root" exe="/usr/bin/su" hostname=? addr=? terminal=pts/0 res=success'
0.287 | /var/log/audit/audit.log:0607: type=USER_ACCT msg=audit(1498373150.931:1661764): pid=20252 uid=0 auid=1000 ses=19490 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_succeed_if acct="root" exe="/usr/bin/su" hostname=? addr=? terminal=pts/0 res=success'
  • Look for anomaly in Zuul jobs:

$ logreduce logs --zuul-web http://zuul.openstack.org --project openstack-infra/zuul --threshold 0.5 --context-length 0 \
     http://logs.openstack.org/64/544964/3/check/tox-pep8/05fc35f/
0.592 | job-output.txt.gz:0518: 2018-02-22 11:22:10.888593 | ubuntu-xenial | ./tests/unit/test_merger_repo.py:81:9: F841 local variable 'remote_sha' is assigned to but never used
2018-02-28 09:17:13,886 INFO  Classifier - Testing took 0.068s at 0.767MB/s (9.962kl/s) (0.052 MB - 0.680 kilo-lines)
99.85% reduction (from 680 lines to 1)
  • Look for anomaly in Zuul jobs artifacts:

$ logreduce logs --zuul-web http://zuul.openstack.org --threshold 0.5 --include-path controller/ --exclude-file unbound_log.txt --pipeline check \
     http://logs.openstack.org/34/548134/1/check/nodepool-functional-py35/3aab684/
0.500 | job-output.txt.gz:0634: 2018-02-27 03:29:38.186475 | controller |   "ephemeral_device": "VARIABLE IS NOT DEFINED!"
0.711 | controller/logs/libvirt/libvirtd_log.txt.gz:0536:       2018-02-27 03:37:12.853+0000: 24202: debug : virPCIDeviceFindCapabilityOffset:541 : 1af4 1000 0000:00:03.0: failed to find cap 0x10
2018-02-28 09:37:13,102 INFO  Classifier - Testing took 49.910s at 0.405MB/s (2.380kl/s) (20.207 MB - 118.798 kilo-lines)
99.97% reduction (from 118798 lines to 35)

logreduce-tests

This package contains tests data for different type of log such as testr or syslog. Each tests includes a pre-computed list of the anomalies in log failures.

This package also includes a command line utility to run logreduce against all tests data and print a summary of its performance.

Test format

Each tests case is composed of:

  • A .good file (or directory) that holds the baseline

  • A .fail file (or directory)

  • A info.yaml file that describe expected output:

threshold: float # set the distance threshold for the test
anomalies:
  - optional: bool  # to define minor anomalies not considered false positive
    lines: |        # the expected lines to be highlighted
      Traceback...
      RuntimeError...

Evaluate

To run the evaluation, first install logreduce-tests:

git clone https://softwarefactory-project.io/r/logreduce-tests
pushd logreduce-tests
python3 setup.py develop --user

logreduce-tests expect tests directories as argument:

$ logreduce-tests tests/testr-zuul-[0-9]*
[testr-zuul-01]: 100.00% accuracy,  5.00% false-positive
[testr-zuul-02]:  80.00% accuracy,  0.00% false-positive
...
Summary:  90.00% accuracy,  2.50% false-positive

Add –debug to display false positive and missing chunks.

Roadmap/todo

  • Add logstash filter module

  • Add daemon worker mode with MQTT event listener

  • Add tarball traversal in utils.files_iterator

  • Improve tokenization tests

  • Discard files that are 100% anomalous

  • Report mean diviation instead of absolute distances

Contribute

Contribution are most welcome, use git-review to propose a change. Setup your ssh keys after sign in https://softwarefactory-project.io/auth/login

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

logreduce-0.1.0.tar.gz (33.5 kB view details)

Uploaded Source

Built Distribution

logreduce-0.1.0-py2.py3-none-any.whl (28.3 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file logreduce-0.1.0.tar.gz.

File metadata

  • Download URL: logreduce-0.1.0.tar.gz
  • Upload date:
  • Size: 33.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for logreduce-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7a84dcdbd46fba68d74621e9a24d1b1447d77b2a74ac21b17eb747f6f5e7e81f
MD5 25ad067a6763d18db6ef37dbfb2b52f3
BLAKE2b-256 f776ace4201382333ecc6ea6001f2831fcfb189f0fea8629bd85318bd4ad086f

See more details on using hashes here.

File details

Details for the file logreduce-0.1.0-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for logreduce-0.1.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 d9e9699a0d8932c0ef262e52bec6d3d9971ed3f2f51e2650dddb2fe7ab9df804
MD5 8feff8627c9eb7f1ee484dff3ea623a5
BLAKE2b-256 8b26688f7cd5c5879a5001e648a29ac7dd000704e7f1657cbdd34d4df5f9f7eb

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page