Extract anomalies from log files
Project description
Based on success logs, logreduce highlights useful text in failed logs. The goal is to assist in debugging failures and to reduce the effort needed to read tedious log files.
On average, learning runs at roughly 1,000 lines per second, and testing at roughly 800 lines per second.
How it works
logreduce uses a model that learns successful logs and detects outliers in failed logs that do not fit the model. The model is constructed as follows:
Random words are manually removed using regular expressions,
Stripped lines are converted into dictionary-based numeric vectors (using CountVectorizer),
Vectors are weighted by term frequency times inverse document frequency (using TfidfTransformer),
The weighted vectors are then used in an unsupervised nearest-neighbor learning model.
There are currently two models:
simple, using NearestNeighbors
lshf, using LSHForest
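The steps above can be sketched with scikit-learn. This is a minimal illustration of the approach, not logreduce's actual implementation, and the sample log lines are invented:

```python
# Minimal sketch of the pipeline described above (illustrative only):
# vectorize baseline lines, weight them with TF-IDF, then flag target
# lines by their distance to the nearest known-good line.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neighbors import NearestNeighbors

baseline = ["service started ok", "connection accepted from host"]
target = ["service started ok", "Traceback (most recent call last):"]

vectorizer = CountVectorizer()
tfidf = TfidfTransformer()
weights = tfidf.fit_transform(vectorizer.fit_transform(baseline))

# Unsupervised nearest-neighbor model over the baseline vectors.
model = NearestNeighbors(n_neighbors=1).fit(weights)

# Project target lines into the baseline vocabulary and measure the
# distance to the closest successful line: 0.0 means "seen before",
# values near 1.0 mean no shared vocabulary at all.
distances, _ = model.kneighbors(tfidf.transform(vectorizer.transform(target)))
for line, dist in zip(target, distances[:, 0]):
    print("%0.3f | %s" % (dist, line))
```

Here the Traceback line scores 1.0 because none of its words appear in the baseline vocabulary, while the repeated line scores 0.0.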
In short, logreduce relies heavily on line stripping with a bag-of-words technique, and it uses the distance to known sentences to detect outliers.
For example this input:
2017-06-21 04:37:45,827 INFO [nodepool.builder.UploadWorker.0] Uploading DIB image build 0000000002 from /tmpxvLOTg/fake-image-0000000002.qcow2 to fake-provider
Results in:
INFO nodepool builder UploadWorker Uploading image build from /fake image fake provider
The tokenization makes the model somewhat dependent on the target data; for example, to support OpenStack logs, words beginning with ns- or req- are taken into account. Further improvements, such as character n-grams, may remove this limitation.
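A rough sketch of the stripping stage; the regular expressions below are hypothetical illustrations, not logreduce's actual token filters:

```python
import re

# Hypothetical patterns for "random" words: time stamps, long
# hexadecimal identifiers, and bare numbers (illustrative only).
RANDOM_WORD_RE = re.compile(
    r"\b\d{4}-\d{2}-\d{2}[ T]?[\d:,.]*\b"  # time stamps
    r"|\b[0-9a-f]{8,}\b"                   # long hex identifiers
    r"|\b\d+\b"                            # bare numbers
)

def strip_line(line):
    """Drop random-looking words and keep the stable vocabulary."""
    stripped = RANDOM_WORD_RE.sub(" ", line)
    # Keep alphabetic tokens only; prefixes like ns- or req- would
    # need dedicated rules, as noted above for OpenStack logs.
    return " ".join(re.findall(r"[A-Za-z]+", stripped))

line = ("2017-06-21 04:37:45,827 INFO [nodepool.builder.UploadWorker.0] "
        "Uploading DIB image build 0000000002 to fake-provider")
print(strip_line(line))
```

This prints `INFO nodepool builder UploadWorker Uploading DIB image build to fake provider`, close to the stripped output shown above.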
Caveats
This method does not work when debug content is only included in failed logs. To successfully detect anomalies, failed and success logs need to be similar; otherwise all the extra information in failed logs will be considered anomalous.
This is currently the case for the TripleO OVB CI, where overcloud logs are only included in the failed logs, resulting in many false positives.
This also happens with testr tests, where success logs only contain 'SUCCESS'.
Install
Fedora:
sudo dnf install -y python3-scikit-learn
git clone https://softwarefactory-project.io/r/logreduce
pushd logreduce
sudo python3 setup.py develop
popd
Pip:
pip install --user logreduce
Fetch Bootstrap for nicer HTML output:
curl -O https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css
curl -O https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js
Usage
Log files can be a single file or a directory.
Logreduce needs a --baseline for success log training, and a target for the log to reduce.
Logreduce prints anomalies to the console; the log files are not modified. When using the text --output-format, anomalies are printed using this format:
# "%(distance)f | %(log_path)s:%(line_number)d: %(log_line)s"
$ logreduce --baseline testr-nodepool-01/output.good testr-nodepool-01/output.fail
[...]
0.232 | testr-nodepool-01/output.fail:0677: File "voluptuous/schema_builder.py", line 370, in validate_mapping
0.462 | testr-nodepool-01/output.fail:0678: raise er.MultipleInvalid(errors)
0.650 | testr-nodepool-01/output.fail:0679: voluptuous.error.MultipleInvalid: required key not provided @ data['providers'][2]['cloud']
The model can be trained and saved for re-use using --save. When using --load, logreduce does not need a --baseline.
Full usage:
$ logreduce --help
usage: logreduce [-h] [--debug] [--debug-token]
[--ignore-file IGNORE_FILE [IGNORE_FILE ...]]
[--model {simple,lshf,noop}] [--html HTML] [--json JSON]
[--save FILE] [--load FILE] [--threshold THRESHOLD]
[--merge-distance MERGE_DISTANCE]
[--before-context BEFORE_CONTEXT]
[--after-context AFTER_CONTEXT]
[--context-length CONTEXT_LENGTH] [--baseline LOG]
[target [target ...]]
positional arguments:
target Failed logs
optional arguments:
-h, --help show this help message and exit
--debug Print debug
--debug-token Print tokenization process
--ignore-file IGNORE_FILE [IGNORE_FILE ...]
--model {simple,lshf,noop}
--html FILE Render html result
--json FILE Render json result
--save FILE Save the model
--load FILE Load a previous model
--threshold THRESHOLD
Outlier distance threshold, set to 0.0 to display all
log, 1.0 to only display clear anomalies
--merge-distance MERGE_DISTANCE
Distance between chunks to merge in a continuous one
--before-context BEFORE_CONTEXT
Amount of lines to include before the anomaly
--after-context AFTER_CONTEXT
Amount of lines to include after the anomaly
--context-length CONTEXT_LENGTH
Set both after and before size
--baseline LOG Success logs
See below for some examples.
Examples
Look for new events in log files:
$ logreduce --baseline /var/log/audit/audit.log.4 /var/log/audit/audit.log --context-length 0
0.276 | /var/log/audit/audit.log:0606: type=USER_AUTH msg=audit(1498373150.931:1661763): pid=20252 uid=0 auid=1000 ses=19490 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:authentication grantors=pam_rootok acct="root" exe="/usr/bin/su" hostname=? addr=? terminal=pts/0 res=success'
0.287 | /var/log/audit/audit.log:0607: type=USER_ACCT msg=audit(1498373150.931:1661764): pid=20252 uid=0 auid=1000 ses=19490 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_succeed_if acct="root" exe="/usr/bin/su" hostname=? addr=? terminal=pts/0 res=success'
Re-using a model:
$ logreduce --baseline /var/log/audit/audit.log.4 --save ~/audit.model
$ logreduce --load ~/audit.model /var/log/audit/audit.log
logreduce-tests
This package contains test data for different types of logs, such as testr or syslog. Each test includes a pre-computed list of the anomalies in log failures.
This package also includes a command line utility to run logreduce against all tests data and print a summary of its performance.
Test format
Each test case is composed of:
A .good file (or directory) that holds the baseline
A .fail file (or directory)
An info.yaml file that describes the expected output:
threshold: float # set the distance threshold for the test
anomalies:
- optional: bool # to define minor anomalies not considered false positive
lines: | # the expected lines to be highlighted
Traceback...
RuntimeError...
Evaluate
To run the evaluation, first install logreduce-tests:
git clone https://softwarefactory-project.io/r/logreduce-tests
pushd logreduce-tests
sudo python3 setup.py develop
logreduce-tests expects test directories as arguments:
$ logreduce-tests tests/testr-zuul-[0-9]*
[testr-zuul-01]: 100.00% accuracy, 5.00% false-positive
[testr-zuul-02]: 80.00% accuracy, 0.00% false-positive
...
Summary: 90.00% accuracy, 2.50% false-positive
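The accuracy and false-positive figures can be understood as simple set comparisons between expected and detected anomaly lines; here is a hypothetical sketch, not the actual logreduce-tests scorer:

```python
def score(expected, detected):
    """Hypothetical scorer: accuracy is the share of expected anomaly
    lines that were detected; false positives are detected lines that
    were not expected. Not the actual logreduce-tests implementation."""
    expected, detected = set(expected), set(detected)
    accuracy = 100.0 * len(expected & detected) / len(expected)
    false_positive = 100.0 * len(detected - expected) / len(detected)
    return accuracy, false_positive

acc, fp = score(
    expected={"Traceback...", "RuntimeError..."},
    detected={"Traceback...", "RuntimeError...", "harmless DEBUG line"},
)
print("%.2f%% accuracy, %.2f%% false-positive" % (acc, fp))
```

With these invented inputs, both expected lines are found (100.00% accuracy) and one of the three detected lines is spurious (33.33% false-positive).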
Add --debug to display false positives and missing chunks.
Roadmap/todo
Support automatic log analysis and reporting when a job fails, e.g. through a Jenkins publisher or Zuul post jobs.
Add tarball traversal in utils.files_iterator
Improve tokenization tests
Discard files that are 100% anomalous
Run logreduce-tests in parallel
Use model bins to group lines by size, e.g. <8 tokens, 8-32 tokens, >32 tokens
Other ideas:
Compare logreduce performance between two versions, perhaps using logreduce itself… logception!
Find an alternative to lshf; the model currently spends 97% of the time in the lsh.kneighbors method…
Investigate character n-grams instead of word vectorization
Investigate more advanced models such as recurrent neural nets, perhaps using TensorFlow instead of scikit-learn
Investigate learning failed logs to reduce common/useless failure expression
Contribute
Contributions are most welcome; use git-review to propose a change. Set up your SSH keys after signing in at https://softwarefactory-project.io/auth/login
File details
Details for the file logreduce-0.0.1-py2.py3-none-any.whl
File metadata
- Download URL: logreduce-0.0.1-py2.py3-none-any.whl
- Upload date:
- Size: 24.7 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6d734e27f4bb4359377d648e1113e5c89f1b517bb3d1310dbf38183402e1715c
MD5 | 6ee4588e7c42ea50cc4944272f2e165a
BLAKE2b-256 | 17a2d9908bad5fa277cac726476f7b132f4490ace22996c58d8166a4ac656011