Main information about loglizer fork

Loglizer is a machine learning-based log analysis toolkit for automated anomaly detection. This fork focuses on improving the performance of the PCA model by changing the data preprocessing: the analyzed data window is split into different time intervals. It also adds functionality for building graphs and reports. Additional information is available in the original repository.

Framework

The log analysis framework for anomaly detection usually comprises the following components:

Log collection: Logs are generated at runtime and aggregated into a centralized place with a data streaming pipeline, such as Flume and Kafka.
Log parsing: The goal of log parsing is to convert unstructured log messages into a map of structured events, based on which sophisticated machine learning models can be applied. The details of log parsing can be found at our logparser project.
Feature extraction: Structured logs can be sliced into short log sequences through interval window, sliding window, or session window. Then, feature extraction is performed to vectorize each log sequence, for example, using an event counting vector.
Anomaly detection: Anomaly detection models are trained to check whether a given feature vector is an anomaly or not.
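The feature extraction step above can be sketched as follows. This is a minimal illustration of a fixed interval window with event counting, not the code used in this repository; the function name `count_vectors` and the sample events are hypothetical.

```python
from collections import Counter

def count_vectors(events, window_sec, event_ids):
    """Group (timestamp, event_id) pairs into fixed windows and count events.

    Hypothetical helper: returns {window_index: [count per event id]}.
    """
    windows = {}
    for ts, eid in events:
        key = int(ts // window_sec)          # index of the window this event falls in
        windows.setdefault(key, Counter())[eid] += 1
    return {
        key: [counts[eid] for eid in event_ids]   # missing ids count as 0
        for key, counts in sorted(windows.items())
    }

# Made-up parsed log: (seconds since start, event id)
events = [(0.2, "E1"), (0.7, "E2"), (0.9, "E1"), (1.4, "E3"), (2.1, "E1")]
vectors = count_vectors(events, window_sec=1, event_ids=["E1", "E2", "E3"])
print(vectors)
```

Each value is one event counting vector; a sliding or session window would only change how events are assigned to groups.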

Models

Unsupervised anomaly detection model:

| Model | Paper reference |
| --- | --- |
| PCA | [SOSP'09] Large-Scale System Problems Detection by Mining Console Logs, by Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael I. Jordan. [Intel] |
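To give a rough idea of how PCA-based detection works, the sketch below projects feature vectors onto the residual subspace (everything outside the top principal components) and uses the squared length of that residual as the anomaly score. It is an illustration in plain NumPy, not the implementation in this repository; the function names, the synthetic data, and the 99% quantile threshold are assumptions (the paper derives its threshold from a Q-statistic instead).

```python
import numpy as np

def fit_pca_spe(x_train, n_components):
    """Return the training mean and the projector onto the residual subspace."""
    mean = x_train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    p = vt[:n_components].T                       # shape (n_features, k)
    residual_proj = np.eye(x_train.shape[1]) - p @ p.T
    return mean, residual_proj

def spe(x, mean, residual_proj):
    """Squared prediction error: squared norm of the residual component."""
    r = (x - mean) @ residual_proj
    return np.sum(r ** 2, axis=1)

rng = np.random.default_rng(0)
# Synthetic "normal" data lying close to a single direction in 3-D.
t = rng.normal(size=(200, 1))
x_train = t @ np.array([[1.0, 2.0, 3.0]]) + 0.01 * rng.normal(size=(200, 3))

mean, proj = fit_pca_spe(x_train, n_components=1)
# Assumption: a simple quantile of the training SPE stands in for the Q-statistic.
threshold = np.quantile(spe(x_train, mean, proj), 0.99)

x_new = np.array([[0.5, 1.0, 1.5],     # follows the normal pattern
                  [3.0, -1.0, 0.0]])   # far from the normal pattern
is_anomaly = spe(x_new, mean, proj) > threshold
print(is_anomaly)
```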

Log data

Log data is available in this fork or from the original repository.

Install

git clone https://github.com/logpai/loglizer.git
cd loglizer
pip install -r requirements.txt

API usage

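# Imports follow the package layout of the original loglizer repository
from loglizer import dataloader, preprocessing
from loglizer.models import PCA

# Load HDFS dataset. To analyze your own logs, you need to rewrite the load function.
(x_train, _), (_, _) = dataloader.load_HDFS(...)

# Feature extraction and transformation
feature_extractor = preprocessing.FeatureExtractor()
x_train = feature_extractor.fit_transform(...)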

# Model training
model = PCA()
model.fit(...)

# Feature transform after fitting
x_test = feature_extractor.transform(...)
# Model evaluation with labeled data
model.evaluate(...)

# Anomaly prediction
x_test = feature_extractor.transform(...)
model.predict(...) # predict anomalies on given data

For more details, please follow the demo in the docs to get started.

Analysis results for sample data

The graph illustrates the anomaly search, which compares the SPE (squared prediction error) threshold calculated over the entire training dataset with the SPE calculated for the events that occurred in each time period. If the SPE for a given period is greater than the SPE threshold, that period is considered an anomaly. In this example, events are analyzed per second; other time periods can be specified in the code with the time_delta_sec parameter.
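The comparison described above can be sketched in a few lines. This is only an illustration under assumed inputs: the helper `flag_anomalous_periods`, the start timestamp, and the SPE values are made up, and only the `time_delta_sec` name comes from this README.

```python
from datetime import datetime, timedelta

def flag_anomalous_periods(start, time_delta_sec, spe_values, spe_threshold):
    """Return the start time of every period whose SPE exceeds the threshold.

    Hypothetical helper: spe_values[i] is the SPE of the i-th period after start.
    """
    step = timedelta(seconds=time_delta_sec)
    return [
        start + i * step
        for i, value in enumerate(spe_values)
        if value > spe_threshold
    ]

start = datetime(2008, 11, 9, 20, 35, 0)      # made-up start of the analyzed window
spe_values = [0.4, 0.7, 3.2, 0.5, 2.8]        # made-up SPE per one-second period
anomalies = flag_anomalous_periods(
    start, time_delta_sec=1, spe_values=spe_values, spe_threshold=2.5
)
for moment in anomalies:
    print(moment.isoformat())
```

A larger `time_delta_sec` groups more events into each period, which smooths the SPE curve at the cost of coarser localization of an anomaly.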

Each event has an id, a template, and a weight computed by the tf–idf method. The SPE is calculated from these weights.

All detected anomalies are recorded in the report.
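As a rough sketch of tf–idf weighting applied to event counts, the example below uses the textbook formula weight = count × log(N / df), where N is the number of time periods and df is the number of periods in which the event occurs. The exact formula used in this repository may differ; the function name and the sample counts are hypothetical.

```python
import math

def tfidf_weights(count_vectors):
    """Weight each event count by log(N / df) for that event (textbook tf-idf)."""
    n_windows = len(count_vectors)
    n_events = len(count_vectors[0])
    # df[j]: number of periods in which event j occurs at least once
    df = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(n_events)]
    idf = [math.log(n_windows / d) if d else 0.0 for d in df]
    return [[c * w for c, w in zip(vec, idf)] for vec in count_vectors]

# Made-up counts: rows are time periods, columns are events E1, E2, E3
counts = [
    [2, 1, 0],
    [1, 0, 0],
    [3, 0, 1],
    [1, 0, 0],
]
weights = tfidf_weights(counts)
for row in weights:
    print([round(w, 3) for w in row])
```

With this formula an event that appears in every period (E1 here) gets weight 0, so rare events dominate the vectors from which the SPE is computed.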

