Ionosphere learning from repetitive patterns

By default learning from repetitive patterns is not enabled because you should define the metrics to exclude from this learning before enabling it.

Exclude and include

You need to decide which metrics to exclude from learning from repetitive patterns. You do not want Skyline learning repetitive patterns on bad metrics, such as metrics related to errors, 50x status codes, access_denied, etc. These types of metrics can be excluded from the learning by defining them in settings.IONOSPHERE_REPETITIVE_PATTERNS_EXCLUDE.

For convenience there is a settings.IONOSPHERE_REPETITIVE_PATTERNS_INCLUDE setting which allows you to define only certain metrics to be learnt from repetitive patterns. If it is defined, it is evaluated before the EXCLUDE to select only the metrics that match its definitions. The EXCLUDE (if defined) is then applied to the resulting INCLUDE metrics list. This setting allows for testing repetitive learning with a limited set of metrics before implementing it on the entire metric population.

For a detailed description of the data structures of both of these dictionaries, see the annotated example under settings.IONOSPHERE_REPETITIVE_PATTERNS_EXCLUDE in settings.py.
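The order of evaluation described above (INCLUDE first, then EXCLUDE applied to what remains) can be sketched roughly as follows. This is an illustration of the ordering only. The flat regex lists, the metric names and the helper names (matches_any, metrics_to_learn) are assumptions made for the example and do not reflect the real dictionary structure, which is documented in settings.py.

```python
import re

# Hypothetical flat pattern lists, used only to show the order of evaluation.
# The real settings are dictionaries; see settings.py for their structure.
INCLUDE_PATTERNS = [r"^prod\."]
EXCLUDE_PATTERNS = [r"error", r"\.50[0-9]", r"access_denied"]


def matches_any(metric, patterns):
    """Return True if any regex pattern is found in the metric name."""
    return any(re.search(pattern, metric) for pattern in patterns)


def metrics_to_learn(metrics, include_patterns, exclude_patterns):
    """Apply INCLUDE first (if defined), then EXCLUDE to what remains."""
    candidates = list(metrics)
    if include_patterns:
        candidates = [m for m in candidates if matches_any(m, include_patterns)]
    if exclude_patterns:
        candidates = [m for m in candidates if not matches_any(m, exclude_patterns)]
    return candidates


# Hypothetical metric names for illustration.
all_metrics = [
    "prod.web-1.nginx.requests",
    "prod.web-1.nginx.status.502",
    "prod.db-1.disk.io_write",
    "dev.web-1.nginx.requests",
    "prod.auth.access_denied",
]
print(metrics_to_learn(all_metrics, INCLUDE_PATTERNS, EXCLUDE_PATTERNS))
```

With an empty include list every metric is a candidate and only the excludes apply, which mirrors the idea of starting with a small INCLUDE set for testing and later removing it to cover the whole metric population.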

Methods

Two methods of learning from repetitive patterns can be implemented in Ionosphere. Each method has slight differences and running both achieves the best results.

Metrics often exhibit repetitive patterns in which large fluctuations occur periodically. A number of these are caused by the normal operations of a process. For example, metrics like prometheus_tsdb_compactions_total or VictoriaMetrics process_io_write_syscalls_total reflect processes that occur periodically and tend to generate spikes of similar and varying magnitude. Many other types of metrics fit into this category: backups generally cause spikes in disk I/O, network bandwidth, etc, and cron jobs, log rotation and other things that run on a schedule can result in these types of patterns in metrics.

The learn_repetitive_patterns process evaluates current and previous anomalies on metrics and calculates the features profile for each of these. Comparisons are then made between all the calculated features profiles and if 3 are found to be similar, these patterns are learnt as being normal behaviour.

The two methods, namely learn_repetitive_patterns and find_repetitive_patterns, are described here. Both are similar but each searches for a different kind of pattern.

learn_repetitive_patterns

This method learns daily periodic patterns. In new metrics these patterns can be found and learnt after 10 days. Thereafter if the pattern changes the method can learn the new daily periodic pattern after 7 days.

This method is run periodically against metrics that have training data. It is used to find metrics that exhibit daily patterns in terms of anomalies which occur during similar periods, e.g. between 00h00 and 00h15 or 03h15 and 03h30. Processes such as cron jobs, compaction, log rotation, storage bucket rolling, etc, all tend to cause this type of behaviour in metrics. The training data for each metric is examined and any training data sets found with anomalies aligned in similar periods/windows are evaluated further. The evaluation compares the features profile sums of each training data set and if 3 training data sets are found to be significantly similar, these are classified as normal and they are automatically trained on.
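The period alignment step can be pictured with a minimal sketch like the one below. It only groups anomaly timestamps into time-of-day windows and reports windows seen on at least 3 distinct days. It does not compare features profile sums. The 15 minute window size is borrowed from the example periods above, and the function name aligned_windows, the fixed window boundaries and the use of UTC are assumptions for illustration, not Skyline's actual implementation.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 15 * 60  # hypothetical 15 minute time-of-day window
REQUIRED_DAYS = 3         # 3 similar training data sets, as in the text


def aligned_windows(anomaly_timestamps, window_seconds=WINDOW_SECONDS,
                    required_days=REQUIRED_DAYS):
    """Group anomaly timestamps by time-of-day window and return the windows
    in which anomalies occurred on at least required_days distinct days."""
    days_per_window = defaultdict(set)
    for ts in anomaly_timestamps:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        seconds_into_day = dt.hour * 3600 + dt.minute * 60 + dt.second
        window_index = seconds_into_day // window_seconds
        days_per_window[window_index].add(dt.date())
    result = {}
    for window_index, days in days_per_window.items():
        if len(days) >= required_days:
            start = window_index * window_seconds
            label = "%02dh%02d" % (start // 3600, (start % 3600) // 60)
            result[label] = sorted(days)
    return result


def example_ts(day, hour, minute):
    """Build a UTC unix timestamp in January 2024 (illustrative data only)."""
    return datetime(2024, 1, day, hour, minute, tzinfo=timezone.utc).timestamp()


anomalies = [
    example_ts(1, 3, 17), example_ts(2, 3, 20), example_ts(3, 3, 29),
    example_ts(2, 14, 5), example_ts(4, 9, 40),
]
print(list(aligned_windows(anomalies)))  # windows with anomalies on 3+ days
```

Fixed window boundaries are a simplification: two anomalies a minute apart on either side of a boundary land in different windows, so a real implementation would probably need some tolerance around the edges.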

find_repetitive_patterns

This method can learn patterns in metrics after 7 days.

This method is run periodically against all anomalies. Unlike learn_repetitive_patterns, this method does not use training data or period alignment; it uses anomalies from the metric in the last 30 days. This results in finding patterns of behaviour that are not necessarily periodically aligned and may occur sporadically, but that happen frequently enough over a longer period to be considered normal.

After the normal Ionosphere learn window has passed, the find_repetitive_patterns process determines all anomalies which have not been trained on since the last evaluation (the first run considers only the previous 24 hours).

  • The process surfaces every anomaly that occurred on the metric in the past 30 days.

  • The time series data for each of these anomalies is then fetched and the features profile sum is calculated for each anomalous time series.

  • Each features profile sum is then compared to the features profile sums of all the other anomalies that do not fall in the same weekly period as the one being evaluated. This means that only anomalies in different weeks are evaluated against each other and anomalies in the same weekly window are not considered. This ensures that the patterns found occur frequently and are not just a representation of recent behaviour.

  • A confusion matrix of similarity is generated and if any 3 or more anomalies are found to be similar, they are classified as normal and trained upon.
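The comparison steps above can be sketched as follows. This is a simplified illustration under several assumptions: each anomaly is reduced to a (timestamp, features profile sum) pair, "different weekly period" is read as at least 7 days apart, similarity is a percentage difference under a hypothetical threshold, and a group counts only if all 3 members are similar to each other. None of the names or values below are taken from Skyline itself.

```python
from itertools import combinations

WEEK_SECONDS = 7 * 24 * 3600
MAX_PERCENT_DIFFERENCE = 2.0  # hypothetical similarity threshold
REQUIRED_SIMILAR = 3          # 3 or more similar anomalies, as in the text


def percent_difference(a, b):
    """Percentage difference between two features profile sums."""
    largest = max(abs(a), abs(b))
    if largest == 0:
        return 0.0
    return abs(a - b) / largest * 100.0


def in_different_weeks(ts_a, ts_b):
    """Assumed reading of 'different weekly period': at least 7 days apart."""
    return abs(ts_a - ts_b) >= WEEK_SECONDS


def similarity_matrix(anomalies):
    """Boolean matrix: True where two anomalies are in different weeks
    and their features profile sums are within the threshold."""
    n = len(anomalies)
    matrix = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        (ts_a, sum_a), (ts_b, sum_b) = anomalies[i], anomalies[j]
        similar = (in_different_weeks(ts_a, ts_b)
                   and percent_difference(sum_a, sum_b) <= MAX_PERCENT_DIFFERENCE)
        matrix[i][j] = matrix[j][i] = similar
    return matrix


def find_repetitive_group(anomalies):
    """Return the indices of the first group of REQUIRED_SIMILAR anomalies
    that are all similar to each other, or None if there is no such group."""
    matrix = similarity_matrix(anomalies)
    for group in combinations(range(len(anomalies)), REQUIRED_SIMILAR):
        if all(matrix[i][j] for i, j in combinations(group, 2)):
            return list(group)
    return None


DAY = 24 * 3600
# Illustrative (timestamp, features profile sum) pairs, not real data.
anomalies = [
    (0 * DAY, 1000.0), (2 * DAY, 1004.0), (8 * DAY, 1010.0),
    (16 * DAY, 995.0), (24 * DAY, 4000.0),
]
print(find_repetitive_group(anomalies))
```

In this example the first two anomalies have almost identical sums but are only 2 days apart, so they are never compared with each other; the group that is found spans three different weeks.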

An implementation detail to be aware of is that the normal MinMax scaling checks and comparisons, which in the normal Ionosphere process are based on similar ranges, are not implemented here. Instead, MinMax scaling is implemented via a threshold on the average value of the time series, defined in settings.IONOSPHERE_REPETITIVE_PATTERNS_MINMAX_AVG_VALUE. Because patterns and not absolute values are being looked for here, and because the features profiles being compared are separated by at least 7 days, the features profile sums should be sufficient to differentiate dissimilar patterns. There is a very small possibility that the MinMax scaling will result in false positive matches. However, the metric would have to change proportionally in average, peak and trough values for this to occur, which is unlikely, and even if it did, unless those changes were significant in magnitude, it is desirable that they match as similar.

settings.IONOSPHERE_REPETITIVE_PATTERNS_MINMAX_AVG_VALUE can be set to 0 to disable MinMax scaling comparisons.
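The proportional-change case mentioned above can be illustrated with generic MinMax scaling. The sketch below shows only why two series that differ by a constant factor become indistinguishable once scaled. It does not show how Skyline applies the IONOSPHERE_REPETITIVE_PATTERNS_MINMAX_AVG_VALUE threshold, and the function names and sample values are assumptions for the example.

```python
def minmax_scale(values):
    """Scale a list of values into the range 0.0 to 1.0."""
    lowest, highest = min(values), max(values)
    value_range = highest - lowest
    if value_range == 0:
        # A flat series has no shape to preserve.
        return [0.0 for _ in values]
    return [(v - lowest) / value_range for v in values]


def same_shape(series_a, series_b, tolerance=1e-9):
    """True if two equal-length series are identical after MinMax scaling."""
    scaled_a, scaled_b = minmax_scale(series_a), minmax_scale(series_b)
    return all(abs(a - b) <= tolerance for a, b in zip(scaled_a, scaled_b))


# Illustrative values only.
baseline = [10.0, 12.0, 11.0, 40.0, 12.0, 10.0]
doubled = [v * 2 for v in baseline]             # proportional change
different = [10.0, 30.0, 35.0, 40.0, 20.0, 10.0]  # same range, other shape

print(same_shape(baseline, doubled))
print(same_shape(baseline, different))
```

The doubled series has a different average, peak and trough but the same scaled shape, which is the situation in which a match could occur after scaling; the third series shares the same minimum and maximum as the baseline but still scales to a different shape.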