SNAB

EXPERIMENTAL

SNAB is for algorithm testing and benchmarking with real time and historic data. It allows custom_algorithms to be run against certain metrics in the real time workflow pipeline.

SNAB was inspired by The Numenta Anomaly Benchmark project (https://github.com/numenta/NAB) and attempts to provide a framework within Skyline for testing and benchmarking algorithms on real time data.

SNAB is not a replacement for NAB, it is simply the beginning of a NAB-like implementation for testing that suits Skyline.

SNAB components

SNAB uses the settings.SNAB_DATA_DIR directory (for time series data) and the snab database table (for results), and reports to the log and to a slack channel (if one is configured). It also needs to be able to access the code for the algorithm/s that are configured to be used.

settings.py

Currently the snab module has two relevant settings: settings.SNAB_ENABLED and settings.SNAB_CHECKS. For Analyzer and Mirage to send anomalies to SNAB to check against other algorithms, settings.SNAB_ENABLED must be set to True.
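As a minimal sketch, enabling SNAB in settings.py looks like this (the SNAB_CHECKS structure itself is covered in the SNAB_CHECKS section):

```python
# settings.py
# Enable SNAB so that Analyzer and Mirage send anomalies to it for checking
SNAB_ENABLED = True
# Checks are defined per app, mode and algorithm (see the SNAB_CHECKS section)
SNAB_CHECKS = {}
```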

Running Modes

SNAB has two modes it can run in: testing and realtime (not tested). SNAB has only been tested running with Mirage.

As of v2.1.0, SNAB is only implemented in the testing mode, with Mirage and a single algorithm.

The snab process

The snab Skyline app is started like the other apps, using bin/snab.d in the repo and your choice of process manager, e.g. systemctl, etc.

In terms of running, when a Skyline app is defined in settings.SNAB_CHECKS and that app triggers an anomaly on a metric, it will save the time series data to the settings.SNAB_DATA_DIR directory (if the time series has not already been saved as Ionosphere training data) and instruct Panorama, via the panorama.snab Redis set, to create a snab DB entry for the originating algorithm_group and the anomaly_id. The app will then send the metric and details through to snab to check against the defined algorithm/s.
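As a rough sketch of that hand-off (the directory, file naming and check payload fields here are illustrative assumptions, not Skyline's exact internals), the originating app effectively does something like:

```python
import json
import os
import time

# Stands in for settings.SNAB_DATA_DIR
SNAB_DATA_DIR = '/tmp/skyline/snab'


def hand_off_to_snab(metric, timeseries, algorithm_group, anomaly_id):
    """Save the time series to the SNAB data dir and build a check item.

    The field names in the check dict are illustrative; the real check
    payload is internal to Skyline.
    """
    os.makedirs(SNAB_DATA_DIR, exist_ok=True)
    timeseries_file = os.path.join(
        SNAB_DATA_DIR, '%s.%s.json' % (metric, anomaly_id))
    with open(timeseries_file, 'w') as f:
        json.dump(timeseries, f)
    check = {
        'metric': metric,
        'timeseries_file': timeseries_file,
        'algorithm_group': algorithm_group,
        'anomaly_id': anomaly_id,
        'added_at': int(time.time()),
    }
    # In Skyline the check is then recorded via the panorama.snab Redis set
    # and the details are passed through to snab, e.g.
    # redis_conn.sadd('panorama.snab', str(check))
    return check
```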

SNAB will then load the saved time series from the file specified in the check details, run the defined algorithm against it, determine the result and instruct Panorama to also create a snab DB entry for the algorithm, anomaly_id and algorithm_run_time. snab will then post the result to slack for evaluation.

The operator can then evaluate the results for the originating anomaly and algorithm_group and the SNAB algorithm/s in slack and assign a tP (true positive), fP (false positive), tN (true negative), fN (false negative) or unsure value to the result of each. This provides data in the snab database table with which to evaluate the performance of different algorithms.
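Once enough results have been labelled, the counts of those values let you compare algorithms with the standard measures. A minimal sketch (the counts below are made-up example numbers, not real SNAB results):

```python
def precision_recall(tP, fP, fN):
    """Compute precision and recall from labelled SNAB result counts."""
    precision = tP / (tP + fP) if (tP + fP) else 0.0
    recall = tP / (tP + fN) if (tP + fN) else 0.0
    return precision, recall


# Hypothetical labelled counts for one algorithm_group
precision, recall = precision_recall(tP=42, fP=8, fN=6)
# precision = 42 / 50 = 0.84, recall = 42 / 48 = 0.875
```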

The use of slack is required for evaluation; evaluating from the snab.log and manually constructing the webapp API snab endpoint URLs is not really feasible. The slack alerts provide all the graphs and data necessary for a proper evaluation to be made.

SNAB_CHECKS

To send any metrics to SNAB to be checked, they must be defined in an item in the settings.SNAB_CHECKS dictionary and settings.SNAB_ENABLED must be set to True. The settings.SNAB_CHECKS dictionary has the following structure:

SNAB_CHECKS = {
    '<skyline_app>': {
        '<mode>': {
            '<algorithm>': {
                'namespaces': ['<metric_namespace_1>', '<metric_namespace_2>'],
                'algorithm_source': '<absolute_path and filename>',
                'algorithm_parameters': {'<algorithm_parameter_1>': <value>, '<algorithm_parameter_2>': <value>},
                'max_execution_time': <seconds|float>,
                'debug_logging': <boolean>,
                'alert_slack_channel': '<slack_channel>'
            }
        }
    },
}

An example of this would be:

SNAB_CHECKS = {
    'mirage': {
        'testing': {
            'skyline_matrixprofile': {
                'namespaces': ['telegraf'],
                'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/skyline_matrixprofile.py',
                'algorithm_parameters': {'windows': 5, 'k_discords': 20},
                'max_execution_time': 10.0,
                'debug_logging': True,
                'alert_slack_channel': '#skyline'
            }
        }
    },
}

SNAB_LOAD_TEST_ANALYZER

SNAB can be used to load test Analyzer by defining the number of metrics you want to load test with in settings.SNAB_LOAD_TEST_ANALYZER. The load testing is not an exact reflection of the capability of Analyzer but rather an indication of its possible capability. This is because of how the load test is run. The load testing is only run after normal analysis, and the metrics run through the load test are not submitted to any of the classification that metrics in the normal process are subjected to. Metrics run through the normal analysis process are subjected to classification checks such as:

  • Is this a mirage metric?
  • Is this a strictly increasing monotonic metric? Should the derivative be used?
  • Should airgaps be identified in this metric?
  • Is this a flux filled metric?
  • etc

These classifications and transformations may be done on some metrics during the normal analysis process and each takes some time. However, they are not desired in the load testing, because the metric data being used to load test should not impact or modify any of the normal analysis process resources, such as Redis sets and keys.

The load testing uses the same time series data as the metrics that run through the normal analysis process, iterating this data through the three-sigma algorithms after normal analysis until the number of metrics defined in settings.SNAB_LOAD_TEST_ANALYZER has been analysed.

Load test results are reported in the analyzer.log
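The three-sigma check the load test iterates each time series through can be sketched as follows (a simplified stand-in for Skyline's actual algorithm implementations):

```python
import statistics


def three_sigma_anomalous(timeseries):
    """Return True if the last value deviates more than three standard
    deviations from the mean of the series.  This is the classic
    three-sigma test underlying Skyline's default algorithms, reduced
    to its simplest form for illustration."""
    values = [value for _, value in timeseries]
    if len(values) < 3:
        return False
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return False
    return abs(values[-1] - mean) > 3 * stdev
```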

Running a load test is as easy as defining a number in settings.SNAB_LOAD_TEST_ANALYZER, restarting analyzer and checking the log. Remember to set settings.SNAB_LOAD_TEST_ANALYZER back to 0 after your load test and restart analyzer again.

Example of log output:

SNAB flux load tester

So how many metrics can Skyline handle running on that digitalocean droplet?

It is difficult to tell how many metrics a single Skyline server can handle given the myriad combinations of configurations and hardware resources it may be run on.
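One useful back-of-the-envelope figure when sizing a test is the raw datapoint rate a metric population generates. Assuming a fixed per-metric resolution (60 seconds here, an assumption; substitute your own):

```python
def datapoints_per_minute(metric_count, resolution_seconds=60):
    """Datapoints per minute submitted by a metric population at a
    fixed per-metric resolution."""
    return metric_count * (60 / resolution_seconds)


# e.g. 2000 metrics at 60 second resolution is 2000 datapoints per minute
rate = datapoints_per_minute(2000)
```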

This is where the SNAB flux load tester comes into play. It allows you to deploy Skyline and configure the snab app to send as many metrics as you want to Graphite and Skyline.

Warning

DO NOT run this on an existing Skyline instance that is running with real data; it is meant to be run on a new, disposable Skyline build. This is because it populates Graphite, Redis, the database tables, etc. with real data for the test metrics. Unless you want to test and then remove all the test metrics from Redis, Graphite and MariaDB manually, which would be possible but not advisable.

To enable a snab_flux_load_test set the following in settings.py:

SNAB_FLUX_LOAD_TEST_ENABLED = True
# the number of metrics you want to load test with, use the number appropriate
# for you
SNAB_FLUX_LOAD_TEST_METRICS = 2000

Ensure that horizon, analyzer, flux and Graphite are running and start snab_flux_load_test.d as appropriate:

sudo -u skyline /opt/skyline/github/skyline/bin/snab_flux_load_test.d start
tail -n 80 /var/log/skyline/snab_flux_load_test.log

# Stop the load test
/opt/skyline/github/skyline/bin/snab_flux_load_test.d stop

You will want to let the load test run for a while and you may want to adjust the settings.SNAB_FLUX_LOAD_TEST_METRICS value and restart the test a few times.