skyline.custom_algorithms package

Submodules

skyline.custom_algorithms.abs_stddev_from_median module

THIS IS A SAMPLE CUSTOM ALGORITHM to provide a skeleton to develop your own custom algorithms. The algorithm itself, although viable, is not recommended for production or general use; it is simply a toy algorithm here to demonstrate the structure of a simple custom algorithm that has algorithm_parameters passed as an empty dict {}. It is documented via comments #

abs_stddev_from_median(current_skyline_app, parent_pid, timeseries, algorithm_parameters)

A data point is anomalous if its absolute distance from the median is greater than three standard deviations.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid which is executing the algorithm; this is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]

  • algorithm_parameters (dict) – a dictionary of any parameters and their arguments you wish to pass to the algorithm.

Returns:

True, False or None

Return type:

boolean
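As a rough illustration of the required four-argument signature and the True/False/None return contract, the sketch below flags the latest datapoint when its absolute distance from the median exceeds three standard deviations. It assumes the standard deviation is taken over the raw values with statistics.pstdev and that None means "could not determine"; it is not the code in abs_stddev_from_median.py, and the _sketch suffix marks the function name as hypothetical.

```python
import statistics


def abs_stddev_from_median_sketch(current_skyline_app, parent_pid, timeseries, algorithm_parameters):
    """Toy sketch: is the latest value more than 3 standard deviations from the median?

    current_skyline_app and parent_pid are accepted (in that order) only because
    Skyline passes them to every custom algorithm; this sketch does not use them,
    and algorithm_parameters is expected to be an empty dict.
    """
    try:
        values = [float(value) for _, value in timeseries]
        if len(values) < 2:
            # Too few datapoints to compute a standard deviation.
            return None
        median = statistics.median(values)
        # Assumption: population standard deviation of the raw values.
        stddev = statistics.pstdev(values)
        return abs(values[-1] - median) > 3 * stddev
    except (TypeError, ValueError):
        # Malformed datapoints: report "could not determine" rather than raise.
        return None
```

For example, a flat series of 60 one-minute datapoints at 10.0 returns False, while the same series ending in 1000.0 returns True.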

skyline.custom_algorithms.adtk_level_shift module

adtk_level_shift(current_skyline_app, parent_pid, timeseries, algorithm_parameters)

A timeseries is anomalous if a level shift occurs in a 5 window period bound by a factor of 9 of the normal range based on historical interquartile range.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid which is executing the algorithm; this is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the adtk_level_shift custom algorithm the following parameters are required, example: algorithm_parameters={'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5}

Returns:

True, False or None

Return type:

boolean
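The rule in the description above can be sketched without adtk. The function below is loosely modelled on the way adtk documents LevelShiftAD: it compares the medians of two adjacent sliding windows of window datapoints and flags a shift larger than Q3 + c * IQR of all the shift scores. It only implements side='both' (absolute differences), uses statistics.quantiles for the quartiles, and its name is hypothetical; it is neither adtk's nor Skyline's implementation.

```python
import statistics


def level_shift_sketch(timeseries, c=9.0, window=5):
    """True if any of the last `window` scored positions shows a level shift
    larger than Q3 + c * IQR of all the shift scores (side='both' only)."""
    values = [float(v) for _, v in timeseries]
    scores = []
    # Score position i by the absolute difference between the median of the
    # `window` values before i and the median of the `window` values from i on.
    for i in range(window, len(values) - window + 1):
        before = statistics.median(values[i - window:i])
        after = statistics.median(values[i:i + window])
        scores.append(abs(after - before))
    if len(scores) < 2:
        return None  # too little data to estimate the interquartile range
    q1, _, q3 = statistics.quantiles(scores, n=4)
    threshold = q3 + c * (q3 - q1)
    return any(score > threshold for score in scores[-window:])
```

Note that when most scores are zero the IQR collapses to zero and even a tiny shift exceeds the bound, which is one reason the text warns that the algorithm is noisy on some metrics.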

Performance is of paramount importance in Skyline, especially in terms of computational complexity, execution time and CPU usage. The adtk LevelShiftAD algorithm is not O(n) and it is not fast either, not when compared to the normal three-sigma triggered algorithms. However, it is useful if you care about detecting all level shifts. The normal three-sigma triggered algorithms do not always detect a level shift, especially if the level shift does not breach the three-sigma limits. Therefore you may find over time that you encounter alerts containing level shifts that you thought should have been detected. On these types of metrics and events, the adtk LevelShiftAD algorithm can be implemented to detect and alert on them. It is not recommended to run it on all your metrics, as it would immediately triple the analyzer runtime, even if it is only run every 5 windows/minutes.

Due to the computational complexity and long run time of the adtk LevelShiftAD algorithm on the size of timeseries data used by Skyline, if you consider the following timings of all the three-sigma triggered algorithms and compare them to the adtk_level_shift results at the end of the log below, it is clear that running adtk_level_shift on all metrics is probably not desirable; even if it is possible to do, it is very noisy.

    2021-03-06 10:46:38 :: 1582754 :: algorithm run count - histogram_bins run 567 times
    2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - histogram_bins has 567 timings
    2021-03-06 10:46:38 :: 1582754 :: algorithm timing - histogram_bins - total: 1.051136 - median: 0.001430
    2021-03-06 10:46:38 :: 1582754 :: algorithm run count - first_hour_average run 567 times
    2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - first_hour_average has 567 timings
    2021-03-06 10:46:38 :: 1582754 :: algorithm timing - first_hour_average - total: 1.322432 - median: 0.001835
    2021-03-06 10:46:38 :: 1582754 :: algorithm run count - stddev_from_average run 567 times
    2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - stddev_from_average has 567 timings
    2021-03-06 10:46:38 :: 1582754 :: algorithm timing - stddev_from_average - total: 1.097290 - median: 0.001641
    2021-03-06 10:46:38 :: 1582754 :: algorithm run count - grubbs run 567 times
    2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - grubbs has 567 timings
    2021-03-06 10:46:38 :: 1582754 :: algorithm timing - grubbs - total: 1.742929 - median: 0.002438
    2021-03-06 10:46:38 :: 1582754 :: algorithm run count - ks_test run 147 times
    2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - ks_test has 147 timings
    2021-03-06 10:46:38 :: 1582754 :: algorithm timing - ks_test - total: 0.127648 - median: 0.000529
    2021-03-06 10:46:38 :: 1582754 :: algorithm run count - mean_subtraction_cumulation run 40 times
    2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - mean_subtraction_cumulation has 40 timings
    2021-03-06 10:46:38 :: 1582754 :: algorithm timing - mean_subtraction_cumulation - total: 0.152515 - median: 0.003152
    2021-03-06 10:46:39 :: 1582754 :: algorithm run count - median_absolute_deviation run 35 times
    2021-03-06 10:46:39 :: 1582754 :: algorithm timings count - median_absolute_deviation has 35 timings
    2021-03-06 10:46:39 :: 1582754 :: algorithm timing - median_absolute_deviation - total: 0.143770 - median: 0.003248
    2021-03-06 10:46:39 :: 1582754 :: algorithm run count - stddev_from_moving_average run 30 times
    2021-03-06 10:46:39 :: 1582754 :: algorithm timings count - stddev_from_moving_average has 30 timings
    2021-03-06 10:46:39 :: 1582754 :: algorithm timing - stddev_from_moving_average - total: 0.125173 - median: 0.003092
    2021-03-06 10:46:39 :: 1582754 :: algorithm run count - least_squares run 16 times
    2021-03-06 10:46:39 :: 1582754 :: algorithm timings count - least_squares has 16 timings
    2021-03-06 10:46:39 :: 1582754 :: algorithm timing - least_squares - total: 0.089108 - median: 0.005538
    2021-03-06 10:46:39 :: 1582754 :: algorithm run count - abs_stddev_from_median run 1 times
    2021-03-06 10:46:39 :: 1582754 :: algorithm timings count - abs_stddev_from_median has 1 timings
    2021-03-06 10:46:39 :: 1582754 :: algorithm timing - abs_stddev_from_median - total: 0.036797 - median: 0.036797
    2021-03-06 10:46:39 :: 1582754 :: algorithm run count - adtk_level_shift run 271 times
    2021-03-06 10:46:39 :: 1582754 :: algorithm timings count - adtk_level_shift has 271 timings
    2021-03-06 10:46:39 :: 1582754 :: algorithm timing - adtk_level_shift - total: 13.729565 - median: 0.035791
    …
    …
    2021-03-06 10:46:39 :: 1582754 :: seconds to run :: 27.93 # THE TOTAL ANALYZER RUNTIME
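The comparison can be made concrete by parsing the timing lines from the log above. The sketch assumes, as the log suggests, that each total is the summed time spent in that algorithm during the same analyzer run, and it treats abs_stddev_from_median and adtk_level_shift as the only custom algorithms in the list.

```python
import re

# Timing lines copied verbatim from the analyzer log above.
TIMING_LOG = """\
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - histogram_bins - total: 1.051136 - median: 0.001430
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - first_hour_average - total: 1.322432 - median: 0.001835
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - stddev_from_average - total: 1.097290 - median: 0.001641
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - grubbs - total: 1.742929 - median: 0.002438
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - ks_test - total: 0.127648 - median: 0.000529
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - mean_subtraction_cumulation - total: 0.152515 - median: 0.003152
2021-03-06 10:46:39 :: 1582754 :: algorithm timing - median_absolute_deviation - total: 0.143770 - median: 0.003248
2021-03-06 10:46:39 :: 1582754 :: algorithm timing - stddev_from_moving_average - total: 0.125173 - median: 0.003092
2021-03-06 10:46:39 :: 1582754 :: algorithm timing - least_squares - total: 0.089108 - median: 0.005538
2021-03-06 10:46:39 :: 1582754 :: algorithm timing - abs_stddev_from_median - total: 0.036797 - median: 0.036797
2021-03-06 10:46:39 :: 1582754 :: algorithm timing - adtk_level_shift - total: 13.729565 - median: 0.035791
"""
TOTAL_RUNTIME = 27.93  # "seconds to run" for the same analyzer run

TIMING_RE = re.compile(r'algorithm timing - (\S+) - total: ([\d.]+) - median: ([\d.]+)')
# algorithm name -> (total seconds, median seconds per run)
timings = {name: (float(total), float(median))
           for name, total, median in TIMING_RE.findall(TIMING_LOG)}

CUSTOM_ALGORITHMS = {'abs_stddev_from_median', 'adtk_level_shift'}
three_sigma_total = sum(total for name, (total, _) in timings.items()
                        if name not in CUSTOM_ALGORITHMS)
adtk_total = timings['adtk_level_shift'][0]

print(f'three-sigma algorithms: {three_sigma_total:.2f}s')
print(f'adtk_level_shift: {adtk_total:.2f}s, '
      f'{adtk_total / TOTAL_RUNTIME:.0%} of the run, '
      f'{adtk_total / three_sigma_total:.1f}x the three-sigma total')
```

With these figures the nine three-sigma algorithms total about 5.85 seconds, while adtk_level_shift alone accounts for about 13.73 seconds, roughly 49% of the 27.93 second run and about 2.3 times the three-sigma total.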

Therefore the analysis methodology implemented for the adtk_level_shift custom_algorithm is as follows:

  • When new metrics are added, either to the configuration or by actual new metrics coming online that match the algorithm_parameters['namespace'], Skyline shards the new metrics into time slots to prevent a thundering herd situation from developing. A newly added metric will eventually be assigned to a time shard, and its last analysed timestamp will be added to the analyzer.last.adtk_level_shift Redis hash key, which is used to determine the metric's next scheduled run.

  • A run_every parameter is implemented so that the algorithm can be configured to run on a metric once every run_every minutes. The default is to run it every 5 minutes using a rolling window of 5 and to trigger as anomalous if the algorithm labels any of the last 5 datapoints as anomalous. This means that there could be up to a 5 minute delay on an alert for the 60 second resolution metrics with a SECOND_ORDER_RESOLUTION_HOURS of 168 in the example, but a c=9.0 level shift would still be detected and alerted on (if both analyzer and mirage triggered on it). This periodic running of the algorithm is a tradeoff that spreads the adtk_level_shift load and runtime over run_every minutes.

  • The algorithm is not run against metrics that are sparsely populated, because running it on sparsely populated metrics results in lots of false positives and noise.
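The scheduling described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not Skyline's implementation: new metrics are sharded into one of run_every one-minute slots using zlib.crc32 of the metric name, a plain dict stands in for the analyzer.last.adtk_level_shift Redis hash, and the function names shard_slot and due_for_analysis are hypothetical.

```python
import zlib


def shard_slot(metric, run_every=5):
    """Deterministically map a metric name to one of run_every one-minute slots."""
    return zlib.crc32(metric.encode('utf-8')) % run_every


def due_for_analysis(metric, now, last_analysed, run_every=5):
    """Return True (and record `now`) if `metric` should be analysed at unix time `now`.

    last_analysed is a plain dict standing in for the
    analyzer.last.adtk_level_shift Redis hash: metric name -> last analysed timestamp.
    """
    last = last_analysed.get(metric)
    if last is None:
        # New metric: wait for its shard slot so that a burst of new metrics
        # is spread over run_every minutes instead of all running at once.
        if (int(now) // 60) % run_every != shard_slot(metric, run_every):
            return False
    elif now - last < run_every * 60:
        # Analysed less than run_every minutes ago: skip this run.
        return False
    last_analysed[metric] = now
    return True
```

Once a metric has been recorded it keeps the phase of its first run, so the load stays spread across the run_every slots rather than drifting back into one.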

The Skyline CUSTOM_ALGORITHMS implementation of the adtk LevelShiftAD algorithm is configured as in the example shown below. However, please note that the algorithm_parameters shown in this example configuration are suitable for metrics that have a 60 second resolution and a settings.ALERTS Mirage SECOND_ORDER_RESOLUTION_HOURS of 168 (7 days). Metrics with a different resolution/frequency may require different values appropriate to their resolution.

Example CUSTOM_ALGORITHMS configuration:

'adtk_level_shift': {
    'namespaces': [
        'skyline.analyzer.run_time', 'skyline.analyzer.total_metrics', 'skyline.analyzer.exceptions'
    ],
    'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/adtk_level_shift.py',
    'algorithm_parameters': {'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5},
    'max_execution_time': 0.5,
    'consensus': 1,
    'algorithms_allowed_in_consensus': ['adtk_level_shift'],
    'run_3sigma_algorithms': True,
    'run_before_3sigma': True,
    'run_only_if_consensus': False,
    'use_with': ['analyzer', 'mirage'],
    'debug_logging': False,
},

skyline.custom_algorithms.adtk_persist module

adtk_persist(current_skyline_app, parent_pid, timeseries, algorithm_parameters)

A timeseries is anomalous if a level shift occurs in a 5 window period bound by a factor of 9 of the normal range based on historical interquartile range.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid which is executing the algorithm; this is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the adtk_persist custom algorithm the following parameters are required, example: algorithm_parameters={'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5}

Returns:

True, False or None

Return type:

boolean
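adtk documents PersistAD as comparing each value with the values that precede it. A minimal stdlib sketch of that idea follows; it assumes the median of the preceding window values as the baseline and the same Q3 + c * IQR bound used for the level shift case, handles only side='both', and the name persist_sketch is hypothetical. It is not adtk's or Skyline's code.

```python
import statistics


def persist_sketch(timeseries, c=9.0, window=5):
    """True if any of the last `window` values deviates from its preceding
    `window` values by more than Q3 + c * IQR of all such deviations."""
    values = [float(v) for _, v in timeseries]
    # Score each value by its absolute deviation from the median of the
    # `window` values that immediately precede it.
    scores = [
        abs(values[i] - statistics.median(values[i - window:i]))
        for i in range(window, len(values))
    ]
    if len(scores) < 2:
        return None  # too little data to estimate the interquartile range
    q1, _, q3 = statistics.quantiles(scores, n=4)
    threshold = q3 + c * (q3 - q1)
    return any(score > threshold for score in scores[-window:])
```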

Performance is of paramount importance in Skyline, especially in terms of computational complexity, execution time and CPU usage. The adtk PersistAD algorithm is not O(n) and it is not fast either, not when compared to the normal three-sigma triggered algorithms. However, it is useful if you care about detecting all level shifts. The normal three-sigma triggered algorithms do not always detect a level shift, especially if the level shift does not breach the three-sigma limits. Therefore you may find over time that you encounter alerts containing level shifts that you thought should have been detected. On these types of metrics and events, the adtk PersistAD algorithm can be implemented to detect and alert on them. It is not recommended to run it on all your metrics, as it would immediately triple the analyzer runtime, even if it is only run every 5 windows/minutes.

Due to the computational complexity and long run time of the adtk PersistAD algorithm on the size of timeseries data used by Skyline, the analysis methodology implemented for the adtk_persist custom_algorithm is as follows:

  • When new metrics are added, either to the configuration or by actual new metrics coming online that match the algorithm_parameters['namespace'], Skyline shards the new metrics into time slots to prevent a thundering herd situation from developing. A newly added metric will eventually be assigned to a time shard, and its last analysed timestamp will be added to the analyzer.last.adtk_persist Redis hash key, which is used to determine the metric's next scheduled run.

  • A run_every parameter is implemented so that the algorithm can be configured to run on a metric once every run_every minutes. The default is to run it every 5 minutes using a rolling window of 5 and to trigger as anomalous if the algorithm labels any of the last 5 datapoints as anomalous. This means that there could be up to a 5 minute delay on an alert for the 60 second resolution metrics with a SECOND_ORDER_RESOLUTION_HOURS of 168 in the example, but a c=9.0 level shift would still be detected and alerted on (if both analyzer and mirage triggered on it). This periodic running of the algorithm is a tradeoff that spreads the adtk_persist load and runtime over run_every minutes.

  • The algorithm is not run against metrics that are sparsely populated, because running it on sparsely populated metrics results in lots of false positives and noise.
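Skyline's own definition of "sparsely populated" is not given here. A simple sketch, assuming a metric counts as sparse when fewer than a hypothetical 80% of the datapoints expected at its resolution are present between its first and last timestamps (is_sparsely_populated and min_fraction are illustrative names):

```python
def is_sparsely_populated(timeseries, resolution=60, min_fraction=0.8):
    """True if fewer than min_fraction of the expected datapoints are present."""
    if len(timeseries) < 2:
        return True
    span = timeseries[-1][0] - timeseries[0][0]
    # Datapoints expected between the first and last timestamps at this resolution.
    expected = int(span // resolution) + 1
    present = sum(1 for _, value in timeseries if value is not None)
    return present / expected < min_fraction
```

A metric flagged by a check like this would simply be skipped rather than passed to the adtk detector.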

The Skyline CUSTOM_ALGORITHMS implementation of the adtk PersistAD algorithm is configured as in the example shown below. However, please note that the algorithm_parameters shown in this example configuration are suitable for metrics that have a 60 second resolution and a settings.ALERTS Mirage SECOND_ORDER_RESOLUTION_HOURS of 168 (7 days). Metrics with a different resolution/frequency may require different values appropriate to their resolution.

Example CUSTOM_ALGORITHMS configuration:

'adtk_persist': {
    'namespaces': [
        'skyline.analyzer.run_time', 'skyline.analyzer.total_metrics', 'skyline.analyzer.exceptions'
    ],
    'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/adtk_persist.py',
    'algorithm_parameters': {'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5},
    'max_execution_time': 0.5,
    'consensus': 1,
    'algorithms_allowed_in_consensus': ['adtk_persist'],
    'run_3sigma_algorithms': True,
    'run_before_3sigma': True,
    'run_only_if_consensus': False,
    'use_with': ['analyzer', 'mirage'],
    'debug_logging': False,
},

skyline.custom_algorithms.adtk_seasonal module

adtk_seasonal(current_skyline_app, parent_pid, timeseries, algorithm_parameters)

A timeseries is anomalous if a level shift occurs in a 5 window period bound by a factor of 9 of the normal range based on historical interquartile range.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid which is executing the algorithm; this is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the adtk_seasonal custom algorithm the following parameters are required, example: algorithm_parameters={'c': 9.0, 'run_every': 5, 'side': 'both'}

Returns:

True, False or None

Return type:

boolean
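adtk documents SeasonalAD as detecting violations of a seasonal pattern. The sketch below shows that idea under simplifying assumptions: the season length period (in datapoints) is passed in rather than inferred, the seasonal profile is the median value at each phase of the cycle, and the latest residual is flagged when it falls outside Q1 - c * IQR to Q3 + c * IQR of the earlier residuals. The function name is hypothetical, and this is not adtk's or Skyline's code.

```python
import statistics


def seasonal_sketch(timeseries, period, c=9.0):
    """True if the latest value breaks the seasonal pattern of length `period`."""
    values = [float(v) for _, v in timeseries]
    if len(values) < 2 * period:
        return None  # need at least two full cycles to estimate the pattern
    history = values[:-1]
    # Seasonal profile: the median value observed at each phase of the cycle.
    profile = [statistics.median(history[phase::period]) for phase in range(period)]
    residuals = [value - profile[i % period] for i, value in enumerate(values)]
    # Bound the latest residual by the interquartile range of the earlier ones.
    q1, _, q3 = statistics.quantiles(residuals[:-1], n=4)
    iqr = q3 - q1
    latest = residuals[-1]
    return latest < q1 - c * iqr or latest > q3 + c * iqr
```

On an hourly series where each hour of the day has its own level, a value that is normal at one hour but arrives at a different hour is flagged, which a rule based only on the overall spread of values would miss.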

Performance is of paramount importance in Skyline, especially in terms of computational complexity, execution time and CPU usage. The adtk SeasonalAD algorithm is not O(n) and it is not fast either, not when compared to the normal three-sigma triggered algorithms. However, it is useful if you care about detecting all level shifts. The normal three-sigma triggered algorithms do not always detect a level shift, especially if the level shift does not breach the three-sigma limits. Therefore you may find over time that you encounter alerts containing level shifts that you thought should have been detected. On these types of metrics and events, the adtk SeasonalAD algorithm can be implemented to detect and alert on them. It is not recommended to run it on all your metrics, as it would immediately triple the analyzer runtime, even if it is only run every 5 windows/minutes.

Due to the computational complexity and long run time of the adtk SeasonalAD algorithm on the size of timeseries data used by Skyline, the analysis methodology implemented for the adtk_seasonal custom_algorithm is as follows:

  • When new metrics are added, either to the configuration or by actual new metrics coming online that match the algorithm_parameters['namespace'], Skyline shards the new metrics into time slots to prevent a thundering herd situation from developing. A newly added metric will eventually be assigned to a time shard, and its last analysed timestamp will be added to the analyzer.last.adtk_seasonal Redis hash key, which is used to determine the metric's next scheduled run.

  • A run_every parameter is implemented so that the algorithm can be configured to run on a metric once every run_every minutes. The default is to run it every 5 minutes using a rolling window of 5 and to trigger as anomalous if the algorithm labels any of the last 5 datapoints as anomalous. This means that there could be up to a 5 minute delay on an alert for the 60 second resolution metrics with a SECOND_ORDER_RESOLUTION_HOURS of 168 in the example, but a c=9.0 level shift would still be detected and alerted on (if both analyzer and mirage triggered on it). This periodic running of the algorithm is a tradeoff that spreads the adtk_seasonal load and runtime over run_every minutes.

  • The algorithm is not run against metrics that are sparsely populated, because running it on sparsely populated metrics results in lots of false positives and noise.

The Skyline CUSTOM_ALGORITHMS implementation of the adtk SeasonalAD algorithm is configured as in the example shown below. However, please note that the algorithm_parameters shown in this example configuration are suitable for metrics that have a 60 second resolution and a settings.ALERTS Mirage SECOND_ORDER_RESOLUTION_HOURS of 168 (7 days). Metrics with a different resolution/frequency may require different values appropriate to their resolution.

Example CUSTOM_ALGORITHMS configuration:

'adtk_seasonal': {
    'namespaces': [
        'skyline.analyzer.run_time', 'skyline.analyzer.total_metrics', 'skyline.analyzer.exceptions'
    ],
    'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/adtk_seasonal.py',
    'algorithm_parameters': {'c': 9.0, 'run_every': 5, 'side': 'both'},
    'max_execution_time': 0.5,
    'consensus': 1,
    'algorithms_allowed_in_consensus': ['adtk_seasonal'],
    'run_3sigma_algorithms': True,
    'run_before_3sigma': True,
    'run_only_if_consensus': False,
    'use_with': ['analyzer', 'mirage'],
    'debug_logging': False,
},

skyline.custom_algorithms.adtk_volatility_shift module

adtk_volatility_shift(current_skyline_app, parent_pid, timeseries, algorithm_parameters)

A timeseries is anomalous if a level shift occurs in a 5 window period bound by a factor of 9 of the normal range based on historical interquartile range.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid which is executing the algorithm; this is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the adtk_volatility_shift custom algorithm the following parameters are required, example: algorithm_parameters={'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5}

Returns:

True, False or None

Return type:

boolean
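adtk documents VolatilityShiftAD as tracking the difference between the standard deviations of two adjacent sliding windows. A stdlib sketch of that idea, using the same Q3 + c * IQR bound as the level shift sketch and handling only side='both'; the function name is hypothetical and this is not adtk's or Skyline's code:

```python
import statistics


def volatility_shift_sketch(timeseries, c=9.0, window=5):
    """True if any of the last `window` scored positions shows a change in
    volatility larger than Q3 + c * IQR of all the volatility-shift scores."""
    values = [float(v) for _, v in timeseries]
    scores = []
    # Score position i by how much the standard deviation of the `window`
    # values from i onwards differs from that of the `window` values before i.
    for i in range(window, len(values) - window + 1):
        before = statistics.pstdev(values[i - window:i])
        after = statistics.pstdev(values[i:i + window])
        scores.append(abs(after - before))
    if len(scores) < 2:
        return None  # too little data to estimate the interquartile range
    q1, _, q3 = statistics.quantiles(scores, n=4)
    threshold = q3 + c * (q3 - q1)
    return any(score > threshold for score in scores[-window:])
```

Unlike the level shift case, the mean of the series can stay the same here; only its spread has to change.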

Performance is of paramount importance in Skyline, especially in terms of computational complexity, execution time and CPU usage. The adtk VolatilityShiftAD algorithm is not O(n) and it is not fast either, not when compared to the normal three-sigma triggered algorithms. However, it is useful if you care about detecting all level shifts. The normal three-sigma triggered algorithms do not always detect a level shift, especially if the level shift does not breach the three-sigma limits. Therefore you may find over time that you encounter alerts containing level shifts that you thought should have been detected. On these types of metrics and events, the adtk VolatilityShiftAD algorithm can be implemented to detect and alert on them. It is not recommended to run it on all your metrics, as it would immediately triple the analyzer runtime, even if it is only run every 5 windows/minutes.

Due to the computational complexity and long run time of the adtk VolatilityShiftAD algorithm on the size of timeseries data used by Skyline, the analysis methodology implemented for the adtk_volatility_shift custom_algorithm is as follows:

  • When new metrics are added, either to the configuration or by actual new metrics coming online that match the algorithm_parameters['namespace'], Skyline shards the new metrics into time slots to prevent a thundering herd situation from developing. A newly added metric will eventually be assigned to a time shard, and its last analysed timestamp will be added to the analyzer.last.adtk_volatility_shift Redis hash key, which is used to determine the metric's next scheduled run.

  • A run_every parameter is implemented so that the algorithm can be configured to run on a metric once every run_every minutes. The default is to run it every 5 minutes using a rolling window of 5 and to trigger as anomalous if the algorithm labels any of the last 5 datapoints as anomalous. This means that there could be up to a 5 minute delay on an alert for the 60 second resolution metrics with a SECOND_ORDER_RESOLUTION_HOURS of 168 in the example, but a c=9.0 level shift would still be detected and alerted on (if both analyzer and mirage triggered on it). This periodic running of the algorithm is a tradeoff that spreads the adtk_volatility_shift load and runtime over run_every minutes.

  • The algorithm is not run against metrics that are sparsely populated, because running it on sparsely populated metrics results in lots of false positives and noise.

The Skyline CUSTOM_ALGORITHMS implementation of the adtk VolatilityShiftAD algorithm is configured as in the example shown below. However, please note that the algorithm_parameters shown in this example configuration are suitable for metrics that have a 60 second resolution and a settings.ALERTS Mirage SECOND_ORDER_RESOLUTION_HOURS of 168 (7 days). Metrics with a different resolution/frequency may require different values appropriate to their resolution.

Example CUSTOM_ALGORITHMS configuration:

'adtk_volatility_shift': {
    'namespaces': [
        'skyline.analyzer.run_time', 'skyline.analyzer.total_metrics', 'skyline.analyzer.exceptions'
    ],
    'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/adtk_volatility_shift.py',
    'algorithm_parameters': {'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5},
    'max_execution_time': 0.5,
    'consensus': 1,
    'algorithms_allowed_in_consensus': ['adtk_volatility_shift'],
    'run_3sigma_algorithms': True,
    'run_before_3sigma': True,
    'run_only_if_consensus': False,
    'use_with': ['analyzer', 'mirage'],
    'debug_logging': False,
},

skyline.custom_algorithms.anomalous_daily_peak module

anomalous_daily_peak.py

anomalous_daily_peak(current_skyline_app, parent_pid, timeseries, algorithm_parameters)

A 7 day timeseries is NOT anomalous if the last datapoints are in a peak whose values are within 3 standard deviations of the mean of x number of similar peaks (+/- 20%) that occurred in windows that are 24 hours apart.

Only timeseries that are thought to be anomalous should be run through this algorithm. It is meant to be run in Mirage after all algorithms, including custom algorithms, have been run and found a metric to be ANOMALOUS. This algorithm does a final check to see if the anomaly is a repetitive daily pattern which occurs more than x times a week.

Be aware that this algorithm will identify ANY datapoint that is NOT in a repetitive peak and within 3-sigma (+/- 20%) of the other peaks as ANOMALOUS.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid which is executing the algorithm; this is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the anomalous_daily_peak custom algorithm no specific algorithm_parameters are required apart from an empty dict, example: algorithm_parameters={}. However, number_of_daily_peaks can be passed to define how many peaks must exist in the window period for the datapoint to be classed as normal. If this is set to 3 and, say, a possible anomaly at 00:05 is being checked, there need to be 3 peaks over the past 7 days in the daily 23:35 to 00:05 window; if there are not at least 3, the datapoint is considered anomalous, example: algorithm_parameters={'number_of_daily_peaks': 3}

Returns:

anomalous, anomalyScore

Return type:

tuple(boolean, float)
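One way to read the rule above, sketched with the standard library: take the peak (maximum) of the latest 30 minute window, take the peak of the same window on each earlier day, keep the earlier peaks within +/- 20% of the latest one, and treat the latest peak as normal only if there are at least number_of_daily_peaks of them and it lies within their 3-sigma band widened by 20%. This interpretation, the function name daily_peak_sketch and the 1.0/0.0 anomalyScore are assumptions rather than the module's implementation, and the band widening assumes positive values.

```python
import statistics

DAY = 86400


def daily_peak_sketch(timeseries, number_of_daily_peaks=3, window_seconds=1800, tolerance=0.2):
    """Return (anomalous, anomalyScore) for the latest datapoint of a ~7 day series."""
    first_ts = timeseries[0][0]
    last_ts = timeseries[-1][0]

    def window_peak(end_ts):
        # Maximum value in the window_seconds leading up to end_ts, if any.
        values = [v for ts, v in timeseries if end_ts - window_seconds <= ts <= end_ts]
        return max(values) if values else None

    current_peak = window_peak(last_ts)
    similar_peaks = []
    days_back = 1
    while last_ts - days_back * DAY >= first_ts:
        peak = window_peak(last_ts - days_back * DAY)
        # Keep earlier peaks within +/- tolerance of the current peak.
        if peak is not None and abs(peak - current_peak) <= tolerance * abs(current_peak):
            similar_peaks.append(peak)
        days_back += 1

    if len(similar_peaks) < number_of_daily_peaks:
        return True, 1.0
    mean = statistics.mean(similar_peaks)
    stddev = statistics.pstdev(similar_peaks)
    # 3-sigma band widened by the tolerance (assumes positive values).
    lower = (mean - 3 * stddev) * (1 - tolerance)
    upper = (mean + 3 * stddev) * (1 + tolerance)
    anomalous = not (lower <= current_peak <= upper)
    return anomalous, 1.0 if anomalous else 0.0
```

For example, a 10 minute resolution series that peaks at the same time every day is classed as normal at that time, while a one-off peak with no daily counterpart is classed as anomalous.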

skyline.custom_algorithms.dbscan module

dbscan.py

dbscan(current_skyline_app, parent_pid, timeseries, algorithm_parameters)

Outlier detector based on DBSCAN. EXPERIMENTAL

UNRELIABLE, as it is very sensitive to input parameters, which makes it difficult to automatically determine suitable parameters. Automatically determined parameters can sometimes be very effective, but often they do not have the desired results. Because a single epsilon value is used for all clusters, the algorithm fails when clusters of varying density are present in the data.

Therefore, if DBSCAN identifies more than 33% of the data points in a timeseries as outliers, this algorithm will return an inconclusive result.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid which is executing the algorithm; this is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. Example: algorithm_parameters={'window_shape': 3, 'min_samples': 4, 'anomaly_window': 5, 'return_results': True}

Returns:

anomalous, anomalyScore, instance_scores

Return type:

tuple(boolean, float, instance_scores)
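The 33% guard can be illustrated with a small, dependency-free sketch. It embeds the series as sliding windows of window_shape values, marks every window that DBSCAN would label as noise (neither a core point nor within eps of one), and returns an inconclusive (None, None, flags) result when more than 33% of the windows are noise. Unlike the module, which determines its parameters automatically, the sketch takes eps explicitly; the function names and the anomalyScore definition (fraction of noise windows in the last anomaly_window) are hypothetical, and the O(n^2) neighbour search is only suitable for short series.

```python
import math


def dbscan_noise(points, eps, min_samples):
    """Flag each point that DBSCAN would label as noise."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Core point: at least min_samples points (itself included) within eps.
    core = [len(nb) >= min_samples for nb in neighbours]
    # Noise: neither a core point nor a border point (within eps of a core point).
    return [not core[i] and not any(core[j] for j in neighbours[i]) for i in range(n)]


def dbscan_sketch(timeseries, eps, window_shape=3, min_samples=4,
                  anomaly_window=5, max_outlier_fraction=0.33):
    values = [float(v) for _, v in timeseries]
    # Embed the series as overlapping windows of window_shape consecutive values.
    windows = [tuple(values[i:i + window_shape])
               for i in range(len(values) - window_shape + 1)]
    if not windows:
        return None, None, []
    noise = dbscan_noise(windows, eps, min_samples)
    if sum(noise) / len(noise) > max_outlier_fraction:
        # Too many outliers: the parameters probably do not suit this data.
        return None, None, noise
    recent = noise[-anomaly_window:]
    return any(recent), sum(recent) / len(recent), noise
```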

skyline.custom_algorithms.irregular_unstable module

irregular_unstable.py

irregular_unstable(current_skyline_app, parent_pid, timeseries, algorithm_parameters)

A timeseries is NOT anomalous if it has low variance over 30 days and does not trigger multiple algorithms.

Only timeseries that are thought to be anomalous AND have a low variance at 7 days should be run through this algorithm. It is meant to be run with Mirage after all algorithms, including custom algorithms, have been run and found a metric to be ANOMALOUS. This algorithm does a final check to see if the metric has low variance and, if so, analyses the data at 30 days.

On irregular, unstable metrics that exhibit low variance at both 7 days and 30 days, the algorithm generally results in ~63% of the anomalies detected on these timeseries at 7 days being correctly identified as false positives when the data is analysed at 30 days.

The irregular_unstable algorithm takes on average 3.861 seconds to run; however, if the timeseries is discarded because it does not have low variance at 30 days, the average discard time is 0.213 seconds.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid which is executing the algorithm; this is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the irregular_unstable custom algorithm no specific algorithm_parameters are required apart from an empty dict, example: algorithm_parameters={}. However, number_of_daily_peaks can be passed to define how many peaks must exist in the window period for the datapoint to be classed as normal. If this is set to 3 and, say, a possible anomaly at 00:05 is being checked, there need to be 3 peaks over the past 7 days in the daily 23:35 to 00:05 window; if there are not at least 3, the datapoint is considered anomalous, example: algorithm_parameters={'number_of_daily_peaks': 3}

Returns:

anomalous, anomalyScore

Return type:

tuple(boolean, float)

skyline.custom_algorithms.isolation_forest module

isolation_forest.py

isolation_forest(current_skyline_app, parent_pid, timeseries, algorithm_parameters)

Outlier detector based on Isolation Forest.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. Example: algorithm_parameters={'contamination': '0.01', 'anomaly_window': 5, 'return_results': True}

Returns:

anomalous, anomalyScore, isolation_forest_scores

Return type:

tuple(boolean, float, isolation_forest_scores)

skyline.custom_algorithms.last_same_hours module

THIS IS A MORE FEATUREFUL CUSTOM ALGORITHM to provide a skeleton to develop your own custom algorithms. The algorithm itself, although viable, is not recommended for production or general use; it is simply a toy algorithm here to demonstrate the structure of a more complex custom algorithm that has algorithm_parameters passed and can also log if enabled. It is documented via comments #

last_same_hours(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

The last_same_hours algorithm determines the data points at the same hour and minute as the current timestamp over the last x days, calculates the mean of those values, and determines whether the current data point is within 3 standard deviations of that mean.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]

  • algorithm_parameters (dict) – a dictionary of any parameters and their arguments you wish to pass to the algorithm.

Returns:

True, False or None

Return type:

boolean
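A minimal sketch of the last_same_hours logic, written against the four-argument custom algorithm signature. The parameter name days and its default of 7, and the exact match on hour and minute in UTC, are assumptions; the real module may select and filter data points differently.

```python
import statistics
from datetime import datetime, timezone


def last_same_hours_sketch(current_skyline_app, parent_pid, timeseries, algorithm_parameters):
    """Compare the latest value with the same hour:minute on previous days.

    Returns True (anomalous), False (not anomalous) or None (cannot decide).
    """
    days = algorithm_parameters.get('days', 7)  # HYPOTHETICAL parameter name and default
    try:
        current_ts, current_value = timeseries[-1]
        current_dt = datetime.fromtimestamp(current_ts, tz=timezone.utc)
        earliest = current_ts - days * 86400
        same_time_values = []
        for ts, value in timeseries[:-1]:
            if ts < earliest:
                continue
            dt = datetime.fromtimestamp(ts, tz=timezone.utc)
            if (dt.hour, dt.minute) == (current_dt.hour, current_dt.minute):
                same_time_values.append(value)
        if len(same_time_values) < 2:
            return None  # a sample standard deviation needs at least two values
        mean = statistics.mean(same_time_values)
        stdev = statistics.stdev(same_time_values)
        return abs(current_value - mean) > 3 * stdev
    except Exception:
        # A real custom algorithm would report this via record_algorithm_error.
        return None
```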

skyline.custom_algorithms.lof module

lof.py

lof(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

Outlier detector for time-series data using the Local Outlier Factor (LOF) algorithm.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) –

    a dictionary of any required parameters for the custom_algorithm and algorithm itself. Example: algorithm_parameters={'n_neighbors': 2, 'anomaly_window': 5, 'return_results': True}

Returns:

anomalous, anomalyScore, results

Return type:

tuple(boolean, float, dict)
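To show what n_neighbors controls, here is a brute-force, pure-Python sketch of the Local Outlier Factor computed on the one-dimensional values of a timeseries. It is not the lof module's code; it assumes there are no duplicate values (duplicates make the reachability sum zero) and breaks distance ties by index.

```python
def local_outlier_factors(values, n_neighbors=2):
    """Return the LOF score of each value (1-D, O(n^2), no duplicate values)."""
    n = len(values)
    k = min(n_neighbors, n - 1)
    neighbours = []
    for i in range(n):
        # (distance, index) pairs to every other point, nearest first.
        others = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        neighbours.append(others[:k])
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_distance = [nbrs[-1][0] for nbrs in neighbours]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]) for i in range(n)]
```

Scores near 1 mean a point is about as dense as its neighbours; a value such as 100 among 1 to 5 scores far higher.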

skyline.custom_algorithms.longest_zero_streak module

skyline.custom_algorithms.m66 module

m66(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

Time series data points are anomalous if the 6th median is 6 standard deviations (six-sigma) from the time series 6th median standard deviation and persists for x_windows, where x_windows = int(window / 2). This algorithm finds SIGNIFICANT changepoints in a time series, similar to PELT and Bayesian Online Changepoint Detection; however, it is more robust to instantaneous outliers and more conditionally selective of changepoints.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]

  • algorithm_parameters (dict) –

    a dictionary of any required parameters for the custom_algorithm and algorithm itself, for example: algorithm_parameters={'nth_median': 6, 'sigma': 6, 'window': 5, 'return_anomalies': True}

Returns:

True, False or None

Return type:

boolean

Example CUSTOM_ALGORITHMS configuration:

'm66': {
    'namespaces': [
        'skyline.analyzer.run_time', 'skyline.analyzer.total_metrics', 'skyline.analyzer.exceptions'
    ],
    'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/m66.py',
    'algorithm_parameters': {
        'nth_median': 6, 'sigma': 6, 'window': 5, 'resolution': 60, 'minimum_sparsity': 0,
        'determine_duration': False, 'return_anomalies': True, 'save_plots_to': False,
        'save_plots_to_absolute_dir': False, 'filename_prefix': False, 'return_results': False,
        'anomaly_window': 1,
    },
    'max_execution_time': 1.0,
    'consensus': 1,
    'algorithms_allowed_in_consensus': ['m66'],
    'run_3sigma_algorithms': False,
    'run_before_3sigma': False,
    'run_only_if_consensus': False,
    'use_with': ['crucible', 'luminosity'],
    'debug_logging': False,
},

The context in which you use the algorithm determines whether you should set return_anomalies to True or return_results to True, and so whether anomalies (anomalies_dict) or a results dict is returned. The original implementation of this algorithm returned a list of anomalies if return_anomalies was set to True; however, for inclusion as an algorithm that can be used in Vortex, it needed to be extended to be able to return a results dict.

skyline.custom_algorithms.macd module

macd.py

macd(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

Outlier detection for time-series data using Moving Average Convergence/Divergence https://en.wikipedia.org/wiki/MACD - EXPERIMENTAL

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) –

    a dictionary of any required parameters for the custom_algorithm and algorithm itself. Example: algorithm_parameters={'anomaly_window': 1, 'fast_window': 12, 'slow_window': 26, 'signal_window': 9, 'feature': 'macd', 'return_results': True}. The feature can be macd, macd_signal or macd_histogram.

Returns:

anomalous, anomalyScore, results

Return type:

tuple(boolean, float, dict)
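The three values accepted by feature correspond to the standard MACD series. A minimal sketch using the conventional exponential moving average with smoothing factor 2 / (window + 1), seeded with the first value (an assumption; other seeding choices exist). It shows what fast_window, slow_window and signal_window control; how the macd module turns these series into an anomaly decision is not shown.

```python
def ema(values, window):
    """Exponential moving average with alpha = 2 / (window + 1), seeded with values[0]."""
    alpha = 2.0 / (window + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_features(values, fast_window=12, slow_window=26, signal_window=9):
    """Return the (macd, macd_signal, macd_histogram) series."""
    fast = ema(values, fast_window)
    slow = ema(values, slow_window)
    macd = [f - s for f, s in zip(fast, slow)]       # fast EMA minus slow EMA
    macd_signal = ema(macd, signal_window)           # EMA of the MACD line
    macd_histogram = [m - s for m, s in zip(macd, macd_signal)]
    return macd, macd_signal, macd_histogram
```

On a flat series all three features stay at zero; on a rising series the fast EMA tracks the data more closely than the slow EMA, so macd turns positive.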

skyline.custom_algorithms.median_absolute_deviation module

median_absolute_deviation.py

median_absolute_deviation(current_skyline_app, parent_pid, timeseries, algorithm_parameters={})[source]

A timeseries is anomalous if the deviation of its latest data point with respect to the median is X times larger than the median of deviations.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]

  • algorithm_parameters (dict) – a dictionary of any parameters and their arguments you wish to pass to the algorithm.

Returns:

True, False or None

Return type:

boolean
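The module name refers to the median absolute deviation (MAD). A minimal sketch of a MAD check in the style of the original Skyline algorithm: the latest value is anomalous if its deviation from the median is more than a chosen multiple of the median of all deviations. The default multiple of 6 and returning None when the MAD is zero are assumptions, not taken from this module.

```python
import statistics


def mad_check(values, multiple=6.0):
    """Return True/False, or None when the MAD is zero and the test is undefined."""
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        return None
    # Compare the latest point's deviation with the typical deviation.
    return deviations[-1] / mad > multiple
```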

skyline.custom_algorithms.moving_sum_and_value_decrease module

skyline.custom_algorithms.moving_sum_decrease module

skyline.custom_algorithms.mstl module

mstl.py

mstl(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

EXPERIMENTAL

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the mstl custom algorithm no specific algorithm_parameters are required apart from an empty dict, example: algorithm_parameters={}. However, number_of_daily_peaks can be passed to define how many peaks must exist in the window period for it to be classed as normal. For example, if this is set to 3 and a possible anomaly at 00:05 is being checked, there must be at least 3 peaks over the past 7 days in the daily 23:35 to 00:05 window; if there are fewer than 3, it is considered anomalous. Example: algorithm_parameters={'number_of_daily_peaks': 3}

Returns:

anomalous, anomalyScore

Return type:

tuple(boolean, float)

skyline.custom_algorithms.numba_spectral_residual module

skyline.custom_algorithms.one_class_svm module

one_class_svm.py

one_class_svm(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

Outlier detector for time-series data using One Class SVM

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) –

    a dictionary of any required parameters for the custom_algorithm and algorithm itself. Example: algorithm_parameters={'n_neighbors': 2, 'anomaly_window': 5, 'return_results': True}

Returns:

anomalous, anomalyScore, results

Return type:

tuple(boolean, float, dict)

skyline.custom_algorithms.pca module

pca.py

pca(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

Outlier detector for time-series data using PCA - EXPERIMENTAL

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) –

    a dictionary of any required parameters for the custom_algorithm and algorithm itself. Example: algorithm_parameters={'n_neighbors': 2, 'anomaly_window': 5, 'return_results': True}

Returns:

anomalous, anomalyScore, results

Return type:

tuple(boolean, float, dict)

skyline.custom_algorithms.sigma module

sigma.py

sigma(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

This is an implementation of the original Skyline 3sigma algorithms as a single custom algorithm. It has been extended to allow for the sigma value to be passed as a parameter.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the sigma custom algorithm no specific algorithm_parameters are required apart from an empty dict, example: algorithm_parameters={}

Returns:

anomalous, anomalyScore

Return type:

tuple(boolean, float)
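A minimal sketch of one 3sigma-style check with sigma exposed as a parameter, modelled on the idea behind the original Skyline stddev_from_average algorithm: the average of the latest three data points is compared with the mean of the whole series. The three-point tail, the use of the population standard deviation and the sigma default of 3 are assumptions; this is not the sigma module's code.

```python
import statistics


def tail_avg(values, n=3):
    """Average of the last n values."""
    tail = values[-n:]
    return sum(tail) / len(tail)


def stddev_from_average_sketch(timeseries, sigma=3.0):
    """Anomalous if the tail average is more than sigma standard deviations from the mean."""
    values = [v for _, v in timeseries]
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return abs(tail_avg(values) - mean) > sigma * stdev
```

Raising sigma makes the check more conservative: a series of 97 values of 10.0 followed by three of 100.0 is anomalous at sigma=3 but not at sigma=6.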

skyline.custom_algorithms.significant_change_window_percent_sustained module

THIS IS A SAMPLE CUSTOM ALGORITHM to provide a skeleton to develop your own custom algorithms. The algorithm itself, although viable, is not recommended for production or general use; it is simply a toy algorithm here to demonstrate the structure of a simple custom algorithm that has algorithm_parameters passed as an empty dict {}. It is documented via comments #

significant_change_window_percent_sustained(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

A data point is anomalous if it is x percent different from the median of the window (resample interval, in seconds) over the last period p (in seconds). For example:

If the value is 10% different from the median value of the 10-minute windows of the last hour: algorithm_parameters: {'window': 600, 'percent': 10.0, 'period': 3600}

If the value is 50% different from the median value of the 10-minute windows of the last day: algorithm_parameters: {'window': 600, 'percent': 50.0, 'period': 86400}

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]

  • algorithm_parameters (dict) – a dictionary of any parameters and their arguments you wish to pass to the algorithm.

Returns:

True, False or None

Return type:

boolean
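A minimal sketch of the window, percent and period check: data from the last period seconds (excluding the latest point) is grouped into window-second buckets, the median of each bucket is taken, and the latest value is compared with the median of those bucket medians. The bucket-median-of-medians interpretation is an assumption, and the "sustained" aspect implied by the module name is not modelled.

```python
import statistics
from collections import defaultdict


def significant_change(timeseries, window=600, percent=10.0, period=3600):
    """Return True/False, or None if there is no usable baseline."""
    current_ts, current_value = timeseries[-1]
    start = current_ts - period
    buckets = defaultdict(list)
    for ts, value in timeseries[:-1]:
        if ts >= start:
            buckets[int(ts // window)].append(value)  # resample into window buckets
    if not buckets:
        return None
    baseline = statistics.median(statistics.median(v) for v in buckets.values())
    if baseline == 0:
        return None  # percent change is undefined against a zero baseline
    percent_change = abs(current_value - baseline) / abs(baseline) * 100.0
    return percent_change > percent
```

The examples above map directly onto keyword arguments, e.g. significant_change(timeseries, **{'window': 600, 'percent': 10.0, 'period': 3600}).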

skyline.custom_algorithms.skyline_fbprophet module

skyline.custom_algorithms.skyline_matrixprofile module

THIS IS A MORE FEATUREFUL CUSTOM ALGORITHM to provide a skeleton to develop your own custom algorithms. The algorithm itself, although viable, is not recommended for production or general use; it is simply a toy algorithm here to demonstrate the structure of a more complex custom algorithm that has algorithm_parameters passed and can also log if enabled. It is documented via comments #

skyline_matrixprofile(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

The skyline_matrixprofile algorithm uses matrixprofile to identify discords.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]

  • algorithm_parameters (dict) –

    a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the matrixprofile custom algorithm the following parameters are required, for example: algorithm_parameters={'check_details': {<empty_dict|check_details dict>}, 'full_duration': full_duration, 'windows': int}

Returns:

True, False or None

Return type:

boolean
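To show what a discord is, here is a brute-force sketch: for every subsequence of length m, find the distance to its nearest non-overlapping neighbour; the subsequence with the largest such distance is the top discord. It uses plain Euclidean distance rather than the z-normalised distance that matrix profile methods normally use, and m only loosely corresponds to the windows parameter; both are simplifications, and this is not the matrixprofile library's API.

```python
import math


def top_discord(values, m):
    """Return (index, distance) of the subsequence farthest from its nearest neighbour."""
    n = len(values) - m + 1
    subs = [values[i:i + m] for i in range(n)]
    profile = []  # a brute-force (non-normalised) matrix profile
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) < m:  # exclusion zone: skip trivial, overlapping matches
                continue
            nearest = min(nearest, math.dist(subs[i], subs[j]))
        profile.append(nearest)
    idx = max(range(n), key=profile.__getitem__)
    return idx, profile[idx]
```

On a repeating pattern every normal subsequence has an exact match elsewhere (distance 0), so only subsequences overlapping an injected spike can be the discord.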

skyline.custom_algorithms.skyline_ppscore module

skyline.custom_algorithms.skyline_prophet module

skyline_prophet.py

skyline_prophet(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

Outlier detector for time-series data using Prophet.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the skyline_prophet custom algorithm no specific algorithm_parameters are required apart from an empty dict, example: algorithm_parameters={}. Optionally, return_instance_score can be passed, example: algorithm_parameters={'return_instance_score': True}

Returns:

anomalous, anomalyScore, instance_scores

Return type:

tuple(boolean, float, instance_scores)

skyline.custom_algorithms.spectral_entropy module

spectral_entropy.py

spectral_entropy(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

Outlier detection for time-series data using Spectral Entropy. EXPERIMENTAL The outlier method

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) –

    a dictionary of any required parameters for the custom_algorithm and algorithm itself. Example: algorithm_parameters={'anomaly_window': 1, 'window': 60, 'frequency': 100, 'max_low_entropy': 0.6, 'return_results': True}

Returns:

anomalous, anomalyScore, results

Return type:

tuple(boolean, float, dict)
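Spectral entropy measures how spread out a window's power spectrum is. A minimal sketch of normalised spectral entropy: the one-sided power spectrum (a naive DFT, excluding the DC term, which is a choice made here) is normalised into a probability distribution, and its Shannon entropy is divided by the maximum possible entropy, giving a value between 0 and 1. A pure tone scores near 0 and a flat spectrum near 1. Presumably this kind of value is what max_low_entropy is compared against, but that is an assumption, and the module's frequency and anomaly_window handling are not modelled.

```python
import cmath
import math


def power_spectrum(values):
    """One-sided power spectrum via a naive O(n^2) DFT, excluding the DC term."""
    n = len(values)
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_entropy(values):
    """Normalised spectral entropy in [0, 1] (needs at least 4 values)."""
    power = power_spectrum(values)
    total = sum(power)
    if total == 0:
        return 0.0
    probabilities = [p / total for p in power if p > 0]
    entropy = -sum(p * math.log2(p) for p in probabilities)
    return entropy / math.log2(len(power))
```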

skyline.custom_algorithms.spectral_residual module

spectral_residual.py

spectral_residual(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]

Outlier detector for time-series data using the spectral residual algorithm, based on the alibi-detect implementation of “Time-Series Anomaly Detection Service at Microsoft” (Ren et al., 2019, https://arxiv.org/abs/1906.03821). This algorithm is fast enough for Mirage but too slow for Analyzer, even if only deployed against a subset of metrics: in testing, spectral_residual took between 0.134828 and 0.698201 seconds to run per metric, which is much too long for Analyzer.

Parameters:
  • current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.

  • parent_pid (int) – the parent pid executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.

  • timeseries (list) – the time series as a list e.g. [[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]

  • algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the spectral_residual custom algorithm no specific algorithm_parameters are required apart from an empty dict, example: algorithm_parameters={}. Optionally, return_instance_score can be passed, example: algorithm_parameters={'return_instance_score': True}

Returns:

anomalous, anomalyScore, instance_scores

Return type:

tuple(boolean, float, instance_scores)
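The core spectral residual transform from Ren et al. (2019) can be sketched in pure Python: take the log amplitude spectrum, subtract its local average (the "spectral residual"), and transform back with the original phase to obtain a saliency map in which anomalous points stand out. This sketch uses a naive DFT, a centred moving average of odd width q with wrap-around, and a small epsilon to avoid log(0); these are assumptions, and it omits the padding and scoring steps a full implementation such as alibi-detect's adds.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse includes 1/n scaling."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [x / n for x in out] if inverse else out


def saliency_map(values, q=3, eps=1e-8):
    """Spectral residual saliency of each point (q must be odd)."""
    spectrum = dft(values)
    n = len(spectrum)
    log_amp = [math.log(abs(c) + eps) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    half = q // 2
    # Centred moving average of the log amplitude, wrapping around the ends.
    avg_log_amp = [sum(log_amp[(k + o) % n] for o in range(-half, half + 1)) / q
                   for k in range(n)]
    residual = [la - al for la, al in zip(log_amp, avg_log_amp)]
    # Back to the time domain using the residual amplitude and the original phase.
    back = dft([cmath.exp(r + 1j * p) for r, p in zip(residual, phase)], inverse=True)
    return [abs(z) for z in back]
```

For a flat series with one spike, the saliency map peaks at the spike's index.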

Module contents

__init__.py

get_function_name()[source]

This utility function is used to determine which algorithm is reporting an algorithm error when record_algorithm_error is used.
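One way to look up the name of the calling function, sketched with the standard library's inspect module; Skyline's own implementation may use a different mechanism.

```python
import inspect


def get_function_name():
    """Return the name of the function that called get_function_name()."""
    frame = inspect.currentframe()
    try:
        # f_back is the caller's frame; co_name is that code object's function name.
        return frame.f_back.f_code.co_name
    finally:
        del frame  # drop the frame reference to avoid reference cycles
```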

record_algorithm_error(current_skyline_app, parent_pid, algorithm_name, traceback_format_exc_string)[source]

This utility function is used to facilitate the traceback from any algorithm errors. The algorithm functions themselves should run very fast and must never fail in a way that stops the function returning without anything being reported to the log, so a Python except clause is used to “sample” any algorithm errors to a tmp file and report them once per run, rather than spewing many errors into the log.

Note

algorithm errors tmp file clean up

the algorithm error tmp files are handled and cleaned up in Analyzer after all the spawned processes are completed.

Parameters:
  • current_skyline_app (str) – the Skyline app

  • parent_pid (int) – the pid of the parent process, used in error file naming

  • algorithm_name (str) – the algorithm function name

  • traceback_format_exc_string (str) – the traceback_format_exc string

Returns:

  • True the error string was written to the algorithm_error_file

  • False the error string was not written to the algorithm_error_file

Return type:

  • boolean
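A minimal sketch of the sampling behaviour described above: each error overwrites a single per-app, per-parent-pid, per-algorithm file, so a run leaves at most one file per algorithm instead of one log entry per failure. The file naming scheme and the extra error_dir argument are hypothetical, not Skyline's.

```python
import os
import tempfile
import traceback


def record_algorithm_error(current_skyline_app, parent_pid, algorithm_name,
                           traceback_format_exc_string, error_dir=None):
    """Write the traceback to a per-algorithm tmp file; return True on success."""
    error_dir = error_dir or tempfile.gettempdir()  # HYPOTHETICAL location
    # HYPOTHETICAL naming scheme: <app>.<parent_pid>.<algorithm>.algorithm.error
    filename = '%s.%s.%s.algorithm.error' % (current_skyline_app, parent_pid, algorithm_name)
    try:
        with open(os.path.join(error_dir, filename), 'w') as f:
            f.write(traceback_format_exc_string)
        return True
    except OSError:
        return False


def example_algorithm(current_skyline_app, parent_pid, timeseries, algorithm_parameters):
    """Hypothetical algorithm showing the intended try/except pattern."""
    try:
        return timeseries[-1][1] / 0  # deliberately fails
    except Exception:
        record_algorithm_error(current_skyline_app, parent_pid, 'example_algorithm',
                               traceback.format_exc())
        return None
```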

run_custom_algorithm_on_timeseries(current_skyline_app, parent_pid, base_name, timeseries, custom_algorithm, custom_algorithm_dict, debug_custom_algortihms, current_func=None)[source]

Return a dictionary of custom algorithms to run on a metric, determined from the settings.CUSTOM_ALGORITHMS dictionary.
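Selecting which entries of settings.CUSTOM_ALGORITHMS apply to a metric hinges on each entry's namespaces and use_with keys, as in the m66 configuration example. A minimal sketch, assuming a namespace matches when it equals the metric name, is a dotted prefix of it, or matches it as a regular expression, and that use_with lists the Skyline apps allowed to run the algorithm; Skyline's real matching rules may differ, and custom_algorithms_for_metric is a hypothetical name.

```python
import re


def namespace_matches(base_name, namespace):
    """ASSUMED rule: exact match, dotted-prefix match, or regex search."""
    if base_name == namespace or base_name.startswith(namespace + '.'):
        return True
    try:
        return re.search(namespace, base_name) is not None
    except re.error:
        return False


def custom_algorithms_for_metric(base_name, custom_algorithms, current_skyline_app):
    """Return the subset of a CUSTOM_ALGORITHMS-style dict that applies to base_name."""
    selected = {}
    for name, config in custom_algorithms.items():
        if current_skyline_app not in config.get('use_with', []):
            continue
        if any(namespace_matches(base_name, ns) for ns in config.get('namespaces', [])):
            selected[name] = config
    return selected
```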