skyline.custom_algorithms package
Submodules
skyline.custom_algorithms.abs_stddev_from_median module
THIS IS A SAMPLE CUSTOM ALGORITHM to provide a skeleton to develop your own
custom algorithms. The algorithm itself, although viable, is not recommended for
production or general use; it is simply a toy algorithm to demonstrate the
structure of a simple custom algorithm that has algorithm_parameters passed
as an empty dict {}.
It is documented via # comments.
- abs_stddev_from_median(current_skyline_app, parent_pid, timeseries, algorithm_parameters)
A data point is anomalous if its absolute value is greater than three standard deviations of the median.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]
algorithm_parameters (dict) – a dictionary of any parameters and their arguments you wish to pass to the algorithm.
- Returns:
True, False or None
- Return type:
boolean
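The required structure can be sketched as follows. This is a minimal illustration and not Skyline's own source: the function name `abs_stddev_from_median_sketch` is hypothetical, it uses only the standard library, and it assumes that "absolute value" means the absolute deviation of the last data point from the median.

```python
from statistics import median, pstdev


def abs_stddev_from_median_sketch(current_skyline_app, parent_pid, timeseries,
                                  algorithm_parameters):
    """Hypothetical skeleton for illustration only, not Skyline's source."""
    # current_skyline_app and parent_pid are accepted only to satisfy the
    # required signature; this sketch does no logging with them.
    try:
        values = [float(value) for _timestamp, value in timeseries]
        if len(values) < 2:
            return None  # not enough data to decide
        med = median(values)
        sigma = pstdev(values)
        if sigma == 0:
            return False  # a flat series cannot breach the limit
        # Assumption: compare the last data point's absolute deviation from
        # the median against three standard deviations.
        return abs(values[-1] - med) > 3 * sigma
    except (TypeError, ValueError):
        return None  # malformed input: neither True nor False


# Hourly data oscillating between 29.0 and 31.0
timeseries = [[1578916800.0 + 3600 * i, 29.0 + (i % 3)] for i in range(48)]
normal = abs_stddev_from_median_sketch('analyzer', 1234, timeseries, {})
spiked = abs_stddev_from_median_sketch(
    'analyzer', 1234, timeseries + [[1578916800.0 + 3600 * 48, 500.0]], {})
print(normal, spiked)
```

Returning None on bad input mirrors the documented "True, False or None" return values; how Skyline itself treats None is not shown here.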
skyline.custom_algorithms.adtk_level_shift module
- adtk_level_shift(current_skyline_app, parent_pid, timeseries, algorithm_parameters)
A timeseries is anomalous if a level shift occurs in a 5 window period bound by a factor of 9 of the normal range based on historical interquartile range.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and the algorithm itself. For the adtk_level_shift custom algorithm the following parameters are required. Example usage:
algorithm_parameters={
    'anomaly_window': 1, 'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5,
    'debug_logging': True, 'return_results': True,
}
- Returns:
True, False or None
- Return type:
boolean
Performance is of paramount importance in Skyline, especially in terms of computational complexity, along with execution time and CPU usage. The adtk LevelShiftAD algorithm is not O(n) and it is not fast either, not when compared to the normal three-sigma triggered algorithms. However, it is useful if you care about detecting all level shifts. The normal three-sigma triggered algorithms do not always detect a level shift, especially if the level shift does not breach the three-sigma limits. Therefore you may find over time that you encounter alerts that contain level shifts that you thought should have been detected. On these types of metrics and events, the adtk LevelShiftAD algorithm can be implemented to detect and alert on these. It is not recommended to run it on all your metrics, as it would immediately triple the analyzer runtime even if only run every 5 windows/minutes.
Due to the computational complexity and long run time of the adtk LevelShiftAD algorithm on the size of timeseries data used by Skyline, running adtk_level_shift on all metrics is probably not desirable. This is clear if you compare the timings of all the three-sigma triggered algorithms to the adtk_level_shift results in the last rows of the log below. Even if it is possible to do, it is very noisy.
UPDATE: 20241026 - under Python 3.10 the load time of the adtk algorithms alone is between 3 and 21.099188 seconds in luminosity, depending on how busy the box is!
2021-03-06 10:46:38 :: 1582754 :: algorithm run count - histogram_bins run 567 times
2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - histogram_bins has 567 timings
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - histogram_bins - total: 1.051136 - median: 0.001430
2021-03-06 10:46:38 :: 1582754 :: algorithm run count - first_hour_average run 567 times
2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - first_hour_average has 567 timings
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - first_hour_average - total: 1.322432 - median: 0.001835
2021-03-06 10:46:38 :: 1582754 :: algorithm run count - stddev_from_average run 567 times
2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - stddev_from_average has 567 timings
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - stddev_from_average - total: 1.097290 - median: 0.001641
2021-03-06 10:46:38 :: 1582754 :: algorithm run count - grubbs run 567 times
2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - grubbs has 567 timings
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - grubbs - total: 1.742929 - median: 0.002438
2021-03-06 10:46:38 :: 1582754 :: algorithm run count - ks_test run 147 times
2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - ks_test has 147 timings
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - ks_test - total: 0.127648 - median: 0.000529
2021-03-06 10:46:38 :: 1582754 :: algorithm run count - mean_subtraction_cumulation run 40 times
2021-03-06 10:46:38 :: 1582754 :: algorithm timings count - mean_subtraction_cumulation has 40 timings
2021-03-06 10:46:38 :: 1582754 :: algorithm timing - mean_subtraction_cumulation - total: 0.152515 - median: 0.003152
2021-03-06 10:46:39 :: 1582754 :: algorithm run count - median_absolute_deviation run 35 times
2021-03-06 10:46:39 :: 1582754 :: algorithm timings count - median_absolute_deviation has 35 timings
2021-03-06 10:46:39 :: 1582754 :: algorithm timing - median_absolute_deviation - total: 0.143770 - median: 0.003248
2021-03-06 10:46:39 :: 1582754 :: algorithm run count - stddev_from_moving_average run 30 times
2021-03-06 10:46:39 :: 1582754 :: algorithm timings count - stddev_from_moving_average has 30 timings
2021-03-06 10:46:39 :: 1582754 :: algorithm timing - stddev_from_moving_average - total: 0.125173 - median: 0.003092
2021-03-06 10:46:39 :: 1582754 :: algorithm run count - least_squares run 16 times
2021-03-06 10:46:39 :: 1582754 :: algorithm timings count - least_squares has 16 timings
2021-03-06 10:46:39 :: 1582754 :: algorithm timing - least_squares - total: 0.089108 - median: 0.005538
2021-03-06 10:46:39 :: 1582754 :: algorithm run count - abs_stddev_from_median run 1 times
2021-03-06 10:46:39 :: 1582754 :: algorithm timings count - abs_stddev_from_median has 1 timings
2021-03-06 10:46:39 :: 1582754 :: algorithm timing - abs_stddev_from_median - total: 0.036797 - median: 0.036797
2021-03-06 10:46:39 :: 1582754 :: algorithm run count - adtk_level_shift run 271 times
2021-03-06 10:46:39 :: 1582754 :: algorithm timings count - adtk_level_shift has 271 timings
2021-03-06 10:46:39 :: 1582754 :: algorithm timing - adtk_level_shift - total: 13.729565 - median: 0.035791
… …
2021-03-06 10:46:39 :: 1582754 :: seconds to run :: 27.93 # THE TOTAL ANALYZER RUNTIME
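To put the log above in perspective, the totals can be added up with a few lines of Python. The figures are copied from the "algorithm timing" lines of the log; treating the nine listed non-custom algorithms as the three-sigma set, and leaving out the custom abs_stddev_from_median run, is an assumption made for this illustration.

```python
# Totals in seconds, copied from the "algorithm timing" lines of the log.
three_sigma_totals = {
    'histogram_bins': 1.051136,
    'first_hour_average': 1.322432,
    'stddev_from_average': 1.097290,
    'grubbs': 1.742929,
    'ks_test': 0.127648,
    'mean_subtraction_cumulation': 0.152515,
    'median_absolute_deviation': 0.143770,
    'stddev_from_moving_average': 0.125173,
    'least_squares': 0.089108,
}
adtk_level_shift_total = 13.729565  # 271 runs
analyzer_runtime = 27.93            # "seconds to run" for the whole analyzer

three_sigma_sum = sum(three_sigma_totals.values())
ratio = adtk_level_shift_total / three_sigma_sum
share = adtk_level_shift_total / analyzer_runtime

print(f'three-sigma algorithms combined: {three_sigma_sum:.6f} s')
print(f'adtk_level_shift vs three-sigma: {ratio:.2f}x')
print(f'adtk_level_shift share of analyzer runtime: {share:.1%}')
```

In this run adtk_level_shift alone took more than twice as long as all the three-sigma algorithms combined, and roughly half of the total analyzer runtime, even though it ran on 271 metrics rather than 567.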
Therefore the analysis methodology implemented for the adtk_level_shift custom_algorithm is as follows:
- When new metrics are added, either to the configuration or by actual new metrics coming online that match the algorithm_parameters['namespace'], Skyline shards the new metrics into time slots to prevent a thundering herd situation from developing. A newly added metric will eventually be assigned to a time shard, and its last analysed timestamp will be added to the analyzer.last.adtk_level_shift Redis hash key to determine the next scheduled run with algorithm_parameters['namespace'].
- A run_every parameter is implemented so that the algorithm can be configured to run on a metric once every run_every minutes. The default is to run it every 5 minutes using window 5 (rolling) and trigger as anomalous if the algorithm labels any of the last 5 datapoints as anomalous. This means that there could be up to a 5 minute delay on an alert on the 60 second, 168 SECOND_ORDER_RESOLUTION_HOURS metrics in the example, but a c=9.0 level shift would be detected and would be alerted on (if both analyzer and mirage triggered on it). This periodic running of the algorithm is a tradeoff so that the adtk_level_shift load and runtime can be spread over run_every minutes.
- The algorithm is not run against metrics that are sparsely populated. When the algorithm is run on sparsely populated metrics it results in lots of false positives and noise.
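The scheduling described above can be sketched in plain Python. This is an illustration under stated assumptions, not Skyline's implementation: a dict stands in for the Redis hash of last analysed timestamps, and the helper names `assign_shard`, `is_run_due` and `any_recent_anomalous` are hypothetical, as is the CRC32-based slot assignment.

```python
import zlib


def assign_shard(metric, run_every=5):
    """Hypothetical sharding: map a metric name to a minute slot in
    range(run_every) so that new metrics are spread over time slots."""
    return zlib.crc32(metric.encode('utf-8')) % run_every


def is_run_due(last_runs, metric, now, run_every=5):
    """True when the metric has not been analysed in the last run_every
    minutes. last_runs is a dict standing in for the Redis hash that maps
    metric name -> last analysed timestamp."""
    last_run = last_runs.get(metric)
    if last_run is None:
        return True
    return (now - last_run) >= run_every * 60


def any_recent_anomalous(labels, window=5):
    """True if any of the last `window` per-datapoint labels is anomalous."""
    return any(labels[-window:])


last_runs = {}
metric = 'skyline.analyzer.run_time'
now = 1615027598

first = is_run_due(last_runs, metric, now)        # never analysed before
last_runs[metric] = now                           # record the run
second = is_run_due(last_runs, metric, now + 60)  # one minute later
third = is_run_due(last_runs, metric, now + 300)  # five minutes later
print(first, second, third, assign_shard(metric))
```

Because only the last `window` labels are checked on each run, a level shift labelled one minute after a run is still seen on the next run, at the cost of the delay of up to run_every minutes mentioned above.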
The Skyline CUSTOM_ALGORITHMS implementation of the adtk LevelShiftAD algorithm is configured as in the example shown below. However, please note that the algorithm_parameters shown in this example configuration are suitable for metrics that have a 60 second resolution and have a settings.ALERTS Mirage SECOND_ORDER_RESOLUTION_HOURS of 168 (7 days). Metrics with a different resolution/frequency may require different values appropriate for the metric resolution.
Example CUSTOM_ALGORITHMS configuration:
'adtk_level_shift': {
    'namespaces': [
        'skyline.analyzer.run_time',
        'skyline.analyzer.total_metrics',
        'skyline.analyzer.exceptions'
    ],
    'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/adtk_level_shift.py',
    'algorithm_parameters': {'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5},
    'max_execution_time': 0.5,
    'consensus': 1,
    'algorithms_allowed_in_consensus': ['adtk_level_shift'],
    'run_3sigma_algorithms': True,
    'run_before_3sigma': True,
    'run_only_if_consensus': False,
    'use_with': ["analyzer", "mirage"],
    'debug_logging': False,
},
skyline.custom_algorithms.adtk_persist module
- adtk_persist(current_skyline_app, parent_pid, timeseries, algorithm_parameters)
A timeseries is anomalous if a persist occurs in a 5 window period bound by a factor of 9 of the normal range based on historical interquartile range.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and the algorithm itself. For the adtk_persist custom algorithm the following parameters are required, example:
algorithm_parameters={
    'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5
}
- Returns:
True, False or None
- Return type:
boolean
Performance is of paramount importance in Skyline, especially in terms of computational complexity, along with execution time and CPU usage. The adtk PersistAD algorithm is not O(n) and it is not fast either, not when compared to the normal three-sigma triggered algorithms. However, it is useful if you care about detecting persists. The normal three-sigma triggered algorithms do not always detect a persist, especially if the level shift does not breach the three-sigma limits. Therefore you may find over time that you encounter alerts that contain level shifts that you thought should have been detected. On these types of metrics and events, the adtk PersistAD algorithm can be implemented to detect and alert on these. It is not recommended to run it on all your metrics, as it would immediately triple the analyzer runtime even if only run every 5 windows/minutes.
UPDATE: 20241026 - under Python 3.10 the load time of the adtk algorithms alone is between 3 and 21.099188 seconds in luminosity, depending on how busy the box is!
Due to the computational complexity and long run time of the adtk PersistAD algorithm on the size of timeseries data used by Skyline, the analysis methodology implemented for the adtk_persist custom_algorithm is as follows:
- When new metrics are added, either to the configuration or by actual new metrics coming online that match the algorithm_parameters['namespace'], Skyline shards the new metrics into time slots to prevent a thundering herd situation from developing. A newly added metric will eventually be assigned to a time shard, and its last analysed timestamp will be added to the analyzer.last.adtk_persist Redis hash key to determine the next scheduled run with algorithm_parameters['namespace'].
- A run_every parameter is implemented so that the algorithm can be configured to run on a metric once every run_every minutes. The default is to run it every 5 minutes using window 5 (rolling) and trigger as anomalous if the algorithm labels any of the last 5 datapoints as anomalous. This means that there could be up to a 5 minute delay on an alert on the 60 second, 168 SECOND_ORDER_RESOLUTION_HOURS metrics in the example, but a c=9.0 level shift would be detected and would be alerted on (if both analyzer and mirage triggered on it). This periodic running of the algorithm is a tradeoff so that the adtk_persist load and runtime can be spread over run_every minutes.
- The algorithm is not run against metrics that are sparsely populated. When the algorithm is run on sparsely populated metrics it results in lots of false positives and noise.
The Skyline CUSTOM_ALGORITHMS implementation of the adtk PersistAD algorithm is configured as in the example shown below. However, please note that the algorithm_parameters shown in this example configuration are suitable for metrics that have a 60 second resolution and have a settings.ALERTS Mirage SECOND_ORDER_RESOLUTION_HOURS of 168 (7 days). Metrics with a different resolution/frequency may require different values appropriate for the metric resolution.
Example CUSTOM_ALGORITHMS configuration:
'adtk_persist': {
    'namespaces': [
        'skyline.analyzer.run_time',
        'skyline.analyzer.total_metrics',
        'skyline.analyzer.exceptions'
    ],
    'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/adtk_persist.py',
    'algorithm_parameters': {'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5},
    'max_execution_time': 0.5,
    'consensus': 1,
    'algorithms_allowed_in_consensus': ['adtk_persist'],
    'run_3sigma_algorithms': True,
    'run_before_3sigma': True,
    'run_only_if_consensus': False,
    'use_with': ["analyzer", "mirage"],
    'debug_logging': False,
},
skyline.custom_algorithms.adtk_seasonal module
- adtk_seasonal(current_skyline_app, parent_pid, timeseries, algorithm_parameters)
A timeseries is anomalous if a level shift occurs in a 5 window period bound by a factor of 9 of the normal range based on historical interquartile range.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and the algorithm itself. For the adtk_seasonal custom algorithm the following parameters are required, example:
algorithm_parameters={
    'c': 9.0, 'run_every': 5, 'side': 'both',
}
- Returns:
True, False or None
- Return type:
boolean
Performance is of paramount importance in Skyline, especially in terms of computational complexity, along with execution time and CPU usage. The adtk SeasonalAD algorithm is not O(n) and it is not fast either, not when compared to the normal three-sigma triggered algorithms. However, it is useful if you care about detecting all seasonal anomalies. The normal three-sigma triggered algorithms do not always detect a seasonal anomaly, especially if the level shift does not breach the three-sigma limits. Therefore you may find over time that you encounter alerts that contain level shifts that you thought should have been detected. On these types of metrics and events, the adtk SeasonalAD algorithm can be implemented to detect and alert on these. It is not recommended to run it on all your metrics, as it would immediately triple the analyzer runtime even if only run every 5 windows/minutes.
UPDATE: 20241026 - under Python 3.10 the load time of the adtk algorithms alone is between 3 and 21.099188 seconds in luminosity, depending on how busy the box is!
Due to the computational complexity and long run time of the adtk SeasonalAD algorithm on the size of timeseries data used by Skyline, the analysis methodology implemented for the adtk_seasonal custom_algorithm is as follows:
- When new metrics are added, either to the configuration or by actual new metrics coming online that match the algorithm_parameters['namespace'], Skyline shards the new metrics into time slots to prevent a thundering herd situation from developing. A newly added metric will eventually be assigned to a time shard, and its last analysed timestamp will be added to the analyzer.last.adtk_seasonal Redis hash key to determine the next scheduled run with algorithm_parameters['namespace'].
- A run_every parameter is implemented so that the algorithm can be configured to run on a metric once every run_every minutes. The default is to run it every 5 minutes using window 5 (rolling) and trigger as anomalous if the algorithm labels any of the last 5 datapoints as anomalous. This means that there could be up to a 5 minute delay on an alert on the 60 second, 168 SECOND_ORDER_RESOLUTION_HOURS metrics in the example, but a c=9.0 level shift would be detected and would be alerted on (if both analyzer and mirage triggered on it). This periodic running of the algorithm is a tradeoff so that the adtk_seasonal load and runtime can be spread over run_every minutes.
- The algorithm is not run against metrics that are sparsely populated. When the algorithm is run on sparsely populated metrics it results in lots of false positives and noise.
The Skyline CUSTOM_ALGORITHMS implementation of the adtk SeasonalAD algorithm is configured as in the example shown below. However, please note that the algorithm_parameters shown in this example configuration are suitable for metrics that have a 60 second resolution and have a settings.ALERTS Mirage SECOND_ORDER_RESOLUTION_HOURS of 168 (7 days). Metrics with a different resolution/frequency may require different values appropriate for the metric resolution.
Example CUSTOM_ALGORITHMS configuration:
'adtk_seasonal': {
    'namespaces': [
        'skyline.analyzer.run_time',
        'skyline.analyzer.total_metrics',
        'skyline.analyzer.exceptions'
    ],
    'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/adtk_seasonal.py',
    'algorithm_parameters': {'c': 9.0, 'run_every': 5, 'side': 'both'},
    'max_execution_time': 0.5,
    'consensus': 1,
    'algorithms_allowed_in_consensus': ['adtk_seasonal'],
    'run_3sigma_algorithms': True,
    'run_before_3sigma': True,
    'run_only_if_consensus': False,
    'use_with': ["analyzer", "mirage"],
    'debug_logging': False,
},
skyline.custom_algorithms.adtk_volatility_shift module
- adtk_volatility_shift(current_skyline_app, parent_pid, timeseries, algorithm_parameters)
A timeseries is anomalous if a volatility shift occurs in a 5 window period bound by a factor of 9 of the normal range based on historical interquartile range.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and the algorithm itself. For the adtk_volatility_shift custom algorithm the following parameters are required, example:
algorithm_parameters={
    'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5
}
- Returns:
True, False or None
- Return type:
boolean
Performance is of paramount importance in Skyline, especially in terms of computational complexity, along with execution time and CPU usage. The adtk VolatilityShiftAD algorithm is not O(n) and it is not fast either, not when compared to the normal three-sigma triggered algorithms. However, it is useful if you care about detecting volatility shifts. The normal three-sigma triggered algorithms do not always detect a volatility shift, especially if the level shift does not breach the three-sigma limits. Therefore you may find over time that you encounter alerts that contain level shifts that you thought should have been detected. On these types of metrics and events, the adtk VolatilityShiftAD algorithm can be implemented to detect and alert on these. It is not recommended to run it on all your metrics, as it would immediately triple the analyzer runtime even if only run every 5 windows/minutes.
UPDATE: 20241026 - under Python 3.10 the load time of the adtk algorithms alone is between 3 and 21.099188 seconds in luminosity, depending on how busy the box is!
Due to the computational complexity and long run time of the adtk VolatilityShiftAD algorithm on the size of timeseries data used by Skyline, the analysis methodology implemented for the adtk_volatility_shift custom_algorithm is as follows:
- When new metrics are added, either to the configuration or by actual new metrics coming online that match the algorithm_parameters['namespace'], Skyline shards the new metrics into time slots to prevent a thundering herd situation from developing. A newly added metric will eventually be assigned to a time shard, and its last analysed timestamp will be added to the analyzer.last.adtk_volatility_shift Redis hash key to determine the next scheduled run with algorithm_parameters['namespace'].
- A run_every parameter is implemented so that the algorithm can be configured to run on a metric once every run_every minutes. The default is to run it every 5 minutes using window 5 (rolling) and trigger as anomalous if the algorithm labels any of the last 5 datapoints as anomalous. This means that there could be up to a 5 minute delay on an alert on the 60 second, 168 SECOND_ORDER_RESOLUTION_HOURS metrics in the example, but a c=9.0 level shift would be detected and would be alerted on (if both analyzer and mirage triggered on it). This periodic running of the algorithm is a tradeoff so that the adtk_volatility_shift load and runtime can be spread over run_every minutes.
- The algorithm is not run against metrics that are sparsely populated. When the algorithm is run on sparsely populated metrics it results in lots of false positives and noise.
The Skyline CUSTOM_ALGORITHMS implementation of the adtk VolatilityShiftAD algorithm is configured as in the example shown below. However, please note that the algorithm_parameters shown in this example configuration are suitable for metrics that have a 60 second resolution and have a settings.ALERTS Mirage SECOND_ORDER_RESOLUTION_HOURS of 168 (7 days). Metrics with a different resolution/frequency may require different values appropriate for the metric resolution.
Example CUSTOM_ALGORITHMS configuration:
'adtk_volatility_shift': {
    'namespaces': [
        'skyline.analyzer.run_time',
        'skyline.analyzer.total_metrics',
        'skyline.analyzer.exceptions'
    ],
    'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/adtk_volatility_shift.py',
    'algorithm_parameters': {'c': 9.0, 'run_every': 5, 'side': 'both', 'window': 5},
    'max_execution_time': 0.5,
    'consensus': 1,
    'algorithms_allowed_in_consensus': ['adtk_volatility_shift'],
    'run_3sigma_algorithms': True,
    'run_before_3sigma': True,
    'run_only_if_consensus': False,
    'use_with': ["analyzer", "mirage"],
    'debug_logging': False,
},
skyline.custom_algorithms.anomalous_daily_peak module
anomalous_daily_peak.py
- anomalous_daily_peak(current_skyline_app, parent_pid, timeseries, algorithm_parameters)
A 7 day timeseries is NOT anomalous if the last datapoints are in a peak whose values are within 3 standard deviations of the mean of x number of similar peaks (+/- 20%) that occurred in windows that are 24 hours apart.
Only timeseries that are thought to be anomalous should be run through this algorithm. It is meant to be run in Mirage after all algorithms, including custom algorithms, have been run and found a metric to be ANOMALOUS. This algorithm does a final check to see if the anomaly is part of a repetitive daily pattern which occurs more than x times a week.
Be aware that this algorithm will identify ANY datapoint that is NOT in a repetitive peak and within 3-sigma (+/- 20%) of the other peaks as ANOMALOUS.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and the algorithm itself. For the anomalous_daily_peak custom algorithm no specific algorithm_parameters are required apart from an empty dict, example: algorithm_parameters={}. However, number_of_daily_peaks can be passed to define how many peaks must exist in the window period to be classed as normal. If this is set to 3 and a possible anomaly at 00:05 is being checked, there need to be 3 peaks over the past 7 days in the daily 23:35 to 00:05 window; if there are not at least 3, the data point is considered anomalous. Example: algorithm_parameters={'number_of_daily_peaks': 3}
- Returns:
anomalous, anomalyScore
- Return type:
tuple(boolean, float)
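The daily window check can be illustrated with a simplified sketch. It is not the module's implementation: `daily_peak_sketch` is a hypothetical name, the default of 3 for number_of_daily_peaks is assumed, each earlier window's maximum is treated as that day's peak, the 3-sigma band is padded by 20% (one reading of "+/- 20%"), values are assumed to be positive, and the score of 1.0 or 0.0 is a placeholder.

```python
from statistics import mean, pstdev

DAY = 86400  # seconds


def daily_peak_sketch(timeseries, algorithm_parameters):
    """Hypothetical simplification: returns (anomalous, anomalyScore)."""
    # Assumed default; the documentation only shows 3 as an example value.
    number_of_daily_peaks = algorithm_parameters.get('number_of_daily_peaks', 3)
    last_ts, last_value = timeseries[-1]
    window_maxima = []
    for days_ago in range(1, 8):
        # The 30 minute window ending exactly days_ago * 24 hours earlier
        end = last_ts - days_ago * DAY
        start = end - 1800
        values = [value for ts, value in timeseries if start <= ts <= end]
        if values:
            window_maxima.append(max(values))
    if len(window_maxima) < number_of_daily_peaks:
        return True, 1.0
    centre = mean(window_maxima)
    spread = 3 * pstdev(window_maxima)
    # Pad the 3-sigma band by 20% (assumes positive values)
    lower = (centre - spread) * 0.8
    upper = (centre + spread) * 1.2
    anomalous = not (lower <= last_value <= upper)
    return anomalous, (1.0 if anomalous else 0.0)


t0 = 1667608854
# 7 days at 600 second resolution with a peak every 24 hours
peaky = [[t0 + 600 * i, 1000.0 + (i // 144) * 5 if i % 144 == 0 else 100.0]
         for i in range(7 * 144 + 1)]
# The same timestamps with no earlier peaks, only a final spike
flat_then_spike = [[ts, 100.0] for ts, _ in peaky[:-1]] + [[peaky[-1][0], 1035.0]]

print(daily_peak_sketch(peaky, {}))
print(daily_peak_sketch(flat_then_spike, {}))
```

In the first series the final value sits in a peak that repeats every 24 hours, so it is not flagged; in the second series the same value has no matching earlier peaks and stays anomalous.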
skyline.custom_algorithms.azure_ai_anomalydetector module
skyline.custom_algorithms.dbscan module
dbscan.py
- dbscan(current_skyline_app, parent_pid, timeseries, algorithm_parameters)
Outlier detector based on DBSCAN. EXPERIMENTAL
UNRELIABLE as it is very sensitive to input parameters, which makes it difficult to automatically determine suitable parameters. Automatically determined parameters can sometimes be very effective, but often they do not give the desired results. Because there is a single epsilon value for all clusters, the algorithm fails when clusters of varying density are present in the data.
Therefore if DBSCAN identifies more than 33% of the data points in a timeseries as outliers, this algorithm will return an inconclusive result.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and the algorithm itself. For the dbscan custom algorithm no specific algorithm_parameters are required apart from an empty dict, but the algorithm_parameters that can be passed are:
- 'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
- 'window' (int): The number of data points to use in the sliding window to calculate Xmean and Xvar. Default is 3.
- 'min_samples' (int): The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself. If min_samples is set to a higher value, DBSCAN will find denser clusters, whereas if it is set to a lower value, the found clusters will be more sparse. Default is 4.
- 'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
- 'debug_logging' (bool): Optional. If True, enables debug logging.
- 'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
algorithm_parameters={
    'anomaly_window': 1, 'window': 3, 'min_samples': 4,
    'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
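The 33% rule and the anomaly_window check can be sketched without running DBSCAN itself. The function below is hypothetical and only interprets per-point cluster labels, using the scikit-learn convention that a label of -1 marks an outlier; representing an inconclusive result as None and the score as 1.0 or 0.0 are assumptions for illustration.

```python
def interpret_dbscan_labels(labels, anomaly_window=1, max_outlier_fraction=0.33):
    """Hypothetical helper: turn per-point DBSCAN labels into
    (anomalous, anomalyScore). A label of -1 marks an outlier."""
    if not labels:
        return None, None
    outliers = sum(1 for label in labels if label == -1)
    # Too many outliers suggests unsuitable parameters: inconclusive
    if outliers / len(labels) > max_outlier_fraction:
        return None, None
    # Only the last anomaly_window data points decide the result
    anomalous = any(label == -1 for label in labels[-anomaly_window:])
    return anomalous, (1.0 if anomalous else 0.0)


print(interpret_dbscan_labels([0] * 9 + [-1]))       # last point is an outlier
print(interpret_dbscan_labels([0] * 10))             # no outliers
print(interpret_dbscan_labels([-1] * 4 + [0] * 6))   # 40% outliers
```

With anomaly_window=1 an outlier earlier in the series does not make the metric anomalous; only the most recent data point counts.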
skyline.custom_algorithms.grafana_promql_anomaly_detection module
skyline.custom_algorithms.irregular_unstable module
irregular_unstable.py
- irregular_unstable(current_skyline_app, parent_pid, timeseries, algorithm_parameters)
NOTE: This algorithm analyses time series that are deemed to be ANOMALOUS so the default anomalous value is True.
A timeseries is NOT anomalous if it has low variance over 30 days and does not trigger multiple algorithms.
Only timeseries that are thought to be anomalous AND have a low variance at 7 days should be run through this algorithm. It is meant to be run with Mirage after all algorithms, including custom algorithms, have been run and found a metric to be ANOMALOUS. This algorithm does a final check to see if the metric has low variance and if so analyses the data at 30 days.
On irregular, unstable metrics that exhibit low variance at 7 days and 30 days, generally ~63% of the anomalies detected on these timeseries at 7 days are correctly identified as false positives when the data is analysed at 30 days.
The irregular_unstable algorithm takes on average 3.861 seconds to run, however if the timeseries is discarded because at 30 days it does not have low variance, the average discard time is 0.213 seconds.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and the algorithm itself. The required elements in algorithm_parameters are metric or base_name. Additional parameters can be passed; refer to the algorithm_parameters[key] lookups at the beginning of the algorithm's code to determine which algorithm_parameters can be passed.
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
skyline.custom_algorithms.isolation_forest module
isolation_forest.py
- isolation_forest(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
Outlier detector based on Isolation Forest.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) –
a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the isolation_forest custom algorithm no specific algorithm_parameters are required apart from an empty dict but the algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'contamination' (float or "auto"): The proportion of the dataset that is expected to be anomalies or outliers. This can be either:
- A float value between 0.0 and 0.5 representing the expected proportion of anomalies in the data. For example, 0.01 would mean that 1% of the data is anticipated to be anomalous.
- The string "auto": this option allows the algorithm to automatically determine an appropriate contamination level based on the characteristics of the data. Useful when the proportion of outliers is unknown.
Default is "auto".
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 1, 'contamination': 0.01, 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
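The optional parameters documented above (anomaly_window, return_results) recur across these custom algorithms. The following is a minimal sketch of how an algorithm might read them and shape its return value. The function name example_algorithm and the placeholder scoring rule are hypothetical and are not part of Skyline; only the four-argument signature and the parameter names come from this documentation.

```python
def example_algorithm(current_skyline_app, parent_pid, timeseries, algorithm_parameters):
    """Hypothetical skeleton for illustration only; not a Skyline algorithm."""
    # Read optional parameters, falling back to the documented defaults.
    anomaly_window = int(algorithm_parameters.get('anomaly_window', 1))
    return_results = bool(algorithm_parameters.get('return_results', False))

    # Placeholder scoring (an assumption): score 1 for any value that
    # exceeds twice the mean of the whole series, otherwise 0.
    values = [value for _, value in timeseries]
    mean = sum(values) / len(values)
    scores = [1 if value > 2 * mean else 0 for value in values]

    # Only the last anomaly_window data points decide the outcome.
    window_scores = scores[-anomaly_window:]
    anomalous = any(window_scores)
    anomalyScore = sum(window_scores) / len(window_scores)

    if return_results:
        results = {'anomalous': anomalous, 'anomalyScore': anomalyScore, 'scores': scores}
        return anomalous, anomalyScore, results
    return anomalous, anomalyScore
```

With an empty dict the sketch falls back to an anomaly_window of 1 and returns only (anomalous, anomalyScore). A real algorithm would also need to guard against an empty timeseries.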
skyline.custom_algorithms.lad module
lad.py
- lad(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
Large Deviations Anomaly Detection
Univariate implementation of: Large Deviations Anomaly Detection (LAD) for collection of multivariate time series data: Applications to COVID-19 data (Sreelekha Guggilam, Varun Chandola, Abani K. Patra), Journal of Computational Science 72 (2023) 102101
https://www.sciencedirect.com/science/article/pii/S1877750323001618
https://pdf.sciencedirectassets.com/280179/1-s2.0-S1877750323X00076/1-s2.0-S1877750323001618/main.pdf
https://doi.org/10.1016/j.jocs.2023.102101
At the 95th percentile this may be the noisiest algorithm yet. If threshold is set at 95, it will detect step changes, etc. If threshold is set at 99, it will only detect the most severe spike/dip/point anomalies. Fast but too noisy.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) –
a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the lad custom algorithm no specific algorithm_parameters are required apart from an empty dict but the algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'threshold' (int): The percentile value for the threshold. Default is 95.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 1, 'threshold': 99, 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(boolean, float, dict)
skyline.custom_algorithms.last_same_hours module
THIS IS A MORE FEATUREFUL CUSTOM ALGORITHM to provide a skeleton to develop your
own custom algorithms. The algorithm itself, although viable, is not
recommended for production or general use, it is simply a toy algorithm here to
demonstrate the structure of a more complex custom algorithm that has
algorithm_parameters passed and can also log if enabled.
It is documented via comments #
- last_same_hours(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
The last_same_hours algorithm collects the data points for the same hour and minute as the current timestamp from the last x days, calculates the mean of those values and determines whether the current data point is within 3 standard deviations of that mean.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]
algorithm_parameters (dict) – a dictionary of any parameters and their arguments you wish to pass to the algorithm.
- Returns:
True, False or None
- Return type:
boolean
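The same-hour comparison described above can be sketched with the standard library. This is an illustrative reading of the description, not the Skyline implementation: the helper name same_hour_minute_check, the days and sigma arguments, the use of UTC and the population standard deviation are all assumptions.

```python
from datetime import datetime, timezone
from statistics import mean, pstdev


def same_hour_minute_check(timeseries, days=3, sigma=3):
    """Hypothetical helper: True if the last value is more than `sigma`
    standard deviations from the mean of the values seen at the same
    hour and minute over the previous `days` days, None if undecidable."""
    last_ts, last_value = timeseries[-1]
    last_dt = datetime.fromtimestamp(last_ts, tz=timezone.utc)
    oldest_ts = last_ts - days * 86400

    # Collect earlier values recorded at the same hour and minute.
    history = []
    for ts, value in timeseries[:-1]:
        if ts < oldest_ts:
            continue
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        if (dt.hour, dt.minute) == (last_dt.hour, last_dt.minute):
            history.append(value)

    # Too little history to estimate a spread (assumed behaviour).
    if len(history) < 2:
        return None

    return abs(last_value - mean(history)) > sigma * pstdev(history)
```

Returning None when there is not enough history mirrors the documented True, False or None return values, although the conditions under which the real algorithm returns None are not stated here.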
skyline.custom_algorithms.lof module
lof.py
- lof(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
Local Outlier Factor
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) –
a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the lof custom algorithm no specific algorithm_parameters are required apart from an empty dict but the algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'n_neighbors' (int): The number of neighbors. Default is 2.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 5, 'n_neighbors': 2, 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(boolean, float, dict)
skyline.custom_algorithms.low_variance_anomalous_peak_trough module
low_variance_anomalous_peak_trough.py
- low_variance_anomalous_peak_trough(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
A time series with low variance or few peaks/troughs is anomalous if the data point is more than 3 sigma beyond the peaks or troughs, i.e. above peak_values_mean + (3 * peak_values_stdDev) or below trough_values_mean - (3 * trough_values_stdDev). This algorithm is ONLY suited to assessing the last data points and is NOT suited to an anomaly_window > 10.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1690920000, 0.0], ..., [1691524200.0, 0.0], [1691524800.0, 0.5]]
algorithm_parameters (dict) –
- {
'currently_anomalous': False, # whether instance state is anomalous or not, default False
'anomaly_window': 4, # should be > 1 and <= 10, default 4
'return_results': False, # whether to return the result dict, default False
'debug_logging': False, # whether to log, default False
}
algorithm_parameters –
a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the low_variance_anomalous_peak_trough custom algorithm no specific algorithm_parameters are required apart from an empty dict but the algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'currently_anomalous' (bool): Optional. Whether the instance state is anomalous or not. Default is False.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 4, 'currently_anomalous': True, 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(boolean, float, dict)
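The peak and trough bounds described for low_variance_anomalous_peak_trough can be illustrated as follows. This is only a sketch of the two formulas: the helper names, the use of strict local maxima/minima as peaks and troughs, and the population standard deviation are assumptions, and the low-variance precondition is not modelled.

```python
from statistics import mean, pstdev


def peak_trough_bounds(values):
    """Hypothetical helper: return (lower, upper) 3-sigma bounds derived
    from the local troughs and peaks of `values`, or None if too few exist."""
    inner = range(1, len(values) - 1)
    # A peak/trough here is a point strictly above/below both neighbours.
    peaks = [values[i] for i in inner
             if values[i] > values[i - 1] and values[i] > values[i + 1]]
    troughs = [values[i] for i in inner
               if values[i] < values[i - 1] and values[i] < values[i + 1]]
    if len(peaks) < 2 or len(troughs) < 2:
        return None
    upper = mean(peaks) + (3 * pstdev(peaks))
    lower = mean(troughs) - (3 * pstdev(troughs))
    return lower, upper


def outside_bounds(timeseries, anomaly_window=4):
    """True if any of the last `anomaly_window` values falls outside the
    bounds computed from the earlier data, None if bounds are unavailable."""
    values = [value for _, value in timeseries]
    history, recent = values[:-anomaly_window], values[-anomaly_window:]
    bounds = peak_trough_bounds(history)
    if bounds is None:
        return None
    lower, upper = bounds
    return any(value > upper or value < lower for value in recent)
```

Computing the bounds from the data before the anomaly window keeps the points under assessment from inflating their own thresholds; whether the real algorithm does this is not stated in this documentation.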
skyline.custom_algorithms.m66 module
- m66(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
The data points of a time series are anomalous if the 6th median is 6 standard deviations (six-sigma) from the time series 6th median standard deviation and this persists for x_windows, where x_windows = int(window / 2). This algorithm finds SIGNIFICANT changepoints in a time series, similar to PELT and Bayesian Online Changepoint Detection, however it is more robust to instantaneous outliers and more conditionally selective of changepoints.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]
algorithm_parameters (dict) –
a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the m66 custom algorithm no specific algorithm_parameters are required apart from an empty dict but the algorithm_parameters that can be passed are as follows, however there are more internal performance related and other parameters that can be passed (see in code not in docstrings):
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'nth_median' (int): The number of medians to use. Default is 6.
'sigma' (int): The sigma value to use. Default is 6.
'window' (int): The number of data points in a window. Default is 5.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 1, 'nth_median': 6, 'sigma': 6, 'window': 5, 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
Example CUSTOM_ALGORITHMS configuration:
'm66': {
    'namespaces': [
        'skyline.analyzer.run_time',
        'skyline.analyzer.total_metrics',
        'skyline.analyzer.exceptions',
    ],
    'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/m66.py',
    'algorithm_parameters': {
        'nth_median': 6, 'sigma': 6, 'window': 5, 'resolution': 60,
        'minimum_sparsity': 0, 'determine_duration': False,
        'return_anomalies': True, 'save_plots_to': False,
        'save_plots_to_absolute_dir': False, 'filename_prefix': False,
        'return_results': False, 'anomaly_window': 1,
    },
    'max_execution_time': 1.0,
    'consensus': 1,
    'algorithms_allowed_in_consensus': ['m66'],
    'run_3sigma_algorithms': False,
    'run_before_3sigma': False,
    'run_only_if_consensus': False,
    'use_with': ['crucible', 'luminosity'],
    'debug_logging': False,
},
The context in which you wish to use the algorithm determines whether you should set return_anomalies to True or return_results to True, and whether an anomalies_dict is returned. The original implementation of this algorithm returned a list of anomalies if return_anomalies was set to True. However, for inclusion as an algorithm that can be used in Vortex, it needed to be extended to be able to return a results dict.
skyline.custom_algorithms.macd module
macd.py
- macd(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
Outlier detection for time-series data using Moving Average Convergence/Divergence https://en.wikipedia.org/wiki/MACD - EXPERIMENTAL
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) –
a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the macd custom algorithm no specific algorithm_parameters are required apart from an empty dict but the algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'fast_window' (int): The size of the fast window. Default is 12.
'slow_window' (int): The size of the slow window. Default is 26.
'signal_window' (int): The size of the signal window. Default is 9.
'feature' (str): The macd feature to use. Default is macd. Possible values: macd | macd_signal | macd_histogram.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 1, 'fast_window': 12, 'slow_window': 26, 'signal_window': 9, 'feature': 'macd', 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(boolean, float, dict)
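To make the three feature names concrete, the standard MACD quantities can be computed as below. This sketch uses the textbook definitions (exponential moving averages with alpha = 2 / (span + 1), seeded with the first value); the helper names are hypothetical, and how the Skyline macd algorithm scores the chosen feature for outliers is not described here and is not shown.

```python
def ema(values, span):
    """Exponential moving average seeded with the first value."""
    alpha = 2.0 / (span + 1)
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def macd_features(values, fast_window=12, slow_window=26, signal_window=9):
    """Hypothetical helper returning the three MACD series keyed by the
    documented 'feature' names."""
    fast = ema(values, fast_window)
    slow = ema(values, slow_window)
    # MACD line: fast EMA minus slow EMA.
    macd_line = [f - s for f, s in zip(fast, slow)]
    # Signal line: EMA of the MACD line.
    signal = ema(macd_line, signal_window)
    # Histogram: MACD line minus signal line.
    histogram = [m - s for m, s in zip(macd_line, signal)]
    return {'macd': macd_line, 'macd_signal': signal, 'macd_histogram': histogram}
```

On a flat series all three features stay at zero, while a steadily rising series produces a positive macd value because the fast average tracks the rise more closely than the slow one.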
skyline.custom_algorithms.median_absolute_deviation module
median_absolute_deviation.py
- median_absolute_deviation(current_skyline_app, parent_pid, timeseries, algorithm_parameters={})[source]
A timeseries is anomalous if the deviation of its latest datapoint with respect to the median is X times larger than the median of deviations.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]
algorithm_parameters (dict) – a dictionary of any parameters and their arguments you wish to pass to the algorithm.
- Returns:
True, False or None
- Return type:
boolean
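The median absolute deviation test described above can be sketched as follows. The helper name mad_check and the default multiplier of 6 are assumptions (the documentation only says "X times"), as is returning False when the median of deviations is zero.

```python
from statistics import median


def mad_check(timeseries, threshold=6):
    """Hypothetical helper: True if the latest value deviates from the
    median by more than `threshold` times the median of all deviations."""
    values = [value for _, value in timeseries]
    series_median = median(values)
    # Absolute deviation of every data point from the series median.
    deviations = [abs(value - series_median) for value in values]
    median_deviation = median(deviations)
    # Assumed behaviour: a flat series cannot be scored, so do not flag it.
    if median_deviation == 0:
        return False
    return deviations[-1] / median_deviation > threshold
```

Because both the centre and the spread are medians, a single extreme value in the history barely moves the threshold, which is the usual reason for preferring this test over a mean and standard deviation check.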
skyline.custom_algorithms.mstl module
mstl.py
- mstl(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
EXPERIMENTAL
A basic implementation of statsforecast MSTL - https://github.com/Nixtla/statsforecast
https://nixtlaverse.nixtla.io/statsforecast/docs/models/multipleseasonaltrend.html
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) –
a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the mstl custom algorithm no specific algorithm_parameters are required apart from an empty dict but the algorithm_parameters that can be passed are:
'base_name' (str): The metric base_name.
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'level' (int): This is the confidence percentile. This optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, level=95 means that the model expects the real value to be inside that interval 95% of the time. Default is 99.
'season_hours' (int): The first order seasonality. The number of hours which represents the first order seasonality of the data. Default is 24.
'season_days' (int): The second order seasonality. The number of days which represent a cycle of the first order seasonality in the data. Default is 7.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 1, 'level': 99, 'season_hours': 24, 'season_days': 7, 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
skyline.custom_algorithms.one_class_svm module
one_class_svm.py
- one_class_svm(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
Outlier detector for time-series data using One Class SVM based on the moving mean and variance, unless the variance is low, in which case the standard deviation is used in place of the variance. The algorithm parameters to be concerned with are 'window', which defines the length of the sliding window to use, and nu, which defines the fraction of data points that can be considered outliers, e.g. 0.1 would be 10%. Do note that if the variance is low each spike or trough will probably be identified as an outlier.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) –
a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the one_class_svm custom algorithm no specific algorithm_parameters are required apart from an empty dict but the algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'window' (int): The sliding window size. Default is 3.
'nu' (float): The threshold value. The value for nu which defines the percentage that can be considered as outliers e.g. 0.1 would be 10%. Default is 0.01.
'gamma' (str): Kernel coefficient for rbf, poly and sigmoid. Default is scale. Possible values: scale | auto.
scale - uses 1 / (n_features * X.var()) as value of gamma.
auto - uses 1 / n_features as value of gamma.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 1, 'window': 3, 'nu': 0.01, 'gamma': 'scale', 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(boolean, float, dict)
skyline.custom_algorithms.pca module
pca.py
- pca(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
Outlier detector for time-series data using PCA - EXPERIMENTAL
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) –
a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the pca custom algorithm no specific algorithm_parameters are required apart from an empty dict but the algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'threshold' (float): The threshold value to use; a value >= this threshold is anomalous. Default is 0.7.
'n_test' (int): Size of test sample. The number of samples in the test data. Default is 10.
'diffs' (int): The number of differences to calculate. Default is 1.
'lags' (int): The number of lags to calculate. Default is 3.
'smooth' (int): Number of data points used in rolling average for smoothing. Default is 3.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 1, 'threshold': 0.7, 'n_test': 10, 'diffs': 1, 'lags': 3, 'smooth': 3, 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(boolean, float, dict)
skyline.custom_algorithms.probabilistic_forecasts_generalized_pareto_distribution_ets module
probabilistic_forecasts_generalized_pareto_distribution_ets.py
- probabilistic_forecasts_generalized_pareto_distribution_ets(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
A basic Python implementation of Probabilistic forecasts for anomaly detection as proposed by Rob J Hyndman - 3 July 2024
https://robjhyndman.com/seminars/isf2024.html International Symposium on Forecasting, Dijon, France
https://raw.githubusercontent.com/robjhyndman/forecast-anomalies-talk/main/forecast_anomalies.pdf
When a forecast is very inaccurate, it is sometimes because a poor forecasting model is used, but it can also occur when an unusual observation occurs. A good forecasting model can be used to identify anomalies. The approach taken is to use a probabilistic forecast, and to compute the “density scores” equal to the negative log likelihood of the observations based on the forecast distributions. The density scores provide a measure of how anomalous each observation is, given the forecast density. A large density score indicates that the observation is unlikely, and so is a potential anomaly. On the other hand, typical values will have low density scores. A Generalized Pareto Distribution is fitted to the largest density scores to estimate the probability of each observation being an anomaly.
This implementation uses statespace ExponentialSmoothing. Although holtwinters ExponentialSmoothing could also be used, it requires a seasonal parameter to be defined, so holtwinters ExponentialSmoothing is not suited to zero knowledge, unsupervised analysis.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) –
a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the probabilistic_forecasts_generalized_pareto_distribution_ets custom algorithm no specific algorithm_parameters are required apart from an empty dict but the algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'threshold' (int): The percentile value for the threshold to use. Default is 95.
'p_value' (int): The p_value to use. Default is 95.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 1, 'threshold': 95, 'p_value': 95, 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
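To illustrate how anomaly_window restricts the decision to the tail of the time series, here is a minimal sketch. It assumes a list of per-data-point anomaly flags has already been computed. The helper name anomalous_in_window and the rule that any flagged point inside the window makes the metric anomalous are assumptions for illustration, not Skyline's implementation.

```python
def anomalous_in_window(point_flags, anomaly_window=1):
    """Return True if any of the last anomaly_window flags is set.

    point_flags is a hypothetical list of booleans, one per data point,
    ordered from oldest to newest.
    """
    if anomaly_window < 1:
        raise ValueError('anomaly_window must be >= 1')
    # Only the tail of the list is inspected, earlier flags are ignored.
    return any(point_flags[-anomaly_window:])


flags = [False, False, True, False]
print(anomalous_in_window(flags, anomaly_window=1))  # only the last point
print(anomalous_in_window(flags, anomaly_window=2))  # the last two points
```

With anomaly_window=1 the flagged third point is ignored. Widening the window to 2 brings it into scope.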
skyline.custom_algorithms.sigma module
sigma.py
- sigma(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
This is an implementation of the original Skyline 3sigma algorithms as a single custom algorithm. It has been extended to allow for the sigma value to be passed as a parameter.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the sigma custom algorithm no specific algorithm_parameters are required apart from an empty dict, but the sigma algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'sigma_value' (int): The sigma value to use. Default is 3.
'consensus' (int): The consensus count to use, i.e. how many algorithms need to trigger for the metric to be considered anomalous. Default is 6.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 1, 'sigma': 3, 'consensus': 6, 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
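To illustrate the role of sigma_value, here is a minimal sketch of a single three-sigma style check: the last value is flagged if it lies more than sigma_value standard deviations from the mean of the series. This one check, and the helper name last_value_exceeds_sigma, are assumptions chosen for illustration. The sigma custom algorithm itself runs several checks and applies the consensus count.

```python
import statistics


def last_value_exceeds_sigma(timeseries, sigma_value=3):
    """Return True, False or None (not enough data) for the last data point."""
    values = [value for _, value in timeseries]
    if len(values) < 2:
        return None
    mean = statistics.fmean(values)
    stddev = statistics.pstdev(values)
    if stddev == 0:
        # A perfectly flat series has no deviation to compare against.
        return False
    return abs(values[-1] - mean) > sigma_value * stddev


start = 1667608854
steady = [[start + 600 * i, 10.0 + (i % 2)] for i in range(19)]
print(last_value_exceeds_sigma(steady + [[start + 600 * 19, 100.0]]))
print(last_value_exceeds_sigma(steady + [[start + 600 * 19, 11.0]]))
```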
skyline.custom_algorithms.significant_change_window_percent_sustained module
THIS IS A SAMPLE CUSTOM ALGORITHM to provide a skeleton to develop your own
custom algorithms. The algorithm itself, although viable, is not recommended for
production or general use, it is simply a toy algorithm here to demonstrate the
structure of a simple custom algorithm that has algorithm_parameters passed
as an empty dict {}.
It is documented via comments #
- significant_change_window_percent_sustained(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
A data point is anomalous if it is x percent different from the median of the window (seconds resample) of the last p period (seconds). A few examples:
If the value is 10% different from the median value of the 10 minute windows of the last hour, algorithm_parameters: {'window': 600, 'percent': 10.0, 'period': 3600}
If the value is 50% different from the median value of the 10 minute windows of the last day, algorithm_parameters: {'window': 600, 'percent': 50.0, 'period': 86400}
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]
algorithm_parameters (dict) – a dictionary of any parameters and their arguments you wish to pass to the algorithm.
- Returns:
True, False or None
- Return type:
boolean
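The description above can be read in more than one way. The following minimal sketch shows one possible reading: the data points in the last period seconds (excluding the last data point) are grouped into window second buckets, the median of each bucket is taken, and the last value is compared with the median of those bucket medians. The helper name percent_change_from_window_median and this interpretation are assumptions, not the module's code.

```python
import statistics


def percent_change_from_window_median(timeseries, window=600, percent=10.0, period=3600):
    """Return True, False or None (not enough data) for the last data point."""
    if len(timeseries) < 2:
        return None
    last_timestamp, last_value = timeseries[-1]
    period_start = last_timestamp - period
    # Resample the preceding data points in the period into window-sized
    # buckets, keyed by bucket index.
    buckets = {}
    for timestamp, value in timeseries[:-1]:
        if timestamp < period_start:
            continue
        bucket = int((timestamp - period_start) // window)
        buckets.setdefault(bucket, []).append(value)
    if not buckets:
        return None
    window_medians = [statistics.median(values) for values in buckets.values()]
    baseline = statistics.median(window_medians)
    if baseline == 0:
        # Percent difference is undefined against a zero baseline.
        return None
    percent_different = abs(last_value - baseline) / abs(baseline) * 100.0
    return percent_different >= percent


start = 1578916800
steady = [[start + 60 * i, 100.0] for i in range(60)]
print(percent_change_from_window_median(steady + [[start + 3600, 115.0]]))
print(percent_change_from_window_median(steady + [[start + 3600, 105.0]]))
```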
skyline.custom_algorithms.single_value_anomaly module
single_value_anomaly.py
- single_value_anomaly(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
A time series is anomalous if all the values are equal apart from the last data point.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) – only an empty dictionary is required.
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
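The rule above is simple enough to sketch in full, which also shows the four-argument signature and the tuple return shape that custom algorithms use. This is a minimal sketch and not the module's code: the function name single_value_anomaly_sketch, the keys in the results dict and the choice to return None values when there are fewer than two data points are all assumptions.

```python
def single_value_anomaly_sketch(current_skyline_app, parent_pid, timeseries, algorithm_parameters):
    # current_skyline_app and parent_pid are accepted as the first and second
    # arguments, as required, but are not used in this sketch.
    anomalous = None
    anomalyScore = None
    results = {}
    values = [item[1] for item in timeseries]
    if len(values) < 2:
        # Assumption: not enough data to decide, so nothing is reported.
        return anomalous, anomalyScore, results
    preceding = set(values[:-1])
    # Anomalous only if every preceding value is identical and the last differs.
    anomalous = len(preceding) == 1 and values[-1] not in preceding
    anomalyScore = 1.0 if anomalous else 0.0
    results = {'anomalous': anomalous, 'unique_preceding_values': len(preceding)}
    return anomalous, anomalyScore, results


timeseries = [[1667608854, 5.0], [1667609454, 5.0], [1667610054, 5.0], [1667610654, 9.0]]
print(single_value_anomaly_sketch('mirage', 1234, timeseries, {}))
```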
skyline.custom_algorithms.skyline_matrixprofile module
THIS IS A MORE FEATUREFUL CUSTOM ALGORITHM to provide a skeleton to develop your
own custom algorithms. This algorithm demonstrates the structure of a more
complex custom algorithm that has algorithm_parameters passed and can also
log if enabled.
It is documented via comments #
- skyline_matrixprofile(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
The skyline_matrixprofile algorithm uses matrixprofile to identify discords.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. The following algorithm_parameters can be passed to skyline_matrixprofile:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1, but with matrix profile 3 generally works a bit better.
'windows' (int): The window size to use for the matrix profile. There is no default, a value must be passed.
'k_discords' (int): The number of discords to detect in the matrixprofile. Default is 20.
'use_fft_extrapolation' (bool): If True, enables the use of FFT (Fast Fourier Transform) extrapolation to pad the last window. Default is True.
'context' (str or None): Optional. This is for Skyline internal routing in terms of snab. Default is None.
'check_details' (dict or {}): Optional. The check_details dict from the Skyline app, for internal Skyline use only. Default is an empty dictionary {}.
'tornado_url' (str or None): Optional. The URL for tornado if it is being used. Default is None.
'tornado_api_key' (str or None): Optional. The API key for authenticating on /flux/tornado, this will either be the settings.FLUX_SELF_API_KEY or a key from settings.FLUX_API_KEYS. Required only if tornado is being used. Default is None.
Example usage:
- algorithm_parameters={
'anomaly_window': 3, 'windows': 5, 'k_discords': 20, 'use_fft_extrapolation': True, 'tornado_url': 'http://127.0.0.1:8000/tornado?algorithm=stump', 'tornado_api_key': '<FLUX_SELF_API_KEY> or a key from FLUX_API_KEYS', 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
skyline.custom_algorithms.skyline_prophet module
skyline_prophet.py
- skyline_prophet(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
Prophet
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the prophet custom algorithm no specific algorithm_parameters are required apart from an empty dict, but the prophet algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'interval_width' (float): The width of the uncertainty interval to produce in the forecast, representing the probability that the actual value will fall within the interval. For example, interval_width=0.99 produces a 99% confidence interval. Default is 0.99.
'changepoint_range' (float): The proportion of the time series in which changepoints are inferred. By default changepoints are only inferred for the first 80% of the time series in order to have plenty of runway for projecting the trend forward and to avoid overfitting fluctuations at the end of the time series. This default works in many situations but not all, and can be changed using the changepoint_range argument, e.g. changepoint_range=0.9 limits changepoints to the first 90% of the data. Default is 0.8.
'daily_seasonality' (bool): Whether to enable daily seasonality (hourly fluctuations in daily data). Default is False.
'weekly_seasonality' (bool): Whether to enable weekly seasonality (daily fluctuations in weekly data). Default is False.
'yearly_seasonality' (bool): Whether to enable yearly seasonality (seasonal patterns in yearly data). Default is False.
'seasonality_mode' (str): The type of seasonality to apply. Can be either 'multiplicative' or 'additive'. Use 'multiplicative' if seasonal effects are proportional to the level of the time series, or 'additive' if the effects are independent of the level. Default is 'multiplicative'.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 1, 'interval_width': 0.99, 'changepoint_range': 0.8, 'daily_seasonality': False, 'weekly_seasonality': False, 'yearly_seasonality': False, 'seasonality_mode': 'multiplicative', 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
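Several of the parameters above share their names with keyword arguments of the Prophet model constructor. The following minimal sketch shows how the documented defaults could be merged with a passed algorithm_parameters dict to build those keyword arguments. The helper name prophet_kwargs and the idea that the model is built this way are assumptions for illustration. Prophet itself is not imported, so the sketch runs without it.

```python
# Defaults as documented above for the skyline_prophet algorithm_parameters.
PROPHET_DEFAULTS = {
    'interval_width': 0.99,
    'changepoint_range': 0.8,
    'daily_seasonality': False,
    'weekly_seasonality': False,
    'yearly_seasonality': False,
    'seasonality_mode': 'multiplicative',
}


def prophet_kwargs(algorithm_parameters):
    """Return model keyword arguments, ignoring non-model keys."""
    kwargs = dict(PROPHET_DEFAULTS)
    for key in PROPHET_DEFAULTS:
        if key in algorithm_parameters:
            kwargs[key] = algorithm_parameters[key]
    return kwargs


parameters = {'anomaly_window': 1, 'interval_width': 0.95, 'debug_logging': True}
print(prophet_kwargs(parameters))
# With the prophet package installed, the result could be used as:
# from prophet import Prophet
# model = Prophet(**prophet_kwargs(parameters))
```

Keys such as anomaly_window and debug_logging are left out of the result because they control the custom algorithm and not the model.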
skyline.custom_algorithms.spectral_entropy module
spectral_entropy.py
- spectral_entropy(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
Outlier detection for time-series data using Spectral Entropy. EXPERIMENTAL The outlier method
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the spectral_entropy custom algorithm no specific algorithm_parameters are required apart from an empty dict, but the spectral_entropy algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 1.
'frequency' (int): The sampling frequency of the x time series. Default is 100.
'window' (int): The rolling window. How many data points in the rolling window. Default is 60.
'max_low_entropy' (float): The maximum low entropy value. What is the maximum value that can be considered as low entropy? If the spectral_entropy value is not close to 0 then it is not anomalous, but the range which could be considered low varies somewhat. Through experimentation and testing a general value of 0.6 has been determined to represent not anomalous behaviour in most cases. In fact it does very well at limiting the number of anomalies that can be reported. Default is 0.6.
'determine_frequency' (bool): Optional. If True, the frequency is determined automatically from the time series data. Default is False.
'return_results' (bool): Optional. If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): Optional. If True, enables debug logging.
'debug_print' (bool): Optional. If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 1, 'window': 60, 'determine_frequency': True, 'max_low_entropy': 0.6, 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
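For readers unfamiliar with the measure, the following minimal sketch computes a normalized spectral entropy for a single window of values using only the standard library. It is not the module's code: the helper name normalized_spectral_entropy, the removal of the mean, the one-sided spectrum and the normalization by the log of the number of frequency bins are all assumptions. A value near 0 means the power is concentrated in very few frequencies. This is the kind of value that would be compared with max_low_entropy.

```python
import cmath
import math


def normalized_spectral_entropy(values):
    """Return the spectral entropy of values scaled to the range 0 to 1.

    Returns None for a constant series, where the spectrum is empty.
    """
    n = len(values)
    if n < 4:
        raise ValueError('at least 4 values are required')
    mean = sum(values) / n
    centered = [value - mean for value in values]
    # One-sided power spectrum from a naive O(n^2) discrete Fourier
    # transform, skipping the zero frequency term.
    power = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power.append(abs(coefficient) ** 2)
    total = sum(power)
    if total == 0:
        return None
    probabilities = [p / total for p in power]
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return entropy / math.log(len(power))


sine = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
impulse = [0.0] * 64
impulse[10] = 1.0
print(normalized_spectral_entropy(sine))     # close to 0, one dominant frequency
print(normalized_spectral_entropy(impulse))  # close to 1, flat spectrum
```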
skyline.custom_algorithms.spectral_residual module
spectral_residual.py
- spectral_residual(current_skyline_app, parent_pid, timeseries, algorithm_parameters)[source]
Outlier detector for time-series data using the spectral residual algorithm. Based on the alibi-detect implementation of:
Time-Series Anomaly Detection Service at Microsoft (Ren et al., 2019) https://arxiv.org/abs/1906.03821
For Mirage this algorithm is FAST
For Analyzer this algorithm is SLOW
Although this algorithm is fast, it is not fast enough to be run in Analyzer, even if only deployed against a subset of metrics. In testing spectral_residual took between 0.134828 and 0.698201 seconds to run per metric, which is much too long for Analyzer.
- Parameters:
current_skyline_app (str) – the Skyline app executing the algorithm. This will be passed to the algorithm by Skyline. This is required for error handling and logging. You do not have to worry about handling the argument in the scope of the custom algorithm itself, but the algorithm must accept it as the first argument.
parent_pid (int) – the parent pid which is executing the algorithm. This is required for error handling and logging. You do not have to worry about handling this argument in the scope of the algorithm, but the algorithm must accept it as the second argument.
timeseries (list) – the time series as a list e.g.
[[1667608854, 1269121024.0], [1667609454, 1269174272.0], [1667610054, 1269174272.0]]
algorithm_parameters (dict) – a dictionary of any required parameters for the custom_algorithm and algorithm itself. For the spectral_residual custom algorithm no specific algorithm_parameters are required apart from an empty dict, but the spectral_residual algorithm_parameters that can be passed are:
'anomaly_window' (int): The anomaly_window value. This specifies how many of the last data points should be considered when determining if the metric is anomalous. Only the last anomaly_window data points in the time series will be used to determine if the metric is anomalous. Default is 3, even in the real time context, because spectral_residual often triggers on the far leading side rather than the trailing side of a peak.
'threshold' (float): Threshold used to classify outliers. Relative saliency map distance from the moving average. Default is None because Skyline is using spectral_residual in an unsupervised manner and makes use of the spectral_residual infer_threshold method to dynamically calculate the outlier threshold from the data.
'threshold_perc' (float): A threshold value inferred from the percentage of instances considered to be outliers in a sample of the dataset. Default is 99.0.
'window_amp' (int): Window for the average log amplitude. Default is 20.
'window_local' (int): Window for the local average of the saliency map. Note that the averaging is performed over the previous window_local data points (i.e., is a local average of the preceding window_local points for the current index). Default is 20.
'n_est_points' (int): Number of estimated points padded to the end of the sequence. Default is 20.
'n_grad_points' (int): Number of points used for the gradient estimation of the additional points padded to the end of the sequence. Default is 20.
'padding_amp_method' (str): Padding method to be used prior to each convolution over log amplitude. Possible values: constant | replicate | reflect.
constant - padding with constant 0.
replicate - repeats the last/extreme value.
reflect - reflects the time series.
Default value: reflect.
'padding_local_method' (str): Padding method to be used prior to each convolution over saliency map. Possible values: constant | replicate | reflect.
constant - padding with constant 0.
replicate - repeats the last/extreme value.
reflect - reflects the time series.
Default value: reflect.
'padding_amp_side' (str): Whether to pad the amplitudes on both sides or only on one side. Possible values: bilateral | left | right. Default value: bilateral.
'return_results' (bool): If True, returns the results dict in addition to anomalous and anomalyScore. Default is False.
'debug_logging' (bool): If True, enables debug logging.
'debug_print' (bool): If True, enables debug printing (for Jupyter testing). Default is False.
Example usage:
- algorithm_parameters={
'anomaly_window': 3, 'threshold': None, 'threshold_perc': 99.0, 'debug_logging': True, 'return_results': True,
}
- Returns:
anomalous, anomalyScore, results
- Return type:
tuple(bool, float, dict)
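The threshold_perc description suggests a percentile-based threshold. The following minimal sketch assumes the threshold is taken as the threshold_perc percentile of a list of outlier scores, using linear interpolation between neighbouring scores. The helper name infer_threshold_sketch and the score values are hypothetical. This is an illustration of the idea and not the spectral_residual or alibi-detect code.

```python
import math


def infer_threshold_sketch(scores, threshold_perc=99.0):
    """Return the threshold_perc percentile of scores (linear interpolation)."""
    if not scores:
        raise ValueError('scores must not be empty')
    ordered = sorted(scores)
    # Fractional position of the requested percentile in the sorted scores.
    rank = (len(ordered) - 1) * threshold_perc / 100.0
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


scores = [float(i) for i in range(101)]
threshold = infer_threshold_sketch(scores, threshold_perc=99.0)
outliers = [score for score in scores if score > threshold]
print(threshold, outliers)
```

With a percentile of 99.0, roughly the top 1% of scores fall above the threshold and would be classified as outliers.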
Module contents
__init__.py
- get_function_name()[source]
This utility function is used to determine which algorithm is reporting an algorithm error when record_algorithm_error is used.
- record_algorithm_error(current_skyline_app, parent_pid, algorithm_name, traceback_format_exc_string)[source]
This utility function is used to facilitate the traceback from any algorithm errors. The algorithm functions themselves need to run very fast, and an error must not stop a function from returning or leave nothing reported in the log. The pythonic except is therefore used to "sample" any algorithm errors to a tmp file and report once per run rather than spewing tons of errors into the log.
Note
- algorithm errors tmp file clean up
the algorithm error tmp files are handled and cleaned up in Analyzer after all the spawned processes are completed.
- Parameters:
current_skyline_app (str) – the Skyline app
algorithm_name (str) – the algorithm function name
traceback_format_exc_string (str) – the traceback_format_exc string
parent_pid (int) – the pid of the parent process that will be used in error file naming
- Returns:
True if the error string was written to the algorithm_error_file, False if the error string was not written to the algorithm_error_file
- Return type:
boolean
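The pattern described above, catching an exception inside an algorithm and writing the traceback to a tmp file instead of the log, can be sketched as follows. This is a minimal illustration and not the module's code: the function name record_error_sketch, the extra error_dir argument and the error file naming are assumptions.

```python
import os
import tempfile
import traceback


def record_error_sketch(error_dir, current_skyline_app, parent_pid,
                        algorithm_name, traceback_format_exc_string):
    """Write the traceback string to a per-algorithm error file.

    Returns True if the file was written and False if it was not.
    """
    # Hypothetical file naming, one file per app, parent pid and algorithm.
    error_file = os.path.join(
        error_dir,
        '%s.%s.%s.algorithm.error' % (current_skyline_app, parent_pid, algorithm_name))
    try:
        with open(error_file, 'w') as fh:
            fh.write(str(traceback_format_exc_string))
        return True
    except OSError:
        return False


with tempfile.TemporaryDirectory() as tmp_dir:
    try:
        1 / 0
    except ZeroDivisionError:
        # The algorithm keeps running and only samples the error to a file.
        written = record_error_sketch(
            tmp_dir, 'analyzer', os.getpid(), 'my_algorithm',
            traceback.format_exc())
    print(written)
```

Because each call overwrites the same file for a given app, parent pid and algorithm, repeated failures in one run leave a single sample rather than one log entry per failure.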
- run_custom_algorithm_on_timeseries(current_skyline_app, parent_pid, base_name, timeseries, custom_algorithm, custom_algorithm_dict, debug_custom_algortihms, current_func=None)[source]
Return a dictionary of custom algorithms to run on a metric determined from the settings.CUSTOM_ALGORITHMS dictionary.