skyline.crucible package
Submodules
skyline.crucible.agent module
skyline.crucible.crucible module
- class Crucible(parent_pid)[source]
Bases: Thread
- new_load_metric_vars(metric_vars_file)[source]
Load the metric variables for a check from a metric check variables file
- Parameters:
metric_vars_file (str) – the path and filename of the metric variables file
- Returns:
the metric_vars list or False
- Return type:
list
skyline.crucible.crucible_algorithms module
- python_version = 3
This is no man’s land. Do anything you want in here, as long as you return a boolean that determines whether the input timeseries is anomalous or not.
To add an algorithm, define it here and add its name to settings.ALGORITHMS. It must be defined with the required parameters (even if your algorithm/function does not need them), because the run_algorithms function passes them to every algorithm listed in settings.ALGORITHMS.
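As an illustration of the required signature, the following is a minimal sketch of a custom algorithm. The name above_fixed_ceiling, the ceiling of 1000, the registry dict and the assumption that a timeseries is a list of (timestamp, value) pairs are all hypothetical and not taken from this module. The local ALGORITHMS list only stands in for settings.ALGORITHMS.

```python
# Hypothetical algorithm: the name and the ceiling of 1000 are illustrative only.
def above_fixed_ceiling(timeseries, end_timestamp, full_duration):
    # end_timestamp and full_duration are unused here, but they are still
    # accepted because every algorithm is called with the same arguments.
    if not timeseries:
        return False
    # Assumes each datapoint is a (timestamp, value) pair.
    return timeseries[-1][1] > 1000


# Stand-in for settings.ALGORITHMS, which lists algorithm names.
ALGORITHMS = ['above_fixed_ceiling']

# Stand-in for the dispatch step: map each name to its function and call it
# with all three arguments.
registry = {'above_fixed_ceiling': above_fixed_ceiling}
series = [(1, 10.0), (2, 12.0), (3, 1500.0)]
results = {name: registry[name](series, 3, 86400) for name in ALGORITHMS}
```

The function returns a plain boolean, which is the only contract stated above.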
- tail_avg(timeseries, end_timestamp, full_duration)[source]
This is a utility function used to calculate the average of the last three datapoints in the series as a measure, instead of just the last datapoint. It reduces noise, but it also reduces sensitivity and increases the delay to detection.
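A minimal sketch of the idea, assuming the timeseries is a list of (timestamp, value) pairs. The name tail_avg_sketch is hypothetical, and averaging whatever is available when fewer than three datapoints exist is an assumption of this sketch.

```python
def tail_avg_sketch(timeseries):
    """Average the values of the last three (timestamp, value) pairs."""
    values = [value for _, value in timeseries[-3:]]
    if not values:
        return None
    # With fewer than three datapoints, average what is available (assumption).
    return sum(values) / len(values)


series = [(1, 1.0), (2, 1.0), (3, 1.0), (4, 10.0)]
smoothed = tail_avg_sketch(series)  # (1 + 1 + 10) / 3
```

Here the last datapoint alone is 10.0, while the smoothed measure is pulled toward the two preceding values, which is the noise reduction and loss of sensitivity described above.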
- median_absolute_deviation(timeseries, end_timestamp, full_duration)[source]
A timeseries is anomalous if the deviation of its latest datapoint with respect to the median is X times larger than the median of deviations.
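The check can be sketched with the standard library as follows. The name mad_sketch and the default threshold of 6.0 for X are assumptions of this sketch, as is the (timestamp, value) layout of the timeseries.

```python
from statistics import median


def mad_sketch(timeseries, threshold=6.0):
    """Flag the series if the latest deviation exceeds threshold * MAD."""
    values = [value for _, value in timeseries]
    if not values:
        return False
    centre = median(values)
    deviations = [abs(value - centre) for value in values]
    median_deviation = median(deviations)
    # A zero median deviation would make the ratio undefined.
    if median_deviation == 0:
        return False
    return deviations[-1] / median_deviation > threshold
```

Because both the centre and the spread are medians, a single extreme datapoint barely moves either of them, so the ratio for that datapoint stays large.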
- grubbs(timeseries, end_timestamp, full_duration)[source]
A timeseries is anomalous if the Z score is greater than the Grubbs' score.
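A rough sketch of the comparison, under several assumptions: the Z score is taken for the latest datapoint, the sample standard deviation is used, and the critical value of Student's t distribution is supplied by the caller as the hypothetical t_critical argument because the standard library has no t quantile function. The name grubbs_sketch is also hypothetical.

```python
import math
from statistics import mean, stdev


def grubbs_sketch(timeseries, t_critical):
    """Compare the Z score of the latest value with the Grubbs' score."""
    values = [value for _, value in timeseries]
    n = len(values)
    if n < 3:
        return False
    spread = stdev(values)
    if spread == 0:
        return False
    z_score = (values[-1] - mean(values)) / spread
    # Grubbs' critical value derived from the supplied t critical value.
    t_squared = t_critical * t_critical
    grubbs_score = ((n - 1) / math.sqrt(n)) * math.sqrt(
        t_squared / (n - 2 + t_squared))
    return z_score > grubbs_score
```

As written, the sketch is one-sided: only a latest value above the mean can exceed the score. A real implementation would normally compute t_critical from the series length and a significance level, for example with scipy.stats.t.isf.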
- first_hour_average(timeseries, end_timestamp, full_duration)[source]
Calculate the simple average over 60 datapoints (maybe one hour), FULL_DURATION seconds ago. A timeseries is anomalous if the average of the last three datapoints is outside of three standard deviations of this value.
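A minimal sketch, assuming the series covers the whole of full_duration so that its first 60 datapoints are the ones from FULL_DURATION seconds ago. The name first_hour_average_sketch and the baseline_points parameter are hypothetical.

```python
from statistics import mean, stdev


def first_hour_average_sketch(timeseries, baseline_points=60):
    """Compare the tail average with the mean of the earliest datapoints."""
    values = [value for _, value in timeseries]
    baseline = values[:baseline_points]
    if len(baseline) < 2 or len(values) < 3:
        return False
    baseline_mean = mean(baseline)
    baseline_std = stdev(baseline)
    tail = sum(values[-3:]) / 3
    return abs(tail - baseline_mean) > 3 * baseline_std
```

Because only the earliest window defines "normal", a slow drift over the full duration can eventually trip this check even when no sudden change occurred.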
- stddev_from_average(timeseries, end_timestamp, full_duration)[source]
A timeseries is anomalous if the absolute value of the average of the latest three datapoints minus the moving average is greater than three standard deviations of the average. This does not exponentially weight the MA and so is better for detecting anomalies with respect to the entire series.
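A minimal sketch using the unweighted mean and standard deviation of the whole series. The name stddev_from_average_sketch is hypothetical, and the multiplier is passed in as the hypothetical sigmas argument rather than fixed in the code.

```python
from statistics import mean, stdev


def stddev_from_average_sketch(timeseries, sigmas):
    """Compare the tail average with the plain average of the series."""
    values = [value for _, value in timeseries]
    if len(values) < 3:
        return False
    series_mean = mean(values)
    series_std = stdev(values)
    tail = sum(values[-3:]) / 3
    return abs(tail - series_mean) > sigmas * series_std


steady = [(i, 10.0 if i % 2 == 0 else 12.0) for i in range(100)]
spike = steady + [(100 + i, 50.0) for i in range(3)]
flagged = stddev_from_average_sketch(spike, sigmas=3.0)  # illustrative multiplier
```

Every datapoint carries equal weight here, so the baseline reflects the entire series rather than its recent behaviour.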
- stddev_from_moving_average(timeseries, end_timestamp, full_duration)[source]
A timeseries is anomalous if the absolute value of the average of the latest three datapoints minus the moving average is greater than three standard deviations of the moving average. This is better for finding anomalies with respect to short-term trends.
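A rough sketch, assuming the moving average is exponentially weighted, which the contrast with stddev_from_average suggests. The name stddev_from_moving_average_sketch, the smoothing factor alpha and the choice to compute the moving statistics only on the datapoints before the last three (so that a spike does not inflate its own threshold) are assumptions of this sketch.

```python
def stddev_from_moving_average_sketch(timeseries, alpha=0.05, sigmas=3.0):
    """Compare the tail average with an exponentially weighted baseline."""
    values = [value for _, value in timeseries]
    if len(values) < 5:
        return False
    history, tail_values = values[:-3], values[-3:]
    ewma = history[0]
    ewvar = 0.0
    # Incremental exponentially weighted mean and variance.
    for value in history[1:]:
        diff = value - ewma
        ewma += alpha * diff
        ewvar = (1 - alpha) * (ewvar + alpha * diff * diff)
    tail = sum(tail_values) / 3
    return abs(tail - ewma) > sigmas * ewvar ** 0.5
```

Recent datapoints dominate both the average and the deviation, so the baseline follows the short-term trend instead of the whole series.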
- mean_subtraction_cumulation(timeseries, end_timestamp, full_duration)[source]
A timeseries is anomalous if the value of the next datapoint in the series is farther than three standard deviations out in cumulative terms after subtracting the mean from each data point.
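One possible reading of this description, sketched under assumptions: the mean and standard deviation are taken over every datapoint except the latest, the mean is subtracted from each value, and the latest centred value is compared with three standard deviations. The name mean_subtraction_sketch and the sigmas parameter are hypothetical, and the sketch does not try to model the "cumulative" wording beyond this.

```python
from statistics import mean, stdev


def mean_subtraction_sketch(timeseries, sigmas=3.0):
    """Centre the series on its historical mean and test the latest value."""
    values = [value for _, value in timeseries]
    if len(values) < 3:
        return False
    history = values[:-1]
    baseline = mean(history)
    spread = stdev(history)
    # Subtract the historical mean from each datapoint.
    centred = [value - baseline for value in values]
    return abs(centred[-1]) > sigmas * spread
```

Excluding the latest datapoint from the baseline keeps a large new value from shifting the mean it is being tested against.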
- least_squares(timeseries, end_timestamp, full_duration)[source]
A timeseries is anomalous if the average of the last three datapoints on a projected least squares model is greater than three sigma.
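A minimal sketch, assuming the model is a straight line fitted to (timestamp, value) pairs and that "three sigma" refers to the standard deviation of the residuals from that line. The name least_squares_sketch is hypothetical, and the fit is written out by hand to keep the example dependency-free.

```python
def least_squares_sketch(timeseries, sigmas=3.0):
    """Fit y = slope * x + intercept and test the average tail residual."""
    if len(timeseries) < 3:
        return False
    xs = [float(timestamp) for timestamp, _ in timeseries]
    ys = [float(value) for _, value in timeseries]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        return False  # all timestamps identical, no line can be fitted
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in residuals) / n) ** 0.5
    tail = sum(residuals[-3:]) / 3
    return abs(tail) > sigmas * sigma
```

Working on residuals means a steadily rising or falling series is not flagged merely because its latest values are far from the overall mean.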
- histogram_bins(timeseries, end_timestamp, full_duration)[source]
A timeseries is anomalous if the average of the last three datapoints falls into a histogram bin with fewer than 20 other datapoints (you’ll need to tweak that number depending on your data).
Returns: the size of the bin which contains the tail_avg. Smaller bin size means more anomalous.
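A minimal sketch of the binning idea, assuming equal-width bins across the observed range. The name histogram_bins_sketch, the bin_count of 15 and the choice to return both the boolean and the bin size as a tuple are assumptions of this sketch. The bin size counts every datapoint in the bin, including any of the last three that fall into it.

```python
def histogram_bins_sketch(timeseries, bin_count=15, min_bin_size=20):
    """Return (anomalous, size of the bin containing the tail average)."""
    values = [value for _, value in timeseries]
    if len(values) < 3:
        return False, 0
    low, high = min(values), max(values)
    if low == high:
        return False, len(values)  # a flat series has a single bin
    width = (high - low) / bin_count

    def bin_index(value):
        # Clamp so the maximum value lands in the last bin.
        return min(int((value - low) / width), bin_count - 1)

    counts = [0] * bin_count
    for value in values:
        counts[bin_index(value)] += 1
    tail = sum(values[-3:]) / 3
    size = counts[bin_index(tail)]
    return size < min_bin_size, size
```

Note that a series with only a few distinct values can leave the bins between them empty, so a tail average that falls between two common values may be flagged. This is one reason the threshold needs tuning per dataset.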
- ks_test(timeseries, end_timestamp, full_duration)[source]
A timeseries is anomalous if a two-sample Kolmogorov-Smirnov test indicates that the data distribution of the last 10 datapoints (might be 10 minutes) is different from that of the last 60 datapoints (might be an hour). The test produces false positives on non-stationary series, so an Augmented Dickey-Fuller test is applied to check for stationarity.
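A rough, dependency-free sketch of the Kolmogorov-Smirnov part only. The names ks_statistic and ks_test_sketch are hypothetical. The reference window is assumed to be the datapoints that precede the last 10 within the last 60, and the decision uses the large-sample critical value for the two-sample statistic rather than an exact p-value. The Augmented Dickey-Fuller stationarity check is omitted here.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def ks_test_sketch(timeseries, probe_points=10, window_points=60, alpha=0.05):
    values = [value for _, value in timeseries[-window_points:]]
    if len(values) <= probe_points:
        return False
    reference, probe = values[:-probe_points], values[-probe_points:]
    n, m = len(reference), len(probe)
    # Large-sample approximation of the critical value at level alpha.
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, probe) > critical
```

With only 10 probe datapoints the large-sample approximation is coarse, so a library routine that returns an exact p-value would be preferable in practice.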
- detect_drop_off_cliff(timeseries, end_timestamp, full_duration)[source]
A timeseries is anomalous if the average of the last ten datapoints is <trigger> times greater than the last datapoint. This algorithm is most suited to timeseries with most datapoints being > 100 (e.g. high rate). The arbitrary <trigger> values become noisier with lower-value datapoints, but the algorithm still matches drop-off cliffs.
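A minimal sketch of the comparison, assuming non-negative values and assuming the last datapoint is itself one of the ten being averaged. The name drop_off_cliff_sketch is hypothetical, and the trigger is passed in by the caller because its actual values are not stated here.

```python
def drop_off_cliff_sketch(timeseries, trigger):
    """Flag the series if the recent average dwarfs the last datapoint."""
    values = [value for _, value in timeseries[-10:]]
    if len(values) < 10:
        return False
    average = sum(values) / len(values)
    return average > trigger * values[-1]


high_rate = [(i, 1000.0) for i in range(9)]
cliff = high_rate + [(9, 5.0)]
dropped = drop_off_cliff_sketch(cliff, trigger=8)  # trigger of 8 is illustrative
```

With small values the same ratio is reached by ordinary fluctuations, for example a series hovering around 2 that briefly reads 0.2, which is why the check is noisier on low-rate metrics.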