skyline.crucible package

Submodules

skyline.crucible.agent module

skyline.crucible.crucible module

class Crucible(parent_pid)[source]

Bases: Thread

check_if_parent_is_alive()[source]

Check if the parent process is alive
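One common way to perform this kind of liveness check on POSIX systems is to send signal 0 to the pid. The sketch below is an illustration only, not the Crucible implementation; the helper name pid_is_alive is hypothetical, and the approach should not be used on Windows, where os.kill behaves differently.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this pid exists (POSIX only).

    Hypothetical helper for illustration. Signal 0 performs the
    existence and permission checks without delivering a signal.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # No such process.
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

A worker would typically call such a check with the parent pid it was given and exit when the result is False.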

new_load_metric_vars(metric_vars_file)[source]

Load the metric variables for a check from a metric check variables file

Parameters:

metric_vars_file (str) – the path and filename to the metric variables file

Returns:

the metric_vars list or False

Return type:

list

spin_process(i, run_timestamp, metric_check_file)[source]

Assign a metric for a process to analyze.

Parameters:
  • i – python process id

  • run_timestamp – the epoch timestamp at which this process was called

  • metric_check_file – full path to the metric check file

Returns:

True

run()[source]

Called when the process initializes.

skyline.crucible.crucible_algorithms module

python_version = 3

This is no man’s land. Do anything you want in here, as long as you return a boolean that determines whether the input timeseries is anomalous or not.

To add an algorithm, define it here and add its name to settings.ALGORITHMS. It must be defined with the required parameters (even if your algorithm/function does not need them), as the run_algorithms function passes them to every algorithm listed in settings.ALGORITHMS.
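As a rough illustration of that contract, a custom algorithm might look like the following. The function name above_fixed_ceiling and its threshold are hypothetical, and the timeseries is assumed to be a list of (timestamp, value) pairs.

```python
def above_fixed_ceiling(timeseries, end_timestamp, full_duration):
    """Hypothetical algorithm: anomalous if the last value exceeds 1000.

    end_timestamp and full_duration are not used here, but they are
    accepted because every algorithm is called with the same parameters.
    """
    if not timeseries:
        return False
    last_value = timeseries[-1][1]
    # The comparison already yields the required boolean.
    return last_value > 1000
```

The name would then also need to be listed in settings.ALGORITHMS for it to be run.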

tail_avg(timeseries, end_timestamp, full_duration)[source]

This is a utility function used to calculate the average of the last three datapoints in the series as a measure, instead of just the last datapoint. It reduces noise, but it also reduces sensitivity and increases the delay to detection.
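A minimal sketch of the idea, assuming the timeseries is a list of (timestamp, value) pairs. The name tail_avg_sketch and the fallback for series shorter than three datapoints are assumptions made for illustration.

```python
def tail_avg_sketch(timeseries, end_timestamp, full_duration):
    """Average the values of the last three datapoints.

    Illustrative only; end_timestamp and full_duration are accepted to
    match the common signature but are not used.
    """
    values = [value for _, value in timeseries]
    if len(values) < 3:
        # Assumed fallback: too short to average, use the last datapoint.
        return values[-1]
    return sum(values[-3:]) / 3
```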

median_absolute_deviation(timeseries, end_timestamp, full_duration)[source]

A timeseries is anomalous if the deviation of its latest datapoint with respect to the median is X times larger than the median of deviations.
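The test can be sketched with the standard library alone. This is an illustration under assumptions, not the module's code: the name mad_sketch is hypothetical, the timeseries is assumed to be a list of (timestamp, value) pairs, and the default of 6 for X is an assumed value that should be tuned.

```python
from statistics import median


def mad_sketch(timeseries, threshold=6.0):
    """Median absolute deviation test on the latest datapoint.

    threshold plays the role of X in the description; 6.0 is an
    assumed default.
    """
    values = [value for _, value in timeseries]
    if not values:
        return False
    center = median(values)
    deviations = [abs(value - center) for value in values]
    median_deviation = median(deviations)
    if median_deviation == 0:
        # A flat series gives no scale to compare against.
        return False
    return deviations[-1] / median_deviation > threshold
```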

grubbs(timeseries, end_timestamp, full_duration)[source]

A timeseries is anomalous if the Z score is greater than the Grubbs’ score.

first_hour_average(timeseries, end_timestamp, full_duration)[source]

Calculate the simple average over 60 datapoints (maybe one hour), FULL_DURATION seconds ago. A timeseries is anomalous if the average of the last three datapoints is outside of three standard deviations of this value.
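A sketch of this check, assuming the timeseries is a list of (timestamp, value) pairs with epoch timestamps and that the first hour is everything older than end_timestamp - (full_duration - 3600). The name first_hour_average_sketch is hypothetical.

```python
from statistics import mean, stdev


def first_hour_average_sketch(timeseries, end_timestamp, full_duration):
    """Compare the tail average with the first hour of the window."""
    # Datapoints older than this cutoff belong to the first hour.
    cutoff = end_timestamp - (full_duration - 3600)
    first_hour = [value for ts, value in timeseries if ts < cutoff]
    if len(first_hour) < 2:
        # Not enough data to estimate a standard deviation.
        return False
    baseline = mean(first_hour)
    spread = stdev(first_hour)
    tail = mean([value for _, value in timeseries[-3:]])
    return abs(tail - baseline) > 3 * spread
```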

stddev_from_average(timeseries, end_timestamp, full_duration)[source]

A timeseries is anomalous if the absolute value of the average of the latest three datapoints minus the moving average is greater than one standard deviation of the average. This does not exponentially weight the MA and so is better for detecting anomalies with respect to the entire series.

stddev_from_moving_average(timeseries, end_timestamp, full_duration)[source]

A timeseries is anomalous if the absolute value of the average of the latest three datapoints minus the moving average is greater than three standard deviations of the moving average. This is better for finding anomalies with respect to the short term trends.
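The following sketch shows one way to express this with an exponentially weighted mean and variance computed incrementally. It is an assumption-laden illustration: the name stddev_from_moving_average_sketch, the smoothing factor alpha, and the recursive update used here are not taken from the module, which may compute its moving statistics differently.

```python
def stddev_from_moving_average_sketch(timeseries, alpha=0.02, n_sigma=3.0):
    """Compare the tail average with an exponentially weighted mean.

    alpha is an assumed smoothing factor; smaller values give the
    moving average a longer memory.
    """
    values = [value for _, value in timeseries]
    if len(values) < 3:
        return False
    ew_mean = values[0]
    ew_var = 0.0
    for x in values[1:]:
        # Incremental update of the exponentially weighted mean/variance.
        diff = x - ew_mean
        incr = alpha * diff
        ew_mean += incr
        ew_var = (1 - alpha) * (ew_var + diff * incr)
    tail = sum(values[-3:]) / 3
    return abs(tail - ew_mean) > n_sigma * ew_var ** 0.5
```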

mean_subtraction_cumulation(timeseries, end_timestamp, full_duration)[source]

A timeseries is anomalous if the value of the next datapoint in the series is farther than three standard deviations out in cumulative terms after subtracting the mean from each data point.
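A minimal sketch, assuming the timeseries is a list of (timestamp, value) pairs and that the mean and standard deviation are taken over every datapoint except the latest. The name mean_subtraction_cumulation_sketch is hypothetical.

```python
from statistics import mean, stdev


def mean_subtraction_cumulation_sketch(timeseries, n_sigma=3.0):
    """Test the latest datapoint against the mean-centered history."""
    # Treat missing (None) values as zero; an assumption for this sketch.
    values = [value if value else 0 for _, value in timeseries]
    if len(values) < 3:
        return False
    history = values[:-1]
    centered_last = values[-1] - mean(history)
    return abs(centered_last) > n_sigma * stdev(history)
```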

least_squares(timeseries, end_timestamp, full_duration)[source]

A timeseries is anomalous if the average of the last three datapoints on a projected least squares model is greater than three sigma.
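One reading of this description is: fit a straight line to the series, take the residuals, and compare the average of the last three residuals with three standard deviations of all residuals. The sketch below follows that reading using only the standard library; the name least_squares_sketch and the near-zero sigma guard are assumptions.

```python
from statistics import mean, pstdev


def least_squares_sketch(timeseries, n_sigma=3.0):
    """Flag the series if the tail deviates from a fitted line."""
    if len(timeseries) < 3:
        return False
    xs = [float(ts) for ts, _ in timeseries]
    ys = [float(value) for _, value in timeseries]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # All timestamps identical: no line can be fitted.
        return False
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    # Residuals of each datapoint against the fitted line.
    errors = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = pstdev(errors)
    if sigma < 1e-9:
        # Assumed guard: a near-perfect fit has nothing to flag.
        return False
    tail_error = sum(errors[-3:]) / 3
    return abs(tail_error) > n_sigma * sigma
```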

histogram_bins(timeseries, end_timestamp, full_duration)[source]

A timeseries is anomalous if the average of the last three datapoints falls into a histogram bin with fewer than 20 other datapoints (you’ll need to tweak that number depending on your data).

Returns: the size of the bin which contains the tail_avg. Smaller bin size means more anomalous.
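The bin lookup can be sketched as follows with equal-width bins. The name histogram_bin_size_sketch and the default of 15 bins are assumptions for illustration, and the timeseries is assumed to be a list of (timestamp, value) pairs.

```python
def histogram_bin_size_sketch(timeseries, bins=15):
    """Return how many datapoints share a bin with the tail average."""
    values = [value for _, value in timeseries]
    if not values:
        return 0
    tail_values = values[-3:]
    tail = sum(tail_values) / len(tail_values)
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series puts every datapoint in the same bin.
        return len(values)
    width = (hi - lo) / bins

    def bin_index(value):
        # The maximum value is placed in the last bin.
        return min(int((value - lo) / width), bins - 1)

    counts = [0] * bins
    for value in values:
        counts[bin_index(value)] += 1
    return counts[bin_index(tail)]
```

A caller would then compare the returned size with a threshold such as the 20 mentioned above.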

ks_test(timeseries, end_timestamp, full_duration)[source]

A timeseries is anomalous if a two-sample Kolmogorov-Smirnov test indicates that the data distribution for the last 10 datapoints (might be 10 minutes) is different from that of the last 60 datapoints (might be an hour). It produces false positives on non-stationary series, so an Augmented Dickey-Fuller test is applied to check for stationarity.

detect_drop_off_cliff(timeseries, end_timestamp, full_duration)[source]

A timeseries is anomalous if the average of the last ten datapoints is <trigger> times greater than the last datapoint. This algorithm is most suited to timeseries with most datapoints being > 100 (e.g. high rate). The arbitrary <trigger> values become noisier with lower value datapoints, but it still detects drop-off cliffs.
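A simplified sketch of the ratio test, assuming the timeseries is a list of (timestamp, value) pairs. The name drop_off_cliff_sketch, the single fixed trigger of 10, and the handling of a drop to zero are assumptions; the description implies the real trigger values vary.

```python
def drop_off_cliff_sketch(timeseries, trigger=10.0):
    """Flag a sudden drop of the last datapoint below the recent level."""
    values = [value for _, value in timeseries]
    if len(values) < 10:
        return False
    last = values[-1]
    recent_avg = sum(values[-10:]) / 10
    if last <= 0:
        # Assumed rule: a drop to zero counts as a cliff only if the
        # recent average was positive.
        return recent_avg > 0
    return recent_avg / last > trigger
```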

run_algorithms(timeseries, timeseries_name, end_timestamp, full_duration, timeseries_file, skyline_app, algorithms, alert_interval, add_to_panorama, padded_timeseries, from_timestamp)[source]

Iteratively run algorithms.
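The iteration can be pictured roughly as below. This is a heavily reduced sketch, not the real function: run_algorithms_sketch is a hypothetical name, it takes callables rather than names, it omits most of the parameters listed in the signature, and recording None for a failing algorithm is an assumption.

```python
def run_algorithms_sketch(timeseries, end_timestamp, full_duration, algorithms):
    """Call each algorithm with the shared parameters and collect results."""
    results = {}
    for algorithm in algorithms:
        try:
            outcome = algorithm(timeseries, end_timestamp, full_duration)
            results[algorithm.__name__] = bool(outcome)
        except Exception:
            # Assumed behaviour: one failing algorithm does not stop the rest.
            results[algorithm.__name__] = None
    return results
```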

Module contents