Requires - Panorama to be enabled and running. It is enabled and configured
LUMINOSITY_ variables in
Luminosity handles correlation and classification of anomalies and metrics. Luminosity also learns which metrics are related to each other, using all the previously discovered correlation data for metrics.
Luminosity takes the time series of an anomalous metric and cross
correlates it against the time series of all the other metrics (or of the
metrics in correlation mapped namespaces, see
settings.LUMINOSITY_CORRELATION_MAPS) to find and record
metrics that are strongly correlated to the anomalous metric, using Linkedin’s luminol library.
This information can help in root cause analysis and gives the operator a view of other metrics, perhaps on different servers or in different namespaces, that could be related to the anomalous event.
Metrics can have a time lagged effect on other metrics; Luminosity handles
this with time shifted cross correlations too. So if a metric did
something 120 seconds ago and
metric.b becomes anomalous 120 seconds later,
and there is a correlation between the time shifted time series, these will be
found as well and recorded with time shifted value e.g. -120 and a
Cross correlations are currently only viewable in Ionosphere on training_data
and features profile pages; however, you can query MySQL directly to report on
settings.ALERTS anomalies ad hoc until a full Luminosity webapp page
These are the correlations from the entire metric population (or the correlation
mapped namespaces metrics) that fall within
settings.LUMINOL_CROSS_CORRELATION_THRESHOLD. They are not necessarily
all contextually related, but the contextually related correlations are listed
for the anomaly. These are simply the numeric correlations, and it is currently
up to the operator to review them, as it is assumed that the operator may know
which metrics are likely to correlate with this anomaly on the specific metric,
until such a time as Skyline can do so on its own (something on the roadmap).
At the current stage of development Luminosity adds a lot of noise along with the signal in the correlations, somewhat like the original Analyzer. This is just the start: to make Luminosity better and more useful, the data is needed first.
Luminosity can also be enabled to classify metrics and time series. luminosity/classify_metrics can be enabled to perform a number of classifications. These different types of classification are required to enable certain functionality.
Enabling settings.LUMINOSITY_CLASSIFY_METRICS_LEVEL_SHIFT identifies
metrics that can experience significant level shifts, using the adtk
LevelShiftAD and PersistAD algorithms. When this classification is enabled,
luminosity/classify_metrics iterates through all the metrics (once every 4 hours),
runs the LevelShiftAD and PersistAD algorithms against the previous 7 days of
data for each metric and identifies any significant level shifts. Both algorithms
are configured with
c=9.9 to ensure as few false positives as possible.
PersistAD is only evaluated if LevelShiftAD triggers, and a significant
level shift is only identified if the LevelShiftAD anomalies and the
PersistAD anomalies align, which indicates that a true persisted level shift was
experienced. When first enabled, luminosity/classify_metrics checks each 7 day
window in the last 6 months of data for each metric to identify whether any historic
level shifts were experienced. Thereafter each metric only has the last 7
day window checked every 4 hours. Once a metric has been identified as a level
shift metric, the level shift classification will not be run on it again. This is
also true if the metric is in a known or configured namespace.
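The two-stage check described above can be sketched roughly as follows. This is an illustrative outline only, not the luminosity/classify_metrics code: the detector callables stand in for the adtk LevelShiftAD and PersistAD runs, and the function name, the detector signature and the tolerance_seconds parameter are all assumptions, since how closely the two sets of anomalies must align is not stated here.

```python
from typing import Callable, List, Sequence, Tuple

# A time series as a list of (timestamp, value) tuples.
Timeseries = Sequence[Tuple[int, float]]
# Hypothetical detector type: returns the timestamps it considers anomalous.
Detector = Callable[[Timeseries], List[int]]


def is_persisted_level_shift(
    timeseries: Timeseries,
    level_shift_detector: Detector,
    persist_detector: Detector,
    tolerance_seconds: int = 3600,  # assumed alignment window, not a Skyline setting
) -> bool:
    """Return True only if both detectors flag anomalies close together in time."""
    level_shift_anomalies = level_shift_detector(timeseries)
    if not level_shift_anomalies:
        # The second detector is only evaluated if the first one triggers.
        return False
    persist_anomalies = persist_detector(timeseries)
    # A level shift is confirmed when any pair of anomalies aligns.
    return any(
        abs(level_shift_ts - persist_ts) <= tolerance_seconds
        for level_shift_ts in level_shift_anomalies
        for persist_ts in persist_anomalies
    )
```

Passing the detectors in as callables keeps the sketch independent of any particular library; in practice each callable would wrap a detector run over the 7 day window.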
The purpose of identifying metrics that can experience genuine level shifts is to feed them back into custom_algorithms/adtk_level_shift for level shift analysis at runtime. Certain types of metrics can level shift within the bounds of the three-sigma based analysis and not trigger as anomalous, and another custom_algorithm may not trigger them as anomalous either, but for some metrics these level shifts are important and should be alerted on. Identifying which metrics or namespaces can experience significant level shifts makes it possible to automatically add them to the metrics on which custom_algorithms/adtk_level_shift is run.
Luminosity as a replacement for Kale Oculus
Luminosity is the beginning of an attempt to implement a pure Python replacement for the Oculus anomaly correlation component of the Etsy Kale stack.
Oculus was not carried forward in this version of Skyline for a number of reasons, mainly Ruby, Java and Elasticsearch. Luminosity is not about being better than Oculus, which was very good; it is about adding additional information about anomalies and metrics.
Luminosity uses some of the functionality of Linkedin’s luminol Anomaly Detection and Correlation library to add this near real time correlation back into Skyline.
Luminosity is different to Oculus in a number of ways, however it does provide similar information.
Differences between Oculus and Luminosity
Oculus had a very large architectural footprint. Luminosity runs in parallel to all the other Skyline apps on the same Skyline commodity machine, with very little overhead.
Oculus used a very novel technique to do similarity search in time series using a Shape Description Alphabet:
1. Map line segments to tokens (a b c d e) based on gradient.
2. Index tokens with Elasticsearch.
3. Search for similar subsequences using sloppy phrase queries.
Then search these candidates exhaustively, using Fast Dynamic Time Warping. Oculus had a searchable frontend UI.
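To make the first step of the Shape Description Alphabet idea concrete, here is a minimal sketch of mapping line segments to tokens by gradient. It is not Oculus code: the function name, the gradient boundaries (steep, gentle) and the choice of which letter represents which slope are assumptions made purely for illustration.

```python
from typing import Sequence


def tokenize(values: Sequence[float], steep: float = 1.0, gentle: float = 0.25) -> str:
    """Map each segment between consecutive points to a token by its gradient.

    Assumed mapping: a = steep rise, b = gentle rise, c = flat,
    d = gentle fall, e = steep fall.
    """
    tokens = []
    for previous, current in zip(values, values[1:]):
        gradient = current - previous  # points are assumed to be evenly spaced
        if gradient >= steep:
            tokens.append("a")
        elif gradient >= gentle:
            tokens.append("b")
        elif gradient > -gentle:
            tokens.append("c")
        elif gradient > -steep:
            tokens.append("d")
        else:
            tokens.append("e")
    return "".join(tokens)


# One segment of each kind, in order.
print(tokenize([0, 2, 2.5, 2.5, 2.0, 0]))  # prints "abcde"
```

Once a series is reduced to a token string like this, similar shapes become similar strings, which is what makes indexing and phrase-style searching possible.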
Luminosity uses the Linkedin luminol.correlator to determine the cross correlation coefficients of each metric, pulling the data directly from Redis only when an anomaly alert is triggered. The resulting cross correlated metrics and their results are then recorded in the Skyline MySQL database and related to the anomaly. They are viewable via the Ionosphere and Luminosity UI pages.
Luminosity has the advantage over Oculus of storing the history of metrics, anomalies and their correlations. This gives Skyline a near real time root cause analysis component and over time enables Skyline to determine and report the most strongly correlated metrics.
luminol.correlate is based on http://paulbourke.net/miscellaneous/correlate/. To understand luminol.correlate, a pure Python implementation of the Paul Bourke method was written and its results were verified against those of luminol.correlate.
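As a rough illustration of the Paul Bourke method referenced above, the following is a minimal pure Python sketch of cross correlation over a range of delays. It is not the Skyline or luminol implementation: the function names, the max_delay parameter and the sign convention (a positive delay means the second series lags the first) are assumptions, and delays are counted in samples rather than seconds.

```python
import math
from typing import Dict, Sequence, Tuple


def cross_correlate(x: Sequence[float], y: Sequence[float], max_delay: int) -> Dict[int, float]:
    """Return {delay: coefficient} for every delay in [-max_delay, max_delay]."""
    if len(x) != len(y):
        raise ValueError("both series must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The denominator is computed once over the full, unshifted series.
    sum_sq_x = sum((v - mean_x) ** 2 for v in x)
    sum_sq_y = sum((v - mean_y) ** 2 for v in y)
    denominator = math.sqrt(sum_sq_x * sum_sq_y)
    if denominator == 0:
        # A constant series has no variance; report no correlation.
        return {delay: 0.0 for delay in range(-max_delay, max_delay + 1)}
    coefficients = {}
    for delay in range(-max_delay, max_delay + 1):
        sxy = 0.0
        for i in range(n):
            j = i + delay
            if 0 <= j < n:  # pairs that fall outside the series are skipped
                sxy += (x[i] - mean_x) * (y[j] - mean_y)
        coefficients[delay] = sxy / denominator
    return coefficients


def best_delay(coefficients: Dict[int, float]) -> Tuple[int, float]:
    """Return the (delay, coefficient) pair with the largest absolute coefficient."""
    return max(coefficients.items(), key=lambda item: abs(item[1]))


# metric_b repeats the spike in metric_a two samples later.
metric_a = [0.0] * 30
metric_b = [0.0] * 30
metric_a[10] = 1.0
metric_b[12] = 1.0
delay, coefficient = best_delay(cross_correlate(metric_a, metric_b, max_delay=5))
```

In this example the strongest coefficient is found at a delay of 2 samples, which is the kind of time shifted relationship a plain, unshifted correlation would miss.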
Running Luminosity on multiple, distributed Skyline instances
Please see Running multiple Skyline instances