.. role:: skyblue
.. role:: red

Luminosity
==========

Luminosity requires Panorama to be enabled and running.  Luminosity is enabled
and configured via the ``LUMINOSITY_`` variables in ``settings.py``.

Luminosity handles correlation and classification of anomalies and metrics.
Luminosity also learns which metrics are related to each other using all the
previously discovered correlation data for metrics.

Luminosity correlation
----------------------

Luminosity takes the time series of an anomalous metric and cross correlates it
against the time series of all the other metrics (or only the metrics in
correlation mapped namespaces, see
:mod:`settings.LUMINOSITY_CORRELATE_NAMESPACES_ONLY` and
:mod:`settings.LUMINOSITY_CORRELATION_MAPS`) to find and record metrics that are
strongly correlated to the anomalous metric, using LinkedIn's Luminol library.

This information can help in root cause analysis and gives the operator a view
of other metrics, perhaps on different servers or in different namespaces, that
could be related to the anomalous event.

Luminosity also handles metrics that have a time lagged effect on other metrics,
by computing time shifted cross correlations.  If ``metric.a`` changed and
``metric.b`` became anomalous 120 seconds later, and the time shifted time series
are correlated, the correlation is found and recorded with the time shift value
(e.g. -120) and a ``shifted_coefficient`` value.
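The idea of a time shifted cross correlation can be illustrated with a minimal
sketch, assuming two equal length series sampled at the same resolution.  It
uses NumPy's ``corrcoef`` (Pearson) on the overlapping part of the series at
each shift rather than Luminol's correlator, and the function names
``shifted_correlation`` and ``best_shift`` are hypothetical, not Skyline or
Luminol APIs.  In this sketch a positive shift means ``b`` lags ``a``; Luminol
and Skyline may record the same relationship with the opposite sign.

```python
import numpy as np


def shifted_correlation(a, b, shift):
    """Pearson coefficient between a[t] and b[t + shift] over the overlapping points."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    if shift >= 0:
        x, y = a[:n - shift], b[shift:]
    else:
        x, y = a[-shift:], b[:n + shift]
    return float(np.corrcoef(x, y)[0, 1])


def best_shift(a, b, max_shift):
    """Try every shift in [-max_shift, max_shift] and return (shift, coefficient)
    for the strongest positive correlation."""
    candidates = [(s, shifted_correlation(a, b, s))
                  for s in range(-max_shift, max_shift + 1)]
    return max(candidates, key=lambda item: item[1])


rng = np.random.default_rng(0)
metric_a = rng.normal(size=200)
# metric_b repeats metric_a 3 data points later; np.roll wraps the first 3
# values around, but they fall outside the overlap at shift 3.
metric_b = np.roll(metric_a, 3)
print(best_shift(metric_a, metric_b, max_shift=5))
```

With 60 second resolution data, a best shift of 2 data points would correspond
to a 120 second lag.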

Cross correlations are currently only viewable in Ionosphere on the
training_data and features profile pages; however, you can query MySQL directly
to report on any :mod:`settings.ALERTS` anomalies ad hoc until a full Luminosity
webapp page is created.

These are the correlations from the entire metric population (or from the
correlation mapped namespaces metrics) that fall within
:mod:`settings.LUMINOL_CROSS_CORRELATION_THRESHOLD`.  They are not necessarily
all contextually related, but the contextually related correlations are listed
for the anomaly.  These are simply numeric correlations and it is currently up
to the operator to review them, as it is assumed that the operator may know
which metrics are likely to correlate with an anomaly on a specific metric,
until such a time as Skyline can do so on its own (something on the roadmap).
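The threshold is a purely numeric cut off, which is why contextually unrelated
metrics can appear in the results.  A minimal sketch, assuming the results for
one anomaly are ``(metric, coefficient)`` pairs and using a hypothetical
threshold value of ``0.9`` (not necessarily the default of
:mod:`settings.LUMINOL_CROSS_CORRELATION_THRESHOLD`); the function name is
also hypothetical:

```python
# Hypothetical value standing in for settings.LUMINOL_CROSS_CORRELATION_THRESHOLD.
CROSS_CORRELATION_THRESHOLD = 0.9


def correlations_within_threshold(results, threshold=CROSS_CORRELATION_THRESHOLD):
    """Return the (metric, coefficient) pairs at or above threshold, strongest first.

    Nothing here knows what the metrics are: a coincidental correlation with a
    high coefficient is kept exactly like a meaningful one.
    """
    kept = [(metric, coefficient) for metric, coefficient in results
            if coefficient >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```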

At the current stage of development Luminosity adds a lot of noise along with
the signal in the correlations, somewhat like the original Analyzer.  This is
just the start; to make it better and more useful, the data is needed first.

Luminosity classifications
--------------------------

Luminosity can also be enabled to classify metrics and time series.
luminosity/classify_metrics can be enabled to perform a number of
classifications, and these different types of classification are required to
enable certain functionality.

Enabling :mod:`settings.LUMINOSITY_CLASSIFY_METRICS_LEVEL_SHIFT` results in
identifying metrics that can experience significant level shifts using the `adtk`_
`LevelShiftAD`_ and `PersistAD`_ algorithms.  When this classification is enabled,
luminosity/classify_metrics iterates through all the metrics (once every 4 hours)
and runs the LevelShiftAD and PersistAD algorithms against the previous 7 days of
data for each metric to identify any significant level shifts.  Both algorithms
are configured with ``c=9.9`` to keep false positives to a minimum.
`PersistAD`_ is only evaluated if `LevelShiftAD`_ triggers, and a significant
level shift is only identified if the `LevelShiftAD`_ anomalies and the
`PersistAD`_ anomalies align, which indicates that a true persisted level shift
was experienced.  When first enabled, luminosity/classify_metrics checks each 7
day window in the last 6 months of data for each metric to identify whether any
historic level shifts were experienced.  Thereafter only the last 7 day window of
each metric is checked every 4 hours.  Once a metric has been identified as a
level shift metric, the level shift classification is not run on it again.  This
is also true if the metric is in a known or configured namespace.
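How the two checks combine can be sketched without adtk.  This is a simplified
NumPy re-implementation of the ideas behind the two detectors (the difference
between the medians of adjacent windows for a level shift, the deviation from
the median of the preceding window for persistence, each thresholded at
``q3 + c * IQR``), not adtk's code.  The ``window=10`` default and all the
function names are assumptions; only ``c=9.9`` comes from the description above.

```python
import numpy as np


def _flag(scores, c):
    """True where a computed score exceeds q3 + c * IQR; NaN scores are never flagged."""
    finite = scores[np.isfinite(scores)]
    if finite.size == 0:
        return np.zeros(len(scores), dtype=bool)
    q1, q3 = np.percentile(finite, [25, 75])
    bound = q3 + c * (q3 - q1)
    return np.where(np.isfinite(scores), scores, -np.inf) > bound


def level_shift_flags(values, window, c):
    """Score each point by the difference between the medians of the window
    after it and the window before it (the idea behind a level shift detector)."""
    x = np.asarray(values, dtype=float)
    scores = np.full(len(x), np.nan)
    for i in range(window, len(x) - window + 1):
        scores[i] = abs(np.median(x[i:i + window]) - np.median(x[i - window:i]))
    return _flag(scores, c)


def persist_flags(values, window, c):
    """Score each point by its deviation from the median of the preceding
    window (the idea behind a persist detector)."""
    x = np.asarray(values, dtype=float)
    scores = np.full(len(x), np.nan)
    for i in range(window, len(x)):
        scores[i] = abs(x[i] - np.median(x[i - window:i]))
    return _flag(scores, c)


def has_significant_level_shift(values, window=10, c=9.9):
    """A level shift counts only if both checks flag the same data points."""
    level_shift = level_shift_flags(values, window, c)
    if not level_shift.any():
        # The persist check is only evaluated if the level shift check triggers.
        return False
    persisted = persist_flags(values, window, c)
    return bool((level_shift & persisted).any())
```

In the classification described above, the series would be the last 7 days of
a metric's data (or each historic 7 day window on the first run) and the
detectors would be adtk's own classes.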

The purpose of identifying metrics that can experience genuine level shifts is
to feed them back into custom_algorithms/adtk_level_shift for level shift
analysis at runtime.  Certain types of metrics can level shift within the bounds
of the three-sigma based analysis and so not trigger as anomalous, or another
custom_algorithm may not trigger on them, but with some metrics these level
shifts are important and should be alerted on.  Identifying the metrics or
namespaces that can experience significant level shifts allows them to be added
automatically to the metrics that custom_algorithms/adtk_level_shift is run on.

.. _adtk: https://github.com/arundo/adtk
.. _LevelShiftAD: https://adtk.readthedocs.io/en/stable/api/detectors.html#adtk.detector.LevelShiftAD
.. _PersistAD: https://adtk.readthedocs.io/en/stable/api/detectors.html#adtk.detector.PersistAD


Luminosity as a replacement for Kale Oculus
-------------------------------------------

Luminosity is the beginning of an attempt to implement a pure Python replacement
for the Oculus anomaly correlation component of the Etsy Kale stack.

Oculus was not carried forward in this version of Skyline for a number of
reasons, mainly its dependencies on Ruby, Java and Elasticsearch.  Luminosity is
not about being better than Oculus (Oculus was very good); it is about adding
additional information about anomalies and metrics.

Luminosity uses some of the functionality of LinkedIn's luminol Anomaly
Detection and Correlation library to add this near real time correlation back
into Skyline.

Luminosity is different to Oculus in a number of ways, however it does provide
similar information.

Differences between Oculus and Luminosity
-----------------------------------------

Oculus had a very large architectural footprint.
Luminosity runs in parallel to all the other Skyline apps on the same Skyline
commodity machine, with very little overhead.

Oculus used a novel technique to do similarity search in time series using a
Shape Description Alphabet:

1. Map line segments to tokens based on gradient (``a b c d e``).
2. Index the tokens with Elasticsearch.
3. Search for similar subsequences using sloppy phrase queries.

These candidates were then searched exhaustively using Fast Dynamic Time
Warping.  Oculus also had a searchable frontend UI.
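The first step of the Shape Description Alphabet can be illustrated with a toy
tokenizer.  This is a minimal sketch, not Oculus' code: it min-max normalises
the series, then maps the gradient of each segment between consecutive points to
one of ``a`` (steep fall) to ``e`` (steep rise) using hypothetical bucket edges.
The Elasticsearch indexing, sloppy phrase queries and Fast Dynamic Time Warping
steps are not shown.

```python
ALPHABET = "abcde"
# Hypothetical gradient bucket edges on min-max normalised data; the edges
# Oculus actually used are not documented here.
EDGES = (-0.5, -0.1, 0.1, 0.5)


def gradient_token(dy):
    """Map the gradient of one segment (rise per step) to a token, a..e."""
    for edge, token in zip(EDGES, ALPHABET):
        if dy < edge:
            return token
    return ALPHABET[-1]


def shape_tokens(values):
    """Normalise to [0, 1] so that shape, not scale, decides the tokens."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a flat series
    norm = [(v - lo) / span for v in values]
    return "".join(gradient_token(b - a) for a, b in zip(norm, norm[1:]))


print(shape_tokens([0, 0, 10, 10, 8, 0]))  # cecba
print(shape_tokens([0, 0, 1, 1, 0.8, 0]))  # cecba
```

Two series with the same shape at different scales produce the same token
string, which is what makes the shapes searchable as text.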

Luminosity uses the LinkedIn luminol.correlator to determine the cross
correlation coefficients of each metric, pulling the data directly from Redis
only when an anomaly alert is triggered.  The resulting cross correlated metrics
and their results are then recorded in the Skyline MySQL database and related to
the anomaly.  They are viewable via the Ionosphere and Luminosity UI pages.

Luminosity adds
---------------

Luminosity has the advantage over Oculus of storing the history of metrics,
anomalies and their correlations.  This gives Skyline a near real time root
cause analysis component and over time enables Skyline to determine and report
the most strongly correlated metrics.

luminol.correlate is based on http://paulbourke.net/miscellaneous/correlate/
and, for the purposes of understanding luminol.correlate, a pure Python
implementation of the Paul Bourke method was written and verified against the
results of luminol.correlate.
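For illustration, here is a minimal pure Python sketch of the Paul Bourke cross
correlation, written from the formula on that page:
``r(d) = sum((x[i] - mx) * (y[i - d] - my)) / (sqrt(sum((x[i] - mx)**2)) * sqrt(sum((y[i] - my)**2)))``.
The means and the denominator use the whole series, and terms where ``i - d``
falls outside the series are skipped (one of the options the page discusses).
It is not luminol's code nor the verification implementation mentioned above,
and the function name is hypothetical.

```python
import math


def bourke_cross_correlation(x, y, max_delay):
    """Return {delay: r} for every delay in -max_delay..max_delay."""
    if len(x) != len(y):
        raise ValueError("x and y must be the same length")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    denominator = (math.sqrt(sum((v - mx) ** 2 for v in x))
                   * math.sqrt(sum((v - my) ** 2 for v in y)))
    results = {}
    for d in range(-max_delay, max_delay + 1):
        sxy = 0.0
        for i in range(n):
            j = i - d
            if 0 <= j < n:  # skip out of range points rather than wrapping or zero padding
                sxy += (x[i] - mx) * (y[j] - my)
        results[d] = sxy / denominator
    return results


x = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]  # the same pulse, two points later
r = bourke_cross_correlation(x, y, max_delay=4)
best = max(r, key=r.get)
print(best, round(r[best], 4))  # -2 0.9412
```

Because this formula pairs ``x[i]`` with ``y[i - d]``, a pulse in ``y`` that
occurs two points after the same pulse in ``x`` gives its strongest coefficient
at ``d = -2``.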

Luminosity related_metrics
--------------------------

Luminosity can also be enabled to analyse which metrics relate to which other
metrics.  The resulting metric_group data can then be used for further analysis
and information/categorisation for the user.

It is important to understand that metrics are only related to other metrics
when there are **sufficient correlations** and **meaningful** data to classify
the metrics as related.  This point must be stressed to ensure that the relevant
``settings.LUMINOSITY_RELATED_METRICS_*`` settings are set to realistic values.
In a large, high frequency metric population, the cross correlation algorithm
used by Luminosity finds a lot of correlations which are often coincidental and
of numerical significance only; there is no contextual aspect.  The
related_metrics analysis is therefore designed to use strict settings, ensuring
that only high confidence metrics are grouped together.
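Why strict settings matter can be shown with a minimal sketch, assuming the
correlation history is a list of ``(anomalous_metric, correlated_metric,
coefficient)`` records and that a correlation counts for both metrics in the
pair.  A metric is only treated as related if it has been correlated a minimum
number of times with a minimum average coefficient.  The function and the
parameter names ``min_count`` and ``min_avg_coefficient`` are hypothetical
stand-ins for the ``settings.LUMINOSITY_RELATED_METRICS_*`` settings, whose
actual names and semantics may differ.

```python
from collections import defaultdict


def related_metrics(history, metric, min_count, min_avg_coefficient):
    """Return {related_metric: avg_coefficient} for metrics passing both thresholds."""
    coefficients = defaultdict(list)
    for anomalous, correlated, coefficient in history:
        if anomalous == metric:
            coefficients[correlated].append(coefficient)
        elif correlated == metric:
            coefficients[anomalous].append(coefficient)
    related = {}
    for other, values in coefficients.items():
        avg = sum(values) / len(values)
        # Require repeated evidence AND a strong average, not one good correlation.
        if len(values) >= min_count and avg >= min_avg_coefficient:
            related[other] = avg
    return related


history = [
    ("metric.a", "metric.b", 0.95),
    ("metric.b", "metric.a", 0.97),
    ("metric.a", "metric.b", 0.96),
    ("metric.a", "metric.c", 0.99),
]
print(related_metrics(history, "metric.a", min_count=3, min_avg_coefficient=0.9))
```

Here a single very strong, but possibly coincidental, correlation with
``metric.c`` is not enough to relate it to ``metric.a``, while the repeated
correlations with ``metric.b`` are.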

This means metric groups take time for the system to learn, because there needs
to be a history of data to work with and this grows over time.  It may be
possible to speed up that process with the addition of something like
JumpStarter (https://github.com/NetManAIOps/JumpStarter), which looks to be a
promising new multivariate time series anomaly detection approach based on
Compressed Sensing (CS) and which could at least decrease the time to learn and
go live.

For now it is based on Skyline's own history, and even when other methods are
introduced, these Skyline related_metrics metric groups are a dynamic and fully
relevant source of truth based on past observations, allowing Skyline to infer
which metrics are related.  Even if the metric groups are not 100%
representative of **all** the metrics that have been related to a metric, they
are descriptive enough to represent a large proportion of the most strongly
related metrics (enough to say "these" metrics are related and to create
relevant multivariate groups).

The related_metrics process runs as a low priority process and creates and
updates metric_group data structures as and when the Skyline instance is not
heavily loaded.  The features of these metric groups are stored in the database
as avg_coefficient, shifted_counts and avg_shifted_coefficient values.  These
metric groups are **dynamic**; they can and do change as the system and its
metric population run over time.  An unordered list (not sorted by any feature)
of the metrics in each metric_group is also stored in Redis.
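Per member features like these could be derived from correlation history along
the lines of the following minimal sketch.  The interpretation is an
assumption: ``avg_coefficient`` and ``avg_shifted_coefficient`` are taken as
means over all recorded correlations with a member metric, and
``shifted_counts`` as how many times each time shift was seen.  The
``Correlation`` record and the function name are hypothetical, not Skyline's
database schema or code.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Correlation:
    """One recorded correlation with another metric (hypothetical record)."""
    related_metric: str
    coefficient: float
    shifted: int  # time shift in seconds, 0 when not shifted
    shifted_coefficient: float


def metric_group_features(correlations):
    """Group correlations by related metric and summarise each group member."""
    by_metric = {}
    for corr in correlations:
        by_metric.setdefault(corr.related_metric, []).append(corr)
    group = {}
    for metric, items in by_metric.items():
        group[metric] = {
            "avg_coefficient": mean(c.coefficient for c in items),
            "shifted_counts": dict(Counter(c.shifted for c in items)),
            "avg_shifted_coefficient": mean(c.shifted_coefficient for c in items),
        }
    return group
```

Recomputing these values as new correlations are recorded is one way the groups
could stay dynamic.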

Running Luminosity on multiple, distributed Skyline instances
-------------------------------------------------------------

Please see `Running multiple Skyline instances <running-multiple-skylines.html>`__