######
Vortex
######

Skyline Vortex is a service responsible for adhoc analysis of time series data
submitted via HTTP POST requests. The data can be any time series and the
metrics can be stored anywhere. Skyline does not require access to them,
although metrics already in Skyline can be run through Vortex as well.

The time series data is posted to Flux as json that provides details about the
metric, the time series data itself and which algorithms to analyse the data
with. Although Vortex is accessible via the API, there is also a webapp UI page
to submit local metric time series and csv data.

Vortex is handled by two services, Flux and Mirage. Requests are submitted to
Flux, which adds them to a queue for mirage_vortex to process. mirage_vortex
processes the items in the queue and adds the results to a queue from which
Flux returns the results in the request response.

A few things to be aware of are:

- Timestamps must be unix timestamps and will be coerced into ints.
- Values must be ints, floats or nans and will be coerced into floats.
- Timestamps and values that cannot be coerced will be removed.
- The requesting client should use a timeout of 60 seconds.
- The requesting client must follow HTTP status 303 and 302 redirects.
- Only one time series can be submitted per request. This is for the sake of
  simplicity. If multiple metrics/time series were accepted, the client would
  possibly have to parse and process HTTP 207 responses **and** follow any 303
  or 302 items returned in the 207 response, which could hold a number of 200,
  303 and 302 items. This would make it impossible to use simple HTTP methods
  such as cURL or python requests without having to wrap their usage in
  complicated conditions. Therefore to keep it simple - one time series per
  request.

Preprocessing
=============

All data submitted to Vortex is preprocessed. Preprocessing is required to
ensure that the data is suitable for algorithmic analysis, clean and consistent.
Certain algorithms require that there are no gaps or nans in the data to
generate reliable results, others are more tolerant. Best efforts are made to
assess the data appropriately per algorithm. Preprocessing also ensures that
the shape of the data conforms to additional analysis in the pipeline such as
training and matching features profiles, etc.

Each preprocessing step that is applied to every data set is described below.
There may be additional preprocessing required that is specific to an
algorithm. Those preprocessing steps are described in the documentation
specific to the algorithm and are not covered here.

Coercing
--------

Timestamps and values will be coerced into ints and floats respectively. Any
data that cannot be coerced into these types will be discarded from the
dataset.

Analysis periods
----------------

To allow Vortex analysis and results to be trained on, by default Vortex can
handle two periods, 86400 seconds (24 hours) and 604800 seconds (7 days). Data
submitted must be either 24 hours worth of data or 7 days worth of data, if you
wish to be able to train on it.

Any data that is submitted that is greater than 7 days in length will be
shortened to start at ``last_timestamp - 604800``.

Any data that is submitted that is greater in length than 1 day but less than 7
will **only have the last 24 hour period analysed** and the preceding data will
be discarded.

The exception is when you pass the ``override_7_day_limit`` parameter. The data
can then be any length, but it will not be trainable.

Downsampling
------------

During preprocessing Vortex will downsample 7 day data to a resolution of 600
seconds and 24 hour data to 60 seconds (if the resolution is < 60 seconds).
Data is downsampled on the mean and aligned to the end of the dataset, rather
than the start of the dataset, which results in the final period having a full
sample rather than a partial sample.

If your data is high resolution, e.g.
a scrape_interval of 10 seconds, try to ensure that you use appropriate
rate/step values when surfacing the data to send. This will use less bandwidth
and take less time to process than sending your raw high resolution data.
Although you can send high resolution data, the downsampling will result in
different **exact** values to what your original data may return when a
rate/step is applied, which can lead to small inconsistencies/discrepancies
which may raise needless questions at some point.

**Always try to send 7 days worth of data at a resolution of 600 seconds or
24 hours of data at a resolution of 60 seconds.**

The exception is when you pass the ``no_downsample`` parameter, in which case
the data will not be trainable.

Strictly increasing monotonic data transformation
-------------------------------------------------

Any data submitted that is strictly increasing monotonic data (counters,
counts) will be transformed to a non-negative derivative time series (resets
discarded).

Similar to downsampling, **always try to apply an appropriate rate** to any
count based metrics before sending the data, rather than sending the raw
monotonic data.

HTTP 303 and 302 responses
==========================

Because 100s or 1000s of requests could be made to Vortex at the same time, the
expected response times can vary. Under general load (and depending on the
algorithms requested) Vortex should return responses within a few seconds,
however if the workload is high that time can be much longer. This is why the
client needs to use a 60 second timeout on requests.

Further to that, if Vortex has not processed the request in the defined timeout
period, it will issue a HTTP 303 (See Other) response with a GET URL location
for the client to request the result from. The original timeout is
automatically added as a URL parameter. The client should follow the 303 and
will once again wait for the timeout period for the results response.
If the results are not ready after this timeout period, then a HTTP 302
response is issued with the same URL and the retry parameter incremented by 1.
This process is repeated 3 times, after which, if no results are available, a
HTTP 200 response will be issued with a json response of:

.. code-block:: json

    {
        "request_id": "",
        "results": null,
        "reason": "No results were returned . The request can be retried later, results will be available for 1 hour.",
        "retry_url": ":///flux/vortex_results?request_id=&timeout="
    }

The ``retry_url`` will only be present on the final request.

Vortex POST data
================

Due to the number of options available the POST data object can be quite
complex. Each algorithm has its own parameters, which if not passed will be
set to a sensible default. Sometimes these sensible defaults are calculated
from the time series data itself.

The basic json POST data has the following structure. The required keys are
marked as such in the example below.

.. code-block:: json

    {
        "key": "apikey|str|required",
        "metric": "metric|str|required",
        "timeout": seconds|int|required,
        "timeseries": timeseries|[list,dict]|required,
        "algorithms": {
            "sigma": {"consensus": 6},
            "spectral_residual": {}
        },
        "consensus": [["sigma", "spectral_residual"]],
        "reference": "a reference id for you|str|optional"
    }

Required key value pairs that must be sent in the POST json data are:

- ``key``: The flux API token for the namespace.
- ``metric``: The metric name.
- ``timeout``: The timeout to use in seconds.
- ``timeseries``: A list or dict (k/v pairs) of unix timestamps and values.

Optional key value pairs are:

- ``algorithms``: A dict of the algorithms to run, their parameters and the
  consensus patterns. ``algorithms`` is covered in detail below. There is a
  maximum of 3 algorithms that can be passed (unless the
  :mod:`settings.FLUX_SELF_API_KEY` key is used, see below).
- ``reference``: This can be a string of a reference you wish to assign to the analysis task.
  This could be a trace id, a metric name or a time series, it can be anything,
  but it **must** be cast as a string. Therefore if it is an int or float it
  must be passed as ``"1987"`` or ``"1669399450.337939"``.
- ``no_downsample``: Allows analysis to be run without downsampling the data,
  **be advised no training_data is saved for these requests**, only results
  are returned.
- ``override_7_day_limit``: Allows the requirement for 24h or 7d data to be
  overridden, **be advised training_data is not suitable for training with on
  these requests**.

Using the :mod:`settings.FLUX_SELF_API_KEY` key allows for additional
parameters to be passed and can be used to remove the maximum algorithm limit
of 3. Additional keys that can be passed with this key are:

- ``save_training_data_on_false``: A boolean that enables the saving of
  training data even if the instance is not found to be anomalous.
- ``metric_namespace_prefix``: A string of the namespace prefix to use or a
  ``None`` value.
- ``shard_test``: A boolean that allows a shard test to be run while Skyline is
  running in a clustered mode. No analysis is done, the request is just
  distributed to the correct cluster node, which returns a response indicating
  that it received and would process the request.

``algorithms``
--------------

DOCUMENTATION STILL UNDER DEVELOPMENT

The ``algorithms`` object is the most complicated so it is covered in detail
here. It consists of a key for each algorithm to be run, with any algorithm
parameters defined for that algorithm. If no algorithm parameters are passed,
e.g. ``{}``, then the default parameters for the algorithm will be used. Along
with these algorithm definitions there is a ``consensus`` key which can hold a
list of lists to define what combinations of triggered algorithms will be
classed as an anomaly. For example, below we define that both the ``sigma`` and
``spectral_residual`` algorithms must be triggered to be classed as an anomaly.

.. code-block:: json-object

    {
        "key": "0123456789abcdefghijlmnopqrstu",
        "metric": "prometheus_http_requests_total{alias=\"Prometheus\", code=\"200\", handler=\"/api/v1/query_range\", instance=\"localhost:9090\", job=\"prometheus\"}",
        "timeout": 60,
        "timeseries": [[1668689060, 23.3], ..., [1668689120, 162.9]],
        "algorithms": {
            "sigma": {"sigma_value": 3, "consensus": 6},
            "spectral_residual": {}
        },
        "consensus": [["sigma", "spectral_residual"]],
        "reference": "128186f730da2d9e"
    }

Above we define that the ``sigma`` algorithms must achieve a consensus of 6,
e.g. 6 of the 9 sigma algorithms must trigger for ``sigma`` to be classed as
anomalous. We also define that ``spectral_residual`` should be run with its
default settings ``{}``. In the overall ``consensus`` key we define that
**both** ``sigma`` and ``spectral_residual`` **must** trigger to class the
analysis as anomalous.

Should you wish to have an anomalous classification based on any of the
algorithms triggering, you can specify the overall ``consensus`` as:

.. code-block:: json-object

    "algorithms": {
        "sigma": {"sigma_value": 3, "consensus": 6},
        "spectral_residual": {}
    },
    "consensus": [["sigma"], ["spectral_residual"]]

This would mean that if either one of the algorithms triggered, the instance
would be classed as anomalous.

It is possible to not specify the ``algorithms`` key, in which case defaults
will be used. Currently the defaults are ``sigma`` and ``spectral_residual``,
however if a better combination of algorithms is discovered this can change in
future versions.

``anomaly_window``
^^^^^^^^^^^^^^^^^^

A number of the algorithms have a special algorithm parameter of
``anomaly_window``. This parameter allows one to specify a window size in which
to classify an instance as anomalous. Basically this allows you to configure
the algorithm to check if any value in the last x values is anomalous, rather
than just checking the last value. If present in an algorithm, by default the
``anomaly_window`` is 1.
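
The idea of a window can be illustrated with a few lines of Python. This is only a sketch of the concept, not the Vortex implementation: the function name ``is_anomalous_in_window`` and the list of per data point boolean flags are assumptions made for the illustration.

```python
def is_anomalous_in_window(flags, anomaly_window=1):
    """Return True if any of the last `anomaly_window` flags is True.

    `flags` is a hypothetical list holding one boolean per data point,
    True where an algorithm considered that data point anomalous.
    """
    if anomaly_window < 1:
        raise ValueError("anomaly_window must be 1 or greater")
    # Only the trailing `anomaly_window` data points decide the outcome.
    return any(flags[-anomaly_window:])


# The anomalous data point is the second to last one.
flags = [False, False, False, True, False]

last_value_only = is_anomalous_in_window(flags)
last_five_values = is_anomalous_in_window(flags, anomaly_window=5)
```

With the default window of 1 only the final flag is considered, so ``last_value_only`` is ``False``, whereas a window of 5 also covers the flagged data point and ``last_five_values`` is ``True``.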
Skyline generally only determines if the final value in a time series is
anomalous in relation to the rest of the time series. However, due to the
nature of Vortex, it may help to set the ``anomaly_window`` to a number of data
points. Because Vortex is adhoc analysis, the methods you use to decide whether
to analyse a time series may be lagged, meaning that by the time your
analysis/alerter has decided something should be further assessed and surfaces
the data, the time series may already have changed. Perhaps you see a value of
800 and your normal values are between 10 and 30. By the time you surface the
data to send to Vortex the last value may have updated to say 22. Using the
default ``anomaly_window`` the time series may be classified as not anomalous,
because the value of 22 is being used as the decider. With an
``anomaly_window`` of say 5, the values ``17, 14, 24, 800, 22`` would be
evaluated and the time series would be classified as anomalous.

**It is important to consider this window in the context of any downsampling
that may be applied to the data.** For example if you are sending 7 days of
data at a resolution of 5 seconds (because you have decided not to follow the
advice on downsampling above) and you want to check for anomalies in the last
10 minutes, the ``anomaly_window`` will be 1 and not 120 once downsampling is
applied, because the 120 original data points covering the last 10 minutes are
downsampled into a single 600 second data point.

Available algorithms
^^^^^^^^^^^^^^^^^^^^

Currently the following algorithms are available to use with Vortex. Each
algorithm listed here will be handled in detail regarding the individual
``algorithm_parameters`` which are available.
These may be subject to change or removal at some point in the future:

- ``default``
- ``sigma`` - Collection of the original Skyline 3sigma algorithms
- ``dbscan`` - Density-based spatial clustering of applications with noise
- ``lof`` - Local Outlier Factor
- ``one_class_svm`` - One Class SVM
- ``pca`` - Principal Component Analysis
- ``prophet`` - the fbprophet algorithm (long running and not suited to
  realtime analysis)
- ``spectral_residual`` - Spectral Residual
- ``isolation_forest`` - Isolation Forest
- ``m66`` - A skyline changepoint detection algorithm, similar to PELT,
  ruptures and Bayesian Online Changepoint Detection, however it is more robust
  to instantaneous outliers and more conditionally selective of changepoints.
- ``adtk_level_shift`` - ADTK LevelShiftAD algorithm
- ``adtk_persist`` - ADTK PersistAD algorithm
- ``adtk_seasonal`` - ADTK SeasonalAD algorithm
- ``adtk_volatility_shift`` - ADTK VolatilityShiftAD algorithm
- ``macd`` - Moving Average Convergence/Divergence
- ``spectral_entropy`` - Spectral Entropy
- ``mstl`` - statsforecast MSTL algorithm (the mstl algorithm is very long
  running and not suited for realtime analysis)
- ``lad`` - Large Deviations Anomaly Detection
- ``probabilistic_forecasts_generalized_pareto_distribution_ets`` -
  Probabilistic forecasts for anomaly detection using Generalized Pareto
  Distribution (long running and not suited for realtime analysis)

The algorithms are run in the order in which they are declared. The analysis
will stop before all the algorithms have been run if a consensus is reached
early or if consensus can no longer be reached.

To determine the various parameters that can be passed for each algorithm and
to understand what those parameters do, please refer to the source code of the
algorithm in its py file in skyline/skyline/custom_algorithms/

DOCUMENTATION STILL UNDER DEVELOPMENT
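
To tie the POST data keys described in this document together, the following is a minimal sketch of building a request with only the Python standard library. It is an illustration under stated assumptions rather than an official client: the function name ``build_vortex_request`` is hypothetical and the URL in the usage example is a placeholder, use the Vortex URL of your own Skyline instance. The request is only built here, not sent.

```python
import json
import urllib.request


def build_vortex_request(url, key, metric, timeseries, timeout=60,
                         algorithms=None, consensus=None, reference=None):
    """Build (but do not send) a POST request carrying one time series.

    Hypothetical helper for illustration. `url` must be supplied by the
    caller, it is not defined in this document.
    """
    # The four required keys.
    payload = {
        "key": key,
        "metric": metric,
        "timeout": int(timeout),
        "timeseries": timeseries,
    }
    # Optional keys are only added when they were passed.
    if algorithms is not None:
        payload["algorithms"] = algorithms
    if consensus is not None:
        payload["consensus"] = consensus
    if reference is not None:
        # The reference must always be cast as a string.
        payload["reference"] = str(reference)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_vortex_request(
    "https://skyline.example.org/flux/vortex",  # placeholder URL (assumption)
    key="0123456789abcdefghijlmnopqrstu",
    metric="test.metric",
    timeseries=[[1668689060, 23.3], [1668689120, 162.9]],
    algorithms={
        "sigma": {"sigma_value": 3, "consensus": 6},
        "spectral_residual": {},
    },
    consensus=[["sigma", "spectral_residual"]],
    reference=1987,
)
```

Whichever HTTP client is used to send the request, it should apply the 60 second timeout and follow the 303 and 302 redirects as described above.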