=================
Custom algorithms
=================

**py3 only**

This section describes the process, steps and resources required to run custom
algorithms in Skyline. Adding a custom algorithm or algorithms is easier and
better than modifying the core Skyline algorithm files yourself.

If you want to implement a custom algorithm, ensure you read this
documentation **thoroughly** and **understand the layout** of the example
algorithms.  Doing it properly from the beginning will save you time.

Custom algorithms can be used in analyzer, analyzer_batch, crucible, mirage,
Vortex and SNAB.  The methods used to enable and use custom_algorithms vary
according to the Skyline app and the use case.

Available custom algorithms
~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is a collection of custom algorithms already available in Skyline and each
has a page here.  However, most are currently only documented in the code and their
pages refer to the source code.

The performance and accuracy of the available algorithms vary.  Their inclusion is
not a validation of the method, simply an indication that they can function as
outlier detectors.  Each generates different results and they seldom agree.  The
Vortex webapp UI can be used to run them ad hoc on any time series should you wish
to assess their performance.

Implementing a custom algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To implement a custom algorithm, you need to define it in
:mod:`settings.CUSTOM_ALGORITHMS` and add the Python source custom algorithm
file.

.. warning:: **A note on speed**, bear in mind that any custom algorithms added
  have to run **FAST**, otherwise analysis stops being real time and the
  Skyline apps will terminate their own spawned processes if they run too long.
  Consider that Skyline's three-sigma triggered algorithms take on average
  0.0023 seconds to run and all 9 are run on a metric in about 0.0207 seconds.
  Adding any algorithms that run substantially slower is **not** recommended,
  even if it is on a small set of metrics.  Any algorithms added should be as
  computationally efficient as possible and suitable for processing real time
  streaming data, e.g. O(n).  This is especially true for any custom algorithms
  added to Analyzer.  This is not a hard requirement, just a recommendation.
  It is possible to add non O(n) algorithms to Mirage, but they should be
  designed and implemented to be as computationally efficient as possible.
  If you break Skyline with your own custom algorithms, be that on your head.

Custom algorithms can be run **before** or **after** the core three-sigma
algorithms. The custom algorithm can be configured to run before the three-sigma
algorithms which allows the user to disable the running of three-sigma
algorithms on namespaces if they so desire, more on this below.  By default
custom algorithms are run **before** three-sigma algorithms.
Running the custom algorithm **after** three-sigma allows the user to only run
a custom algorithm if three-sigma achieves :mod:`settings.CONSENSUS` or run the
custom algorithm regardless of the CONSENSUS that was achieved.

The custom algorithms settings are therefore highly configurable.  Take specific
note of the ``consensus``, ``run_3sigma_algorithms``, ``run_before_3sigma`` and
``run_only_if_consensus`` parameters, especially if you are attempting to run
multiple custom algorithms at different stages, before or after three-sigma, with
different consensus requirements.  Given the number of methods and modes
available, it is possible to configure custom algorithms poorly.
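
As a rough illustration of how these parameters interact, the sketch below plans
the order in which algorithms would run for one metric.  It is not Skyline's
implementation.  The ``plan_stages`` function and its return format are
hypothetical, only the parameter names and their documented defaults come from
this page.

.. code-block:: python

    def plan_stages(custom_algorithms):
        """Return (before, run_3sigma, after) for the given definitions."""
        before = []
        after = []
        run_3sigma = True
        for name, conf in custom_algorithms.items():
            # A single False disables three-sigma for all algorithms
            if not conf.get('run_3sigma_algorithms', True):
                run_3sigma = False
            # The documented default is to run before three-sigma
            if conf.get('run_before_3sigma', True):
                before.append(name)
            else:
                after.append((name, conf.get('run_only_if_consensus', False)))
        return before, run_3sigma, after


    example = {
        'detect_significant_change': {'run_3sigma_algorithms': False},
        'skyline_matrixprofile': {
            'run_before_3sigma': False,
            'run_only_if_consensus': True,
        },
    }
    before, run_3sigma, after = plan_stages(example)
    print(before)      # ['detect_significant_change']
    print(run_3sigma)  # False
    print(after)       # [('skyline_matrixprofile', True)]

In this example one algorithm disables three-sigma while another waits for
three-sigma consensus, which is the kind of combination that deserves a careful
look before it is deployed.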

:mod:`settings.CUSTOM_ALGORITHMS`
---------------------------------

Custom algorithms are only available in analyzer, crucible and mirage.  Any
declared for use with analyzer are automatically also applied to
analyzer_batch.  For a custom_algorithm to be available for use in crucible
it must be declared in CUSTOM_ALGORITHMS.


Custom algorithms are defined in the :mod:`settings.CUSTOM_ALGORITHMS`
dictionary.  The format and key values of the dictionary are shown in the
following **example**:

.. code-block:: python

    CUSTOM_ALGORITHMS = {
        'abs_stddev_from_median': {
            'namespaces': ['telegraf.cpu-total.cpu.usage_system'],
            'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/abs_stddev_from_median.py',
            'algorithm_parameters': {},
            'max_execution_time': 0.09,
            'consensus': 6,
            'algorithms_allowed_in_consensus': [],
            'run_3sigma_algorithms': True,
            'run_before_3sigma': True,
            'run_only_if_consensus': False,
            'trigger_history_override': 0,
            'use_with': ['analyzer', 'analyzer_batch', 'mirage'],
            'debug_logging': False,
        },
        'last_same_hours': {
            'namespaces': ['telegraf.cpu-total.cpu.usage_user'],
            'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/last_same_hours.py',
            # Pass the argument 604800 for the sample_period parameter and
            # enable debug_logging in the algorithm itself
            'algorithm_parameters': {
                'sample_period': 604800,
                'debug_logging': True
            },
            'max_execution_time': 0.3,
            'consensus': 6,
            'algorithms_allowed_in_consensus': [],
            'run_3sigma_algorithms': True,
            'run_before_3sigma': True,
            'run_only_if_consensus': False,
            'trigger_history_override': 0,
            # This does not run on analyzer as it is weekly data
            'use_with': ['mirage', 'crucible'],
            'debug_logging': False,
        },
        'detect_significant_change': {
            'namespaces': ['swell.buoy.*.Hm0'],
            # Algorithm source not in the Skyline code directory
            'algorithm_source': '/opt/skyline_custom_algorithms/detect_significant_change/detect_significant_change.py',
            'algorithm_parameters': {},
            'max_execution_time': 0.002,
            'consensus': 1,
            'algorithms_allowed_in_consensus': ['detect_significant_change'],
            'run_3sigma_algorithms': False,
            'run_before_3sigma': True,
            'run_only_if_consensus': False,
            'trigger_history_override': 0,
            'use_with': ['mirage'],
            'debug_logging': True,
        },
        'skyline_matrixprofile': {
            'namespaces': ['*'],
            'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/skyline_matrixprofile.py',
            'numba_cache_dirs': ['stumpy_'],
            'algorithm_parameters':  {'windows': 5, 'k_discords': 20},
            'max_execution_time': 5.0,
            'consensus': 1,
            'algorithms_allowed_in_consensus': ['skyline_matrixprofile'],
            'run_3sigma_algorithms': True,
            'run_before_3sigma': False,
            'run_only_if_consensus': True,
            'trigger_history_override': 4,
            'use_with': ['mirage'],
            'debug_logging': False,
        },
    }

Within the dictionary each custom algorithm is declared and its variables are
defined.  Each custom algorithm defined is required to adhere to the following
requirements.

- **algorithm_name**: firstly and importantly, the name of the algorithm must be
  a simple, unbroken, alphanumeric string (underscores are used in the examples
  above).  It **must** also be the name of the main algorithm function, because
  the algorithm is loaded by ``importlib`` and the name in the dictionary is used
  to load the custom algorithm at runtime.
- ``namespaces``: a list of the namespaces you want to run the custom
  algorithm against.  These can be absolute metric names, substrings or dotted
  elements of a namespace, or a regex of a namespace.
- ``algorithm_source``: the full path to the custom algorithm Python file.  The
  file can be deployed to any directory, it does not need to be in the same path
  as the Skyline code.  Just ensure the user running the Skyline process has read
  permissions on the path and the file itself.
- ``algorithm_parameters``: a dictionary of any parameters/arguments
  that you want to pass to your algorithm.  Your custom algorithm will need to
  extract your parameters/arguments (key/value) from this dictionary.  If
  none are required simply use an empty dict ``{}``.
- ``max_execution_time``: a float (and read the warning about speed above).
- ``consensus``: this allows you to add your algorithm to the ``CONSENSUS`` or
  override ``CONSENSUS`` by setting this to 1.  If you are running
  ``CONSENSUS = 6`` and wanted to just add your custom algorithm as an addition
  to the normal three-sigma algorithms, you would just pass ``'consensus': 6`` or
  ``'consensus': 7`` depending on what you want.  The only other option currently
  is to **override** the ``CONSENSUS``.  If you want an anomaly triggered every
  time your custom algorithm triggers, regardless of three-sigma ``CONSENSUS``,
  then set ``'consensus': 1``.
- ``algorithms_allowed_in_consensus``: must be passed, but is **not implemented
  yet**.  It is intended to be a list of algorithms that must have triggered for
  consensus to be achieved.  If an empty list ``[]`` is passed this will be
  ignored and normal ``CONSENSUS`` will be used.
- ``run_3sigma_algorithms``: a boolean stating whether to run the normal three-sigma
  algorithms, this is optional and defaults to ``True`` if it is not passed
  in the dictionary.  **NOTE** - If any custom algorithm is run that has this
  set to ``False`` no three-sigma algorithms will be run regardless of what any
  other custom algorithms are set to.  If multiple custom algorithms are being
  run and only 1 has this set to ``False`` it will be applied to all.
- ``run_before_3sigma``: a boolean stating whether to run the custom algorithm
  before the normal three-sigma algorithms, this defaults to ``True``.  If you
  want your custom algorithm to run after the three-sigma algorithms set this to
  ``False``.
- ``run_only_if_consensus``: a boolean stating whether to run the custom
  algorithm only if CONSENSUS or MIRAGE_CONSENSUS is achieved, it defaults to
  ``False``.  This only applies to custom algorithms that are run after
  three-sigma algorithms, e.g. with the parameter ``run_before_3sigma: False``.
  Currently this parameter only uses the CONSENSUS or MIRAGE_CONSENSUS setting
  and does not apply the consensus parameter above.
- ``trigger_history_override``: an int defining whether to override the outcome of
  the custom algorithm if the three-sigma algorithms have triggered this many
  times in a row.  Setting this to 0 disables the override and the number of
  times the three-sigma algorithms have triggered is not checked.  If this value
  is set to 4 then even if the custom algorithm evaluates the metric as not
  anomalous, if the metric has been determined to be anomalous by the three-sigma
  analysis 4 times in a row, the custom algorithm result will be overridden and
  the metric will be classified as anomalous.
- ``use_with``: a list of the Skyline apps that should apply the custom
  algorithm.  All the apps can be declared but they will only apply the custom
  algorithm **if** they actually handle the metric.  Simply declaring an app in
  the list does not mean that the app will automatically run the algorithm all
  the time.  If the app does not handle the metric, declaring it makes no
  difference, therefore if you are unsure, it is safe to list them all.
  However, do **note** that if your custom algorithm needs more data than
  :mod:`settings.FULL_DURATION` then do not specify ``'analyzer'``
  as an app to run the custom algorithm with.
- ``debug_logging``: a boolean to enable debug_logging, which wraps the custom
  algorithm run in a bit of additional logging regarding timings, etc.  This is
  useful for development and testing.  In general use and production this should
  always be set to ``False``.
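
The requirements above can be checked before a definition is deployed.  The
helper below is a minimal sketch and not part of Skyline.  The
``check_custom_algorithm`` name and the choice of which keys to treat as
required are assumptions made for illustration (the keys documented above as
having defaults are left out of the required set).

.. code-block:: python

    # Assumption: keys without a documented default are treated as required
    REQUIRED_KEYS = (
        'namespaces', 'algorithm_source', 'algorithm_parameters',
        'max_execution_time', 'consensus', 'algorithms_allowed_in_consensus',
        'use_with',
    )


    def check_custom_algorithm(name, conf):
        """Return a list of problems found in one definition (hypothetical)."""
        problems = []
        # The name doubles as the function name, so it must be an identifier
        if not name.isidentifier():
            problems.append('name is not a valid Python function name')
        for key in REQUIRED_KEYS:
            if key not in conf:
                problems.append('missing key: %s' % key)
        if not isinstance(conf.get('max_execution_time', 0.0), float):
            problems.append('max_execution_time should be a float')
        if not isinstance(conf.get('algorithm_parameters', {}), dict):
            problems.append('algorithm_parameters should be a dict')
        return problems


    conf = {
        'namespaces': ['telegraf.cpu-total.cpu.usage_system'],
        'algorithm_source': '/opt/skyline/github/skyline/skyline/custom_algorithms/abs_stddev_from_median.py',
        'algorithm_parameters': {},
        'max_execution_time': 0.09,
        'consensus': 6,
        'algorithms_allowed_in_consensus': [],
        'use_with': ['analyzer', 'analyzer_batch', 'mirage'],
    }
    print(check_custom_algorithm('abs_stddev_from_median', conf))  # []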

It is also possible to set :mod:`settings.DEBUG_CUSTOM_ALGORITHMS` to ``True``
and this enables debug logging on all custom algorithms, regardless of what
their ``debug_logging`` is set to.  However if this is set to ``False`` debug
logging can still be implemented on each custom_algorithm individually using
``'debug_logging': True,`` in the algorithm item in
:mod:`settings.CUSTOM_ALGORITHMS`.

The custom algorithm file
-------------------------

Although any Python code can be added to a custom algorithm file, the algorithm
file must meet some basic requirements in order to integrate properly with and
be run by Skyline.

The requirements are outlined below.  Please read them and refer to the
example custom algorithm files in the skyline/custom_algorithms
directory of the repo.  https://github.com/earthgecko/skyline/tree/master/skyline/custom_algorithms

.. warning:: Do remember if the algorithm has requirements that are not declared
  in Skyline's requirements.txt file, ensure that you install the algorithm's
  requirements in the Skyline virtualenv.

General purpose algorithms only
-------------------------------

In general a custom algorithm should be able to be applied to all time series
data with the same parameters, unless the custom algorithm determines the
best parameters to use from the data itself during execution.

JSON friendly results **ONLY**
------------------------------

Although a custom algorithm does not need to return a results ``dict``, it can.
custom_algorithms can be used in various ways and internally Skyline makes use
of them in a number of ways, therefore a custom algorithm can return a verbose
results ``dict`` that is used elsewhere in the pipeline.  The results ``dict``
can hold any data but the data **MUST** be in a JSON friendly format: no ``nan``
or ``np.nan`` values, no ``np.array``, no ``np.int64`` or any other non-native
types.  JSON only accepts ints, floats, str, the ``True`` and ``False`` boolean
values and ``null`` (``None`` in Python), plus lists and dicts of these.
Therefore if you want your algorithm to return a results ``dict``, ensure that
every output is validated to be JSON friendly.

int and floats **ONLY** and no nans
-----------------------------------

If your custom algorithm can also ``return_results`` or
``return_anomalies``, ensure that the numeric results are
coerced to int and float types only.  This is because the results can be
moved through the pipeline and saved as JSON, and results taken from numpy
arrays can have types such as ``int64`` which are not JSON serializable.

The same is true for ``nan`` values.  Although they are floats in Python they are
not valid in JSON, so ensure ``nan`` values are coerced to ``None`` which json
will dump as ``null``.
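
One way to apply these coercions is a small recursive helper run over the
results just before they are returned.  The sketch below is illustrative only,
the ``json_friendly`` name is hypothetical and it is not taken from the Skyline
code base.  It relies on numpy arrays and numpy scalars exposing ``tolist()``,
which returns built-in Python types.

.. code-block:: python

    import json
    import math


    def json_friendly(value):
        """Recursively coerce a value into JSON friendly built-in types."""
        # numpy arrays and scalars expose tolist(), which returns built-ins
        if hasattr(value, 'tolist'):
            value = value.tolist()
        if isinstance(value, dict):
            # JSON keys must be str
            return {str(k): json_friendly(v) for k, v in value.items()}
        if isinstance(value, (list, tuple)):
            return [json_friendly(v) for v in value]
        # nan is a float in Python but is not valid JSON
        if isinstance(value, float) and math.isnan(value):
            return None
        return value


    results = {
        'scores': [0.0, float('nan'), 4.0],
        'anomalies': {1686585000: {'value': 6.9}},
    }
    print(json.dumps(json_friendly(results)))
    # {"scores": [0.0, null, 4.0], "anomalies": {"1686585000": {"value": 6.9}}}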

``anomalyScore``
~~~~~~~~~~~~~~~~

Unlike the core Skyline algorithms, custom algorithms introduce the requirement
for the algorithm to also return an ``anomalyScore``.  The concept of the
``anomalyScore`` is used in many anomaly detection algorithms and methods and it
is useful in many cases for algorithm testing.

Test for unreliable results
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider having a test in your custom algorithm, run after the detection itself,
to check that the results are reliable.  Although algorithms may run fine against
almost all the different types of time series data with a single set of
parameters, with many algorithms there is always a chance that the algorithm
identifies a large proportion of the data points as anomalous.  If the custom
algorithm returns say >= 20% of the time series data points as anomalous,
consider the results unreliable.
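
Such a test can be as simple as the sketch below.  The ``check_reliability``
function is hypothetical and the 20% threshold is only the example figure
mentioned above, tune it to your own algorithm.

.. code-block:: python

    def check_reliability(anomalyScore_list, max_anomalous_ratio=0.2):
        """Return (unreliable, unreliable_reason) for a list of 0s and 1s."""
        if not anomalyScore_list:
            return True, 'no data points were scored'
        ratio = sum(anomalyScore_list) / len(anomalyScore_list)
        if ratio >= max_anomalous_ratio:
            reason = '%.1f%% of data points flagged as anomalous' % (ratio * 100)
            return True, reason
        return False, 'None'


    print(check_reliability([0] * 9 + [1]))      # (False, 'None')
    print(check_reliability([0] * 8 + [1] * 2))
    # (True, '20.0% of data points flagged as anomalous')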

Custom algorithm requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Must be written in Python.
- Must import all modules and classes it requires.
- The algorithm function must accept the following four parameters, e.g.

.. code-block:: python

    def last_same_hours_weekly(current_skyline_app, parent_pid, timeseries, algorithm_parameters):

- The four parameters are:

  - ``current_skyline_app`` - this will be passed to the custom algorithm by
    Skyline to identify which Skyline app is executing the algorithm, this is
    **required** for error handling and logging.  You do not have to worry about
    handling the ``current_skyline_app`` argument in your algorithm, your
    algorithm must just accept it as the first argument.
  - ``parent_pid`` - this will be passed to the custom algorithm by
    Skyline to identify which pid has executed the algorithm, this is
    **required** for error handling and logging.  You do not have to worry about
    handling the ``parent_pid`` argument in your algorithm, your algorithm must
    just accept it as the second argument.
  - ``timeseries`` - the algorithm must accept a time series as a list e.g.
    ``[[1578916800.0, 29.0], [1578920400.0, 55.0], ... [1580353200.0, 55.0]]``
  - ``algorithm_parameters`` - this is a dictionary of any parameters that
    the algorithm requires.

- Your algorithm should be a simple single function, see the example algorithms
  for guidance.  It is possible that a multi classed algorithm could work, but
  your mileage may vary.  This method is only tested with the algorithm being a
  simple function.
- The custom algorithm must return a boolean to state whether the data point is
  anomalous **and** an ``anomalyScore`` at the minimum, e.g.

.. code-block:: python

    # return (anomalous, anomalyScore)
    if anomalous:
        return (True, 1.0)
    return (False, 0.2)

- The returned boolean must be one of the following three choices:

  - ``True`` - the data point **is** anomalous
  - ``False`` - the data point **is not** anomalous
  - ``None`` - returned when the algorithm could not determine ``True`` or
    ``False``, an error occurred or there was no data, the results were not
    reliable, etc.

- The returned ``anomalyScore`` must be a **float** between 0.0 and 1.0, 0.0
  being not anomalous and 1.0 being a certain anomaly.  You can return
  ``(False, 0.7)``, you just have to normalise your ``anomalyScore`` between 0.0
  and 1.0.  The anomalous classification is currently **only** determined from
  the boolean.  The ``anomalyScore`` is currently not used in any way other than
  for testing, but it **must** be returned.  If your algorithm does
  not calculate an anomaly score, when your algorithm returns ``False`` just
  return it with 0.0 and when your algorithm returns ``True`` just return it
  with 1.0.
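
Putting the requirements above together, a minimal custom algorithm could look
like the sketch below.  It is a hypothetical example and not one of the
algorithms shipped with Skyline.  The ``example_stddev_from_median`` name, the
``sigma`` parameter, the score normalisation and the choice to return ``0.0``
as the score alongside ``None`` are all assumptions made for illustration.

.. code-block:: python

    import statistics


    def example_stddev_from_median(current_skyline_app, parent_pid, timeseries, algorithm_parameters):
        """Hypothetical example: flag the last value if it is far from the median."""
        anomalous = None
        anomalyScore = 0.0
        try:
            # Parameters arrive as a dict, fall back to a default if not passed
            sigma = float(algorithm_parameters.get('sigma', 3.0))
            values = [value for _, value in timeseries]
            if len(values) < 3:
                # Not enough data to decide
                return (anomalous, anomalyScore)
            median = statistics.median(values)
            stddev = statistics.pstdev(values)
            if stddev == 0:
                return (False, 0.0)
            deviations = abs(values[-1] - median) / stddev
            anomalous = deviations > sigma
            # Normalise the score into the 0.0 to 1.0 range
            anomalyScore = min(deviations / (2 * sigma), 1.0)
        except Exception:
            # Any failure means the algorithm could not decide
            return (None, 0.0)
        return (anomalous, anomalyScore)


    timeseries = [[1578916800.0 + (i * 3600), 1.0] for i in range(20)]
    timeseries.append([1578988800.0, 100.0])
    anomalous, anomalyScore = example_stddev_from_median('mirage', 12856, timeseries, {})
    print(anomalous)  # True

The example algorithms in the repo additionally record the traceback when an
exception is caught, rather than silently returning.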

Additional results in the return
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At minimum your algorithm must return ``(anomalous, anomalyScore)``, however if
you wish to use the custom algorithm with Vortex for testing then you will need
to return results as well.

The results object is a dictionary of items that Vortex needs to apply consensus
to, plot results, etc.

.. code-block:: python

    results = {
        'anomalous': anomalous,  # boolean
        'anomalies': anomalies,  # dict (at minimum must be keyed with timestamps with value, index and score, additional keys can be added)
        'some_algo_specific_thing': json_friendly_object,  # str,list,dict (no nans)
        'anomalyScore_list': anomalyScore_list,  # list of 0 and 1s
        'scores': algorithm_scores,  # list of any algorithm specific scores (no nans use None)
        'unreliable': unreliable,  # boolean
        'unreliable_reason': unreliable_reason,  # str
        'success': success,  # boolean
    }

All values in the dict keys **MUST BE** JSON friendly, this means convert all
``nan`` to ``None`` which json will reflect as ``null``.  Convert any arrays to
lists, etc.

Below is an example of how the objects of each key should be defined.  Note how
the timestamps are typed as str because you cannot use an int for a key in json,
a json key must be a str.

.. code-block:: python

    results = {
        'anomalous': True,
        'anomalies': {
            '1686585000': {'value': 6.944444444444445, 'index': 865, 'score': 4},
            '1687449000': {'value': 0.375, 'index': 2305, 'score': 4},
            '1687508400': {'value': 0.7, 'index': 2404, 'score': 4},
            '1687509000': {'value': 0.8, 'index': 2405, 'score': 4},
            ...,
            '1688647800': {'value': 1.1, 'index': 4303, 'score': 4, 'triggered': ['lof', 'isolation_forest', 'one_class_svm', 'm66']}
        },
        'some_algo_specific_thing': json_friendly_object,  # str,list,dict (no nans)
        'anomalyScore_list': [0, 0, 0, ..., 0, 1],
        'scores': [0, ..., 4, ..., 0, 4],
        'unreliable': True,
        'unreliable_reason': 'None',
        'success': True,
    }
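
A results dict of this shape can be assembled from per data point scores, as in
the sketch below.  The ``build_results`` function, the ``threshold`` argument
and the decision to derive ``anomalous`` from the last data point are
assumptions made for illustration, they are not taken from the Skyline code.

.. code-block:: python

    def build_results(timeseries, scores, threshold):
        """Assemble a Vortex style results dict from per data point scores."""
        anomalies = {}
        anomalyScore_list = []
        for index, ((timestamp, value), score) in enumerate(zip(timeseries, scores)):
            is_anomaly = score is not None and score >= threshold
            anomalyScore_list.append(1 if is_anomaly else 0)
            if is_anomaly:
                # JSON keys must be str, so the timestamp is cast
                anomalies[str(int(timestamp))] = {
                    'value': float(value), 'index': index, 'score': score,
                }
        return {
            # Assumption: the metric is anomalous if the last data point is
            'anomalous': bool(anomalyScore_list and anomalyScore_list[-1] == 1),
            'anomalies': anomalies,
            'anomalyScore_list': anomalyScore_list,
            'scores': scores,
            'unreliable': False,
            'unreliable_reason': 'None',
            'success': True,
        }


    timeseries = [[1686585000.0, 6.9], [1686585600.0, 0.4]]
    results = build_results(timeseries, [4, 0], threshold=4)
    print(results['anomalies'])
    # {'1686585000': {'value': 6.9, 'index': 0, 'score': 4}}
    print(results['anomalyScore_list'])  # [1, 0]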


Error handling
~~~~~~~~~~~~~~

The example algorithms show how to wrap your algorithm in the normal Skyline
algorithm exception handling method.  Although you can implement your own
logging in a custom algorithm, consider using the method described in the
example algorithms before you do.  Because the algorithms iterate over 1000s
of time series every minute, logging every error encountered while
developing or running an algorithm is neither practical (simply due to I/O) nor
desired.  To accommodate error logging from algorithms, Skyline's error handling
method writes out any errors to a single file per algorithm during the analysis
phase, overwriting the file with each error.  The error files are written to
the /tmp directory, which is normally a memory based tmpfs resource, so no disk
I/O is incurred.  Once analysis is complete, the parent process checks for any
algorithm error files and logs any errors found to the main application log
once, as shown in the example below.

.. code-block:: none

    2020-06-07 05:47:40 :: 12856 :: error :: spin_process with pid 12870 has reported an error with the abs_stddev_from_median algorithm
    2020-06-07 05:47:40 :: 12856 :: Traceback (most recent call last):
      File "/opt/skyline/github/skyline/skyline/custom_algorithms/abs_stddev_from_median.py", line 46, in abs_stddev_from_median
        make_an_error = median * UNDEFINED_VARIABLE
    NameError: name 'UNDEFINED_VARIABLE' is not defined

This allows for errors to be encountered without spewing 1000s and 1000s of
lines of errors to the disk based application logs and incurring masses of I/O.
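
The pattern described above can be sketched as follows.  This is an
illustration only and is not copied from the example algorithms.  The
``record_algorithm_error`` helper, the error file name pattern and the
``example_algorithm`` function are all hypothetical, refer to the example
algorithms in the repo for the method Skyline actually uses.

.. code-block:: python

    import os
    import tempfile
    import traceback


    def record_algorithm_error(current_skyline_app, parent_pid, algorithm_name, error_dir=None):
        """Write the current traceback to one file per algorithm, overwriting it."""
        error_dir = error_dir or tempfile.gettempdir()
        # Hypothetical file name pattern, one file per app, pid and algorithm
        error_file = os.path.join(
            error_dir, '%s.%s.%s.algorithm.error' % (
                current_skyline_app, parent_pid, algorithm_name))
        # Mode 'w' truncates the file, so only the latest error is kept
        with open(error_file, 'w') as fh:
            fh.write(traceback.format_exc())
        return error_file


    def example_algorithm(current_skyline_app, parent_pid, timeseries, algorithm_parameters):
        try:
            last_value = timeseries[-1][1]
            anomalous = last_value > 100
            return (anomalous, 1.0 if anomalous else 0.0)
        except Exception:
            # Record the traceback for the parent process and return None
            record_algorithm_error(current_skyline_app, parent_pid, 'example_algorithm')
            return (None, 0.0)

Calling ``example_algorithm`` with an empty time series raises an ``IndexError``
internally, which is written to the error file while the caller simply receives
``(None, 0.0)``.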

The use of numba optimisations
------------------------------

Any performance optimisations that can be achieved in a custom algorithm with
numba jit functions are encouraged, with the following caveats.

If the algorithm or any part thereof uses numba optimisations in any way, ensure
that the ``jit`` or ``njit`` decorator has ``cache=True`` applied so that
the jit compilation overhead is only incurred once and not on every
execution.  In terms of multiprocessing and threading, if there is no jit cache
file for a function the jit compilation happens on every execution, which is slow.
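
A minimal sketch of a cached jit function is shown below.  The ``max_abs_diff``
and ``largest_step`` functions are hypothetical examples.  The ``ImportError``
fallback is only there so that the sketch still runs where numba is not
installed, a real numba based custom algorithm would not need it.  Note that
numba needs the function to live in a source file in order to cache it.

.. code-block:: python

    try:
        import numpy as np
        from numba import njit
        HAVE_NUMBA = True
    except ImportError:
        HAVE_NUMBA = False

        def njit(*args, **kwargs):
            """No-op stand-in so the sketch runs without numba installed."""
            def decorator(func):
                return func
            return decorator


    # cache=True saves the compiled function to the numba cache directory
    @njit(cache=True)
    def max_abs_diff(values):
        """Largest absolute step between consecutive values."""
        largest = 0.0
        for i in range(1, len(values)):
            diff = abs(values[i] - values[i - 1])
            if diff > largest:
                largest = diff
        return largest


    def largest_step(timeseries):
        values = [float(value) for _, value in timeseries]
        if HAVE_NUMBA:
            # numba works best with typed numpy arrays
            values = np.array(values, dtype=np.float64)
        return float(max_abs_diff(values))


    print(largest_step([[1578916800.0, 1.0], [1578920400.0, 4.0], [1578924000.0, 2.0]]))
    # 3.0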

Also note that in the normal Skyline build on CentOS 8 the default
``NUMBA_CACHE_DIR`` is ``/opt/skyline/.cache/numba``.  This is not specifically
defined, it is what numba defaults to.  **IMPORTANT**: as per the relevant
upgrade instructions in the release notes, this directory needs to be flushed,
or preferably moved aside to back up the existing cache, whenever an upgrade is
made to Python and/or any dependencies, so that the numba jit cache files can be
recompiled with the new versions.

When changes are made to an existing numba optimised custom algorithm
that is loaded by Skyline, numba will **automatically** replace its jit cache
files the first time the custom algorithm function is called after the
relevant Skyline service has been restarted.

That said, this has a direct impact on the custom algorithm ``max_execution_time``
parameter, because the first execution of the jit function when the
cache files are not present could take say 40 seconds to compile the function.
If you expect the algorithm to run in less than 1 second WHEN the
jit cache file is present, you may want to set ``max_execution_time: 1.2,``.
However, the custom algorithm would then be terminated by the timeout decorator
on **every** run, so it would never finish compiling and never save the jit
cache files.

Therefore an additional key value pair can be passed in the
:mod:`settings.CUSTOM_ALGORITHMS` definition for any custom algorithm that uses
numba jit functions, e.g. ``'numba_cache_dirs': ['stumpy_'],``.  This is a list
of cache dirs that are expected to exist.  If they do not exist, the
``max_execution_time`` is increased to 300 seconds to allow first time numba jit
compilation to occur, so that the timeout decorator does not terminate the
algorithm due to a low max_execution_time.

The ``numba_cache_dirs`` is a list of substrings of the directories created in
the default ``/opt/skyline/.cache/numba`` which are expected.

For example the stumpy stump algorithm with njit caching enabled saves
``/opt/skyline/.cache/numba/stumpy_7b65d2f0242c0368bd086e515d849481810e817d`` with nbc and nbi files:

.. code-block:: none

  (skyline-py3816) [root@skyline-test-1 skyline-py3816] ls -al /opt/skyline/.cache/numba/ | grep stump
  drwxr-xr-x  2 skyline skyline 4096 Jan 12 16:37 stumpy_7b65d2f0242c0368bd086e515d849481810e817d
  (skyline-py3816) [root@skyline-test-1 skyline-py3816] ls -al /opt/skyline/.cache/numba/stump*
  total 460
  drwxr-xr-x 2 skyline skyline   4096 Jan 12 16:37 .
  drwxr-xr-x 8 skyline skyline   4096 Jan 12 16:09 ..
  -rw-rw-rw- 1 skyline skyline 455583 Jan 12 16:37 stump._stump-216.py38.1.nbc
  -rw-rw-rw- 1 skyline skyline   1487 Jan 12 16:37 stump._stump-216.py38.nbi
  (skyline-py3816) [root@skyline-test-1 skyline-py3816]


This would be declared as ``'numba_cache_dirs': ['stumpy_']``, and
``custom_algorithms.run_custom_algorithm_on_timeseries`` checks for the
existence of a dir matching these substrings/patterns in
``/opt/skyline/.cache/numba``.  If none is found, it sets the
``max_execution_time`` to 300 seconds for that run to allow for the jit compile
and save overhead without terminating the run before it has time to compile and
save.

You therefore need to be fully aware of what jit cache files your algorithm will
create and declare them appropriately.
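
The check described above can be pictured roughly as follows.  This is a sketch
of the behaviour as documented on this page and not the code in
``custom_algorithms.run_custom_algorithm_on_timeseries``.  The
``effective_max_execution_time`` function name and its arguments are
hypothetical.

.. code-block:: python

    import os


    def effective_max_execution_time(conf, numba_cache_dir, first_run_timeout=300.0):
        """Return the timeout to apply for this run of a custom algorithm."""
        expected = conf.get('numba_cache_dirs', [])
        try:
            existing = os.listdir(numba_cache_dir)
        except OSError:
            # A missing cache directory means nothing has been compiled yet
            existing = []
        for substring in expected:
            # Each declared substring must match at least one cache directory
            if not any(substring in dir_name for dir_name in existing):
                return first_run_timeout
        return conf['max_execution_time']


    conf = {'max_execution_time': 5.0, 'numba_cache_dirs': ['stumpy_']}
    print(effective_max_execution_time(conf, '/opt/skyline/.cache/numba'))

With no ``stumpy_`` directory present this returns 300.0, once the cache
directory has been created by a successful first run it returns the configured
5.0.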

flux/tornado
------------

ADVANCED_FEATURE

Due to a numba of factors (pun intended) running numba jit compiled algorithms in
custom_algorithms can introduce some performance issues in terms of load times.
Due to the dynamic nature of custom_algorithms and the use of multiprocessing,
every call to a custom_algorithm is dynamic in terms of the process handling it.
The process therefore needs to load and initialise all the algorithms and
deps when it is instantiated.  If the process has been assigned multiple checks
then the first check will incur the load time and every check thereafter will
benefit from the cached and loaded functions.  This means that the initial load
for a process can take a few seconds, depending on the custom_algorithm, its
complexity and deps.  This is not desirable in most cases, especially in Mirage.

Consider a normal Mirage workflow, where say 10 checks are sent to Mirage and
Mirage is configured with 2 processes.  Mirage divides the checks and fires
up 2 processes.  Each process initialises and takes 2 seconds to load the
custom_algorithm, each first check then takes 4 seconds and the subsequent
checks benefit from the cached functions and take 0.3 seconds to run.  In
that time an additional 30 checks were submitted to Mirage, so as soon as it is
complete, it fires up 2 new processes and does the same again.

flux/tornado allows you to specify any added custom_algorithms, or the
expensive part of a custom_algorithm, to be loaded and served by flux, so that
every call is fast and incurs no load time penalties.

Example custom algorithms
~~~~~~~~~~~~~~~~~~~~~~~~~

There are two example custom algorithms in the repo for you to model the
structure of your custom algorithm on.

**abs_stddev_from_median**

This is the simplest custom algorithm structure, it does not have any
``algorithm_parameters`` and has no debug logging.

https://github.com/earthgecko/skyline/tree/master/skyline/custom_algorithms/abs_stddev_from_median.py

**last_same_hours**

This is an example of a more complex custom algorithm structure, that uses
``algorithm_parameters`` and can even debug log to the Skyline app log if
``debug_logging`` is passed and enabled via the ``algorithm_parameters``.

https://github.com/earthgecko/skyline/tree/master/skyline/custom_algorithms/last_same_hours.py

Running a Mirage only custom algorithm on a metric all the time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Normally for Analyzer to push a metric to Mirage, Analyzer would have to trigger
on it as anomalous.  However, if you wish to run a custom algorithm on a metric
that requires ``SECOND_ORDER_RESOLUTION_HOURS`` of data because the
:mod:`settings.FULL_DURATION` data is not sufficient for the custom algorithm,
perhaps due to seasonality, then you need to declare the metric in
:mod:`settings.MIRAGE_ALWAYS_METRICS`.  This will cause Analyzer to add the
metric to Mirage on every run.  Note that the metric needs to be defined as a
mirage enabled metric in the normal way, ensuring it matches an smtp alert
defined in :mod:`settings.ALERTS` with a ``SECOND_ORDER_RESOLUTION_HOURS``
declared.
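
As a rough illustration, the two settings might be combined as in the fragment
below.  The metric name is reused from the example earlier on this page, and
both the list form of ``MIRAGE_ALWAYS_METRICS`` and the four element alert
tuple layout are assumptions here, so check them against the comments in your
own ``settings.py`` before use.

.. code-block:: python

    # Assumption: a list of metric names that Analyzer sends to Mirage every run
    MIRAGE_ALWAYS_METRICS = ['telegraf.cpu-total.cpu.usage_user']

    # Assumed tuple layout:
    # (namespace, alerter, EXPIRATION_TIME, SECOND_ORDER_RESOLUTION_HOURS)
    ALERTS = (
        ('telegraf.cpu-total.cpu.usage_user', 'smtp', 1800, 168),
    )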

Some things to consider
~~~~~~~~~~~~~~~~~~~~~~~

- Think about which Skyline apps you want your algorithm to run in.  If you
  want to use data > :mod:`settings.FULL_DURATION` then ensure you only
  specify ``'use_with': ['mirage', 'crucible'],``.
- Thoroughly test your algorithm with ``debug_logging``.
- Purposefully break your algorithm during testing to see how the error
  handling works.
- Any custom algorithms applied to analyzer must be **FAST**.  Custom algorithms
  that are only applied to mirage and analyzer_batch can take a bit longer to
  run, but the longer their execution time, the more they will delay analysis.