====================== Analyzer Optimizations ====================== The original implementation of Skyline has worked for years, almost flawlessly it must be said. However the original implementation of Skyline was patterned and somewhat even hard coded to run in a very large and powerful setup in terms of the volume of metrics it was handling and the server specs it was running on. This setup worked and ran OK on all metric volumes. However if Skyline was setup in a smaller environment with a few 1000 metrics, the CPU graphs and load\_avg stats of your Skyline server suggested that Skyline did need a LOT of processing power. However this is no longer true. This was due to one single line of code at the end of Analyzer module, which only slept if the runtime of an analysis was less than 5 seconds, undoubtedly resulted in a lot of Skyline implementations seeing constantly high CPU usage and load\_avg when running Skyline. As this resulted in Analyzer running in say 19 seconds and then immediately spawning again and again, etc. https://github.com/etsy/skyline/blob/master/src/analyzer/analyzer.py#L242 .. code-block:: python # Sleep if it went too fast if time() - now < 5: logger.info('sleeping due to low run time...') sleep(10) Number of Analyzer processors ----------------------------- A number of optimizations have changed the required number of processors to assign to :mod:`settings.ANALYZER_PROCESSES`, quite dramatically in some cases. Specifically in cases where the number of metrics being analyzed is not in the 10s of 1000s. Python multiprocessing is not very efficient if it is not need, in fact the overall overhead of the spawned processes ends up greater than the overhead of processing with a single process. For example, if we have a few 1000 metrics and we have 4 processors assigned to Analyzer and the process duration is 19 seconds, in the original Analyzer we would have seen 4 CPUs running at 100% constantly (due to the above Sleep if it went to fast). Even if we leave to Sleep if went too fast in and we change to :mod:`settings.ANALYZER_PROCESSES` to 1, we will find that: - a) we now see 1 CPU running at 100% - b) our duration has probably increased to about 27 seconds - c) we use a little more memory on the single process When we optimize the sleep to match the environment with the :mod:`settings.ANALYZER_OPTIMUM_RUN_DURATION` of in this case say 60 instead of 5, we will find that: - a) we now see 1 CPU running at a few percent, only spiking up for 27 seconds When we further optimize and use the :mod:`settings.RUN_OPTIMIZED_WORKFLOW` we will find that: - a) we now see 1 CPU running at a few percent, only spiking up for 14 seconds - b) our duration has probably decreased to about 50% See `Optimizations results`_ at the end of this page. Analyzer work rate ------------------ The original Analyzer analyzed all time series against all algorithms which is the maximum possible work. In terms of the :mod:`settings.CONSENSUS` based model, this is not the most efficient work rate. Performance tuning ------------------ A special thanks to `Snakeviz `__ for a very useful Python profiling tool which enabled some minimal changes in the code and substantial improvements in the performance, along with `cprofilev `__ and `vmprof `__. Using anomaly\_breakdown metrics graphs to tune the Analyzer workflow --------------------------------------------------------------------- anomaly\_breakdown metrics were added to Skyline on 10 Jun 2014, yet never merged into the main Etsy fork. However, in terms of performance tuning and profiling Skyline they are quite useful. They provide us with the ability to optimize the analysis of time series data based on 2 simple criteria: 1. Determine the algorithms that are triggered most frequently 2. Determine the computational expense of each algorithm (a development addition) that adds ``algorithm_breakdown.*.timing.times_run`` and the ``algorithm_breakdown.*.timing.total_time`` metrics to skyline.analyzer. ``hostname`` graphite metric namespaces. We can use these data to determine the *efficiency* of the algorithms and when this is applied to the Analyzer :mod:`settings.CONSENSUS` model we can optimize ``algorithms.py`` to run in the most efficient manner possible. Originally algorithms.py simply analyzed every time series against every ``algorithm in ALGORITHMS`` and only checked the :mod:`settings.CONSENSUS` threshold at the end. However a very small but effective optimization is to use the above data to run the following optimizations. - The most frequently and least expensive :mod:`settings.CONSENSUS` number of algorithms and then determine if :mod:`settings.CONSENSUS` can be achieved. Currently there are 9 algorithms that Analyzer uses. However the same optimization is valid if more algorithms were added. - If our :mod:`settings.CONSENSUS` was 6 and Analyzer has not been able to trigger any of the most frequently and least expensive 5 algorithms, then there is no need to analyze the time series against the remaining algorithms. This surprisingly reduces the work of Analyzer by ~xx% on average (a lot). - The cost of this optimization is that we lose the original ``algorithm_breakdown.*`` metrics which this was evaluated and patterned against. However two additional factors somewhat mitigate this but it is definitely still skewed. The mitigations being that: - When a time series is anomalous more than one algorithm triggers anyway. - When an algorithm is triggered, more algorithms are run. Seeing as we have optimized to have the least frequently triggered algorithms be run later in the workflow, it stands to reason that a lot of the time, they would not have triggered even if they were run. However it is still skewed. These optimizations are now the default in ``settings.py``, however they have been implemented with backwards compatibility and for the purpose of running Analyzer without optimization of the algorithms to ensure that they can be benchmarked again should any further algorithms ever be added to Analyzer or any existing algorithms modified in any way. algorithm benchmarks -------------------- analyzer\_dev can be/was used as a benchmarking module to determine the execution times of algorithms. Considerations - approximation of timings ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The algorithm benchmark timings are simply approximations of the real times that the algorithm execution is undertaken in (float). tmpfs vs multiprocessing Value ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Recording the algorithm counts and timings without using multiprocessing Value and associated overhead of locks, etc, etc. /tmp was opted for instead and the variable :mod:`settings.SKYLINE_TMP_DIR` was added. In most cases /tmp is tmpfs which is memory anyway so all the heavy lifting in terms of locking etc is offloaded to the OS and modules do not have to incur the additional complexity in Python. A simple yet effective win. Same same but different. There may be some valid reasons for the use multiprocessing Value or Manager().list() Algorithms ranked by triggered count ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Using the ``anomaly_breakdown`` metrics data it shows that on a plethora of machine and application related metrics, we can determine the most triggered algorithms by rank: 1. stddev\_from\_average 2. mean\_subtraction\_cumulation 3. first\_hour\_average 4. histogram\_bins 5. least\_squares 6. grubbs 7. stddev\_from\_moving\_average 8. median\_absolute\_deviation 9. ks\_test Algorithms ranked by execution time ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Using the ``algorithm_breakdown`` metrics data we can determine the most "expensive" algorithms by total time to run: 1. least\_squares (avg: 0.563052576667) 2. stddev\_from\_moving\_average (avg: 0.48511087) 3. mean\_subtraction\_cumulation (avg: 0.453279348333) 4. median\_absolute\_deviation (avg: 0.25222528) 5. stddev\_from\_average (avg: 0.173473198333) 6. first\_hour\_average (avg: 0.151071298333) 7. grubbs (avg: 0.147807641667) 8. histogram\_bins (avg: 0.101075738333) 9. ks\_test (avg: 0.0979568116667) Performance weighting ~~~~~~~~~~~~~~~~~~~~~ If we change the order in which the time series are run through the algorithms in Analyzer, we can improve the overall performance by running the most expensive computational algorithms later in the analysis. :: +-----------------------------+----------------+---------------------+ | Algorithm | Triggered rank | Execution time rank | +=============================+================+=====================+ | histogram_bins | 4 | 8 | +-----------------------------+----------------+---------------------+ | first_hour_average | 3 | 6 | +-----------------------------+----------------+---------------------+ | stddev_from_average | 1 | 5 | +-----------------------------+----------------+---------------------+ | grubbs | 6 | 7 | +-----------------------------+----------------+---------------------+ | ks_test | 9 | 9 | +-----------------------------+----------------+---------------------+ | mean_subtraction_cumulation | 3 | 2 | +-----------------------------+----------------+---------------------+ | median_absolute_deviation | 8 | 4 | +-----------------------------+----------------+---------------------+ | stddev_from_moving_average | 7 | 2 | +-----------------------------+----------------+---------------------+ | least_squares | 5 | 1 | +-----------------------------+----------------+---------------------+ :mod:`settings.RUN_OPTIMIZED_WORKFLOW` -------------------------------------- The original version of Analyzer ran all time series through all ``ALGORITHMS`` like so: .. code-block:: python ensemble = [globals()[algorithm](timeseries) for algorithm in ALGORITHMS] After running all the algorithms, it then determined whether the last datapoint for time series was anomalous. The optimized workflow uses the above triggered / execution time ranking matrix to run as efficiently as possible and achieve the same results (see caveat below) but up to ~50% quicker and less CPU cycles. This is done by iterating through the algorithms in order based on their respective matrix rankings and evaluating the whether :mod:`settings.CONSENSUS` can be achieved or not. The least\_squares algorithm, which is the most computationally expensive, now only runs if :mod:`settings.CONSENSUS` can be achieved. The caveat to this is that this skews the ``anomaly_breakdown`` metrics. However seeing as the ``anomaly_breakdown`` metrics were not part of the original Analyzer this is a mute point. That said the performance tuning and optimizations were made possible by these data, therefore it remains possible to implement the original configuration and also time all algorithms (see Development modes if you are interested). A word of warning, if you have setup a Skyline implementation after the :mod:`settings.RUN_OPTIMIZED_WORKFLOW` and you have > 1000 metrics running the unoptimized workflow with the original 5 seconds may send the load\_avg through the roof. The original Analyzer :mod:`settings.ALGORITHMS` setting was: .. code-block:: python ALGORITHMS = [ 'first_hour_average', 'mean_subtraction_cumulation', 'stddev_from_average', 'stddev_from_moving_average', 'least_squares', 'grubbs', 'histogram_bins', 'median_absolute_deviation', 'ks_test', ] The new optimized Analyzer :mod:`settings.ALGORITHMS` setting based on the above performance weighing matrix is: .. code-block:: python ALGORITHMS = [ 'histogram_bins', 'first_hour_average', 'stddev_from_average', 'grubbs', 'ks_test', 'mean_subtraction_cumulation', 'median_absolute_deviation', 'stddev_from_moving_average', 'least_squares', ] Optimizations results --------------------- These server graphs show the pre and post crucible update metrics related to CPU and loadavg for a dedicated Skyline server running on a 4 vCPU, 4GB RAM, SSD cloud server. The server is handling ~3000 metrics and is solely dedicated to running Skyline. It was upgraded from the Boundary branch version to Crucible. It was running Analyzer, Mirage and Boundary, however these graphs clearly show the impact that the Analyzer optimizations have on the overall workload. Interestingly, after deployment the server is also running MySQL and the Panorama daemon in addition to what it was running before. The server was running the skyline.analyzer.metrics branch Analyzer which was only a few steps away from Etsy master and those steps were related to Analyzer sending a few more Skyline metric namespaces to Graphite, in terms of Analyzer and the algorithms logic pre update were identical to Etsy master. .. figure:: images/crucible/analyzer.optimizations/analyzer.skyline1.runtime.png .. figure:: images/crucible/analyzer.optimizations/analyzer.optimizations.cpu0.png .. figure:: images/crucible/analyzer.optimizations/analyzer.optimizations.cpu1.png .. figure:: images/crucible/analyzer.optimizations/analyzer.optimizations.cpu2.png .. figure:: images/crucible/analyzer.optimizations/analyzer.optimizations.cpu3.png .. figure:: images/crucible/analyzer.optimizations/analyzer.optimizations.loadavg.png