Getting data into Skyline

You currently have two options for getting data into Skyline via the Horizon service: TCP pickles and UDP messagepack.

A note on time sync

Although it may seem obvious, it is important to note that any metrics coming into Graphite and Skyline should come from time-synchronised sources. If there is more than 60 seconds of drift (or more than the interval of your highest resolution metric), Skyline becomes less predictable, because certain algorithms expect very recent datapoints. Time drift decreases the accuracy and effectiveness of some algorithms. For machine related metrics, normal production grade time synchronisation will suffice.
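As a rough illustration of the 60 second guideline, the sketch below flags a datapoint whose timestamp is too far from the local clock. The function name and the default threshold are assumptions made for this example and are not part of Skyline. The comparison also includes network transit time, so treat it as a coarse sanity check, not a measurement of clock drift.

```python
import time

# Threshold taken from the note above; adjust it to the interval of your
# highest resolution metric if that is shorter.
MAX_DRIFT_SECONDS = 60


def is_drifting(datapoint_timestamp, now=None, max_drift=MAX_DRIFT_SECONDS):
    """Return True if a datapoint's timestamp is too far from the local clock.

    This is a hypothetical helper for illustration, not a Skyline function.
    """
    if now is None:
        now = time.time()
    # Use the absolute difference so clocks running fast or slow are both caught.
    return abs(now - datapoint_timestamp) > max_drift
```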

TCP pickles

Horizon was designed to support a stream of pickles from the Graphite carbon-relay service, over port 2024 by default. Carbon relay is a feature of Graphite that immediately forwards all incoming metrics to another Graphite instance, for redundancy. In order to access this stream, you simply need to point the carbon relay service to the box where Horizon is running. In this way, Carbon-relay just thinks it’s relaying to another Graphite instance. In reality, it’s relaying to Skyline.

Here are example Carbon configuration snippets:

relay-rules.conf:

[all]
pattern = .*
destinations = 127.0.0.1:2014, <YOUR_SKYLINE_HOST>:2024

[default]
default = true
destinations = 127.0.0.1:2014:a, <YOUR_SKYLINE_HOST>:2024:a

carbon.conf:

[relay]
RELAY_METHOD = rules
DESTINATIONS = 127.0.0.1:2014, <YOUR_SKYLINE_HOST>:2024
USE_FLOW_CONTROL = False
MAX_QUEUE_SIZE = 5000

A quick note about the carbon agents: Carbon-relay is meant to be the primary metrics listener. The 127.0.0.1 destinations in the settings tell it to relay all metrics locally, to a carbon-cache instance that is presumably running. If you are currently running carbon-cache as your primary listener, you will need to switch it so that carbon-relay is the primary listener.

Note the small MAX_QUEUE_SIZE - in older versions of Graphite, issues can arise when a relayed host goes down. The queue will fill up, and then when the relayed host starts listening again, Carbon will attempt to flush the entire queue. This can block the event loop and crash Carbon. A small queue size prevents this behavior.

See the docs for a primer on Carbon relay.

Of course, you don’t need Graphite to use this listener - as long as you pack and pickle your data correctly (you’ll need to look at the source code for the exact protocol), you’ll be able to stream to this listener.
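A minimal sketch of streaming to this listener without Graphite follows. It assumes Horizon accepts the same framing that Graphite's pickle protocol uses: a pickled list of (metric, (timestamp, value)) tuples, preceded by a 4-byte big-endian length header. That assumption should be checked against the Horizon source code, and the function names here are made up for the example.

```python
import pickle
import socket
import struct


def build_pickle_message(datapoints):
    """Frame datapoints in the Graphite pickle protocol format.

    datapoints is a list of (metric_name, (timestamp, value)) tuples.
    """
    payload = pickle.dumps(datapoints, protocol=2)
    # 4-byte unsigned big-endian length prefix, followed by the pickle itself.
    header = struct.pack("!L", len(payload))
    return header + payload


def send_datapoints(host, datapoints, port=2024):
    """Send one framed batch over TCP; 2024 is the default port named above."""
    message = build_pickle_message(datapoints)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(message)


# Example call, left commented out because it needs a running listener:
# send_datapoints("<YOUR_SKYLINE_HOST>", [("test.metric", (1700000000, 1.5))])
```

Batching several datapoints into one message keeps the number of TCP writes down, which is also how carbon-relay forwards metrics.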

UDP messagepack

Horizon also accepts metrics in the form of messagepack encoded strings over UDP, on port 2025. The format is [<metric name>, [<timestamp>, <value>]]. Simply encode your metrics as messagepack and send them on their way.

A quick note, however, on transporting any metrics data over UDP: delivery is not guaranteed... sorry if you did not get that.
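The [<metric name>, [<timestamp>, <value>]] format can be sketched as below. This assumes the third-party msgpack package is installed, and that an integer timestamp and a float value are acceptable to Horizon; neither detail is stated above. The helper names are hypothetical.

```python
import socket
import time


def build_metric(name, value, timestamp=None):
    """Return the [<metric name>, [<timestamp>, <value>]] structure."""
    if timestamp is None:
        timestamp = time.time()
    # Integer timestamp and float value are assumptions made for this sketch.
    return [name, [int(timestamp), float(value)]]


def send_metric(host, name, value, port=2025):
    """Encode one metric as messagepack and send it as a single UDP datagram."""
    # msgpack is a third-party package (pip install msgpack). It is imported
    # here so that build_metric stays usable without it.
    import msgpack

    datagram = msgpack.packb(build_metric(name, value))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))


# Example call, left commented out because it needs a running listener:
# send_metric("<YOUR_SKYLINE_HOST>", "test.metric", 1.5)
```

Because UDP gives no delivery confirmation, send_metric returns normally even if nothing is listening on the other end.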

Adding a Listener

If neither of these listeners is acceptable, it’s easy enough to extend them. Add a method in listen.py and add a line in the horizon-agent that points to your new listener.

settings.FULL_DURATION

Once you get real data flowing through your system, the Analyzer will be able to start analyzing for anomalies.

Note

Do not expect to see anomalies or anything in the Webapp immediately after starting the Skyline services. Realistically, a full settings.FULL_DURATION period should have passed before you begin to assess any triggered anomalies; after all, settings.FULL_DURATION is the baseline. Not all algorithms use all of the settings.FULL_DURATION data points: some do, and some use only 1 hour’s worth. However, the Analyzer log should still report values in the exception stats (how many metrics were boring, too short, etc.) as soon as it is getting data for metrics that Horizon is populating into Redis.
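To make the baseline idea concrete, here is a small sketch that checks whether a metric's time series already spans a full baseline period. It assumes settings.FULL_DURATION is expressed in seconds and uses 86400 purely as an example value. The function is hypothetical and is not how the Analyzer itself decides that a metric is too short.

```python
# Assumed example value in seconds; substitute your own settings.FULL_DURATION.
FULL_DURATION = 86400


def covers_full_duration(timeseries, full_duration=FULL_DURATION):
    """Return True if the series spans at least full_duration.

    timeseries is a list of (timestamp, value) pairs ordered oldest first.
    """
    if len(timeseries) < 2:
        return False
    oldest = timeseries[0][0]
    newest = timeseries[-1][0]
    return newest - oldest >= full_duration
```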