Zalando Database Engineer Jan Mußler recently stopped by our Fashion Insights Centre in Dublin to tell the DevOps Dublin crowd about ZMON, Zalando's open source monitoring solution. In late 2013, Zalando's technology team maxed out our Icinga/Nagios infrastructure, both performance-wise and (especially) in terms of manageability. Taking advantage of our annual Hackweek (a weeklong event for Zalando technologists to build, play and experiment independently), Jan and other Zalando engineers built ZMON to provide our teams with a performant, reliable, and flexible tool for monitoring all levels of our platform.
ZMON is equipped to monitor low-level system metrics via SNMP/NRPE and HTTP requests to exposed metrics, as well as higher-level KPIs via SQL, and much more by using Python expressions as tasks. Its base components include a scheduler, Redis for state and queuing, and a distributed set of workers responsible for evaluating checks and alerts. On top of these sit a frontend component for dashboards and alerting (it includes Grafana to make the most of your time series data) and KairosDB as a metric store. Trying out ZMON is as easy as spinning up a Vagrant box!
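To make the scheduler, queue, and worker flow a little more concrete, here is a toy sketch in plain Python. It is not ZMON's actual code or API: an in-memory `deque` stands in for the Redis queue, a dict stands in for Redis state, and the check definitions and the `http_latency_ms` and `order_count` functions are hypothetical names invented for this illustration. It only shows the general idea of treating a check and its alert condition as Python expressions.

```python
from collections import deque

# Hypothetical check definitions: "command" produces a value,
# "alert" is a condition evaluated against that value.
CHECKS = [
    {"id": 1, "entity": "web-01", "command": "http_latency_ms()", "alert": "value > 500"},
    {"id": 2, "entity": "shop-db", "command": "order_count()", "alert": "value < 10"},
]


def evaluate(expression, names):
    """Evaluate a Python expression with only the given names visible."""
    return eval(expression, {"__builtins__": {}}, names)


def schedule(checks, queue):
    """Scheduler role: push every due check onto the queue."""
    for check in checks:
        queue.append(check)


def work(queue, state, functions):
    """Worker role: pop checks, run the command, then test the alert condition."""
    while queue:
        check = queue.popleft()
        value = evaluate(check["command"], functions)
        alerting = bool(evaluate(check["alert"], {"value": value}))
        state[(check["id"], check["entity"])] = {"value": value, "alert": alerting}


def run_demo():
    queue = deque()  # stand-in for the Redis queue (assumption)
    state = {}       # stand-in for Redis state (assumption)
    # Fake data sources; a real setup would query HTTP, SNMP, SQL, and so on.
    functions = {"http_latency_ms": lambda: 730, "order_count": lambda: 42}
    schedule(CHECKS, queue)
    work(queue, state, functions)
    return state


if __name__ == "__main__":
    for key, result in run_demo().items():
        print(key, result)
```

In this sketch the latency check raises an alert because 730 is above its threshold, while the order count check stays quiet. Separating the scheduler from the workers is what would let the workers be distributed across many machines that all read from the same queue.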
Check out Jan's slide deck here: