Riemann is a great tool for real-time monitoring. Send it every event in your infrastructure (metrics, textual events, metadata, anything really), then define alerts or use riemann-dash to query the current state. The learning curve is quite steep, partly because you need to learn a bit of Clojure, but the benefits are absolutely worth it. Support is also professional and very responsive (be sure to poke aphyr or pyr on IRC).
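
To make the "send events, then define alerts" idea concrete, here is a minimal sketch in plain Python. It is only a model of the concept, not Riemann's API: real Riemann rules are written in Clojure in its configuration file, and the helper names `make_event` and `threshold_alert`, the hosts, and the 0.9 threshold are all hypothetical. The dictionary keys follow Riemann's standard event fields (host, service, state, metric, tags, ttl, description, time).

```python
import time


def make_event(host, service, metric, state="ok", tags=(), ttl=60, description=""):
    """Build a dict that mirrors Riemann's standard event fields (hypothetical helper)."""
    return {
        "host": host,
        "service": service,
        "state": state,
        "metric": metric,
        "tags": list(tags),
        "ttl": ttl,
        "description": description,
        "time": int(time.time()),
    }


def threshold_alert(events, service, limit):
    """Yield a 'critical' copy of every event on `service` whose metric exceeds `limit`."""
    for event in events:
        metric = event["metric"]
        if event["service"] == service and metric is not None and metric > limit:
            yield {**event, "state": "critical", "description": f"{service} above {limit}"}


# Illustrative events only; the hosts and values are made up.
events = [
    make_event("web1", "cpu", 0.42),
    make_event("web2", "cpu", 0.97),
    make_event("db1", "disk /", 0.99),
]

alerts = list(threshold_alert(events, "cpu", 0.9))
for alert in alerts:
    print(alert["host"], alert["service"], alert["state"], alert["metric"])
```

In Riemann itself the filter would be a stream in the config file and the events would arrive over the network from clients; the sketch only shows the shape of the data and the kind of rule you end up writing.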

