@JeremyCherfas Do you use the burgers as buns for that megaslice of onion? (Because that actually sounds pretty good, come to think…)
@jussipekonen Too late now, but rash guards have made pool-going a lot easier and less sunburny since I learned of them. Think UV-rated quick-dry shirt. Way less annoying than sunscreen and never wears off.
@33mhz Aww, have fun! (And careful to avoid heat stroke with that St. Louis summer!)
@tewha Can you make a small change now instead of a big, permanent one later/never? And then another next week?
Turnbull, James. The Art of Monitoring. June 2016. Ebook via O’Reilly Queue.
Selective reading to focus on overall approach and application monitoring, since those are the parts relevant to me.
Approach:
- Proposes an approach that copes well with dynamic hosts at large scale: data flows one way, pushed from each host to a collector, rather than a central server polling every host the way Nagios does.
- Replaces Booleans with numbers: down becomes “no new measurement in the last 5 seconds”, and up becomes “here is some actually useful data”. Static thresholds can be replaced by anomaly detection.
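A rough Python sketch of those two ideas, to make them concrete for myself. The `Collector` class, the 5-second staleness window, and the 3-standard-deviation rule are all my own illustrative assumptions, not the book's code (the book builds this on Riemann), and the z-score check is just one simple stand-in for "anomaly detection":

```python
from __future__ import annotations

import statistics
import time
from dataclasses import dataclass, field

STALE_AFTER_SECONDS = 5.0  # assumed window: no push within 5 s means "down"


@dataclass
class Collector:
    """Hypothetical collector: hosts push to it; it never polls them."""
    last_seen: dict[str, float] = field(default_factory=dict)
    latest: dict[str, float] = field(default_factory=dict)

    def receive(self, host: str, value: float, now: float | None = None) -> None:
        # A new host just starts pushing; there is no inventory to update.
        self.last_seen[host] = time.time() if now is None else now
        self.latest[host] = value

    def down_hosts(self, now: float | None = None) -> list[str]:
        # "Down" is derived from staleness, not from a failed check.
        now = time.time() if now is None else now
        return sorted(h for h, t in self.last_seen.items() if now - t > STALE_AFTER_SECONDS)


def is_anomalous(history: list[float], value: float, k: float = 3.0) -> bool:
    # Flag values more than k standard deviations from the recent mean,
    # instead of comparing against a fixed, hand-picked threshold.
    mean = statistics.mean(history)
    sd = statistics.pstdev(history)
    if sd == 0:
        return value != mean
    return abs(value - mean) > k * sd
```

The nice part of the push model shows up in `receive`: a freshly spun-up host needs no registration, it just starts sending.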
Used a bunch of tooling (Riemann, collectd, StatsD, Graphite/Whisper/Grafana, ELK), but the author has a newer book out on Prometheus, since Prometheus has become especially popular lately.
App metrics: Tech events, performance, etc. Help guide devs or ops.
Business metrics: Often the same events but different measures - sum of order prices vs just number of orders. Help guide business decisions and demonstrate that IT generates revenue.
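A tiny sketch of "same events, different measures": one stream of hypothetical order events yields an app-style count and a business-style revenue sum. The event shape (`region`, `price`) is made up for illustration:

```python
from collections import defaultdict
from decimal import Decimal


def summarize(orders):
    """Derive an app-style and a business-style metric from the same events."""
    order_count = defaultdict(int)   # app metric: how many orders went through
    revenue = defaultdict(Decimal)   # business metric: how much money they brought in
    for o in orders:
        order_count[o["region"]] += 1
        revenue[o["region"]] += o["price"]
    return dict(order_count), dict(revenue)


orders = [
    {"region": "EU", "price": Decimal("19.99")},
    {"region": "EU", "price": Decimal("5.01")},
    {"region": "US", "price": Decimal("100.00")},
]
counts, revenue = summarize(orders)
```

`Decimal` rather than `float` because summing money in binary floating point drifts.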
New to me is the idea of recording metrics (think StatsD counters or timers) that mirror your structured logging. I’m not sure what this buys you if you already have structured logs, except that perhaps the logs may be pruned due to log level or other config, while the metrics will not. (Compare analytics vs logging - people will religiously mute ALL their logs in Release “because logs kill kittens”, while letting Firebase or Fabric do scads of work, including regularly pinging on a timer. Mumble mumble letter over spirit of guidance mumble mumble.)
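What "metrics that mirror your structured logging" might look like in practice, as a sketch: each event writes one JSON log line and also sends StatsD counter/timer lines in the plaintext `name:value|type` format over UDP. The `checkout` event, its fields, and the metric names are hypothetical; port 8125 is StatsD's usual default, assumed here:

```python
from __future__ import annotations

import json
import logging
import socket

STATSD_ADDR = ("127.0.0.1", 8125)  # assumed: a StatsD daemon on its default UDP port
log = logging.getLogger("checkout")


def statsd_line(name: str, value: float, kind: str) -> str:
    # StatsD plaintext format: <metric>:<value>|<type> ("c" = counter, "ms" = timer)
    return f"{name}:{value}|{kind}"


def record_checkout(order_id: str, duration_ms: float, sock: socket.socket) -> list[str]:
    # 1. Structured log line with the full context of this one event.
    log.info(json.dumps({"event": "checkout", "order_id": order_id, "duration_ms": duration_ms}))
    # 2. Mirroring metrics: no per-order detail, but unaffected by log-level config.
    lines = [
        statsd_line("checkout.count", 1, "c"),
        statsd_line("checkout.duration", duration_ms, "ms"),
    ]
    for line in lines:
        sock.sendto(line.encode("ascii"), STATSD_ADDR)  # fire-and-forget UDP
    return lines


# Usage: record_checkout("A1", 12.5, socket.socket(socket.AF_INET, socket.SOCK_DGRAM))
```

Written out like this, the trade-off is clearer: the log line keeps the `order_id`, the metrics throw it away but survive someone turning the log level down.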
Another neat idea is to send non-PROD logs/metrics straight to a central host rather than through a sidecar daemon on each machine - less trouble for DEV.
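One way that could be wired up, as a sketch: pick the metrics endpoint from the environment name. The `APP_ENV` variable and both hostnames are hypothetical placeholders:

```python
from __future__ import annotations

import os

LOCAL_SIDECAR = ("127.0.0.1", 8125)                    # per-host daemon, used in PROD
CENTRAL_COLLECTOR = ("metrics.dev.example.com", 8125)  # hypothetical shared box for DEV/TEST


def metrics_endpoint(env: str | None = None) -> tuple[str, int]:
    # Only PROD pays for a daemon on every host; everything else ships centrally.
    env = (env or os.environ.get("APP_ENV", "DEV")).upper()
    return LOCAL_SIDECAR if env == "PROD" else CENTRAL_COLLECTOR
```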
Easy to miss events:
- Deployments: You want to see these alongside other metrics!
- Maintenance: You want to shut up alerts when you intentionally bring the system down.
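A minimal sketch of handling both: deployments recorded as timestamped events that a dashboard could overlay on other graphs, and alerts suppressed inside declared maintenance windows. The event dict shape and the `MaintenanceWindow` type are my own hypothetical choices, not any particular tool's API:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class MaintenanceWindow:
    start: datetime
    end: datetime

    def covers(self, t: datetime) -> bool:
        # Half-open interval: alerts at exactly `end` are live again.
        return self.start <= t < self.end


def should_page(alert_time: datetime, windows: list[MaintenanceWindow]) -> bool:
    # Stay quiet while the system is intentionally down.
    return not any(w.covers(alert_time) for w in windows)


def deploy_event(version: str, when: datetime) -> dict:
    # Hypothetical event shape; the point is a timestamped marker to graph.
    return {"type": "deployment", "version": version, "timestamp": when.isoformat()}
```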
It’s unfortunate there’s no standard for structured logging, but I can’t say I’m surprised - it’s too “thin” a concept to support a standard. No one feels they need it.
@kyle Thanks for that Spotify playlist. It’s proven a good seed for finding more good music. :)