Bad Observability
What are some antipatterns that can hurt the success of our observability efforts? What does “bad” observability look like?
In my experience, most organizations have plenty of monitoring. Yet most of the time that monitoring cannot answer basic questions about customer behaviour and experience, and it is not being used as part of a feedback loop to pivot and make better business decisions.
In this session I will explore both technical and cultural antipatterns that hinder observability. For example:

- Lots of data but no insights
- Monitoring a lot of technical metrics but not tracking customer behaviour or the impact of changes
- Not knowing what the desired level of service is for the customer, not tracking it, and not responding to it as part of a feedback loop
- Misunderstanding what aggregates (averages, percentiles, etc.) do and do not tell you, or the impact of sampling intervals (see the sketch after this list)
- Misunderstanding what certain metrics mean (for example, available memory, or % CPU usage on containers and VMs)
- Siloed teams who do not share their monitoring with others

…and many more.
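To make the point about aggregates concrete, here is a minimal sketch (not from the talk; written in Python with invented numbers) of how an average can hide a long latency tail that percentiles expose:

import statistics

# 95 fast requests and 5 very slow ones (latencies in milliseconds)
latencies = [20] * 95 + [2000] * 5

cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points

print(f"mean: {statistics.mean(latencies):.0f} ms")  # 119 ms -- hides that 5% of requests take 2 s
print(f"p50:  {cuts[49]:.0f} ms")                    # 20 ms
print(f"p95:  {cuts[94]:.0f} ms")                    # ~1900 ms
print(f"p99:  {cuts[98]:.0f} ms")                    # 2000 ms -- one in twenty customers waits two seconds

The same kind of distortion happens with sampling intervals: averaging CPU or latency over a five-minute window can smooth away short spikes that customers actually felt.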
In this session my goal is to bring us back to outcomes. Rather than “doing monitoring” for its own sake, let’s be thoughtful about what we choose to measure and how we act on the data that comes back, so that it drives better customer and business outcomes.
More about Stephen Townshend
Stephen pretended to be a performance engineer for thirteen years, and very recently started pretending to be an SRE. He is actually an actor, playing the role of a site reliability engineer. At some point along the way he lost touch with reality and is no longer sure if he is acting, or this has become reality. In the words of Robert Downey, Jr. in the film Tropic Thunder: “I know who I am. I’m a dude playing a dude disguised as another dude.”