Is MTTR still relevant in a modern, cloud native world?
MTTR has long been an essential failure metric. However, in a cloud native world, P95 and P99 have become more meaningful measurements, and time to remediation, not repair, matters most. In this talk, Martin will share an alternative to MTTR and how it can become your new P99 of remediation.
Mean time to repair (MTTR) has long been an essential failure metric, measuring the average time it takes to repair or restore a system to functionality. But why, in the age of microservices and containers, are we still using a metric that originated in measuring equipment failures in factories? The mean, or average, is no longer a relevant statistic for most organizations; P95 and P99 have become the more meaningful measurements. "Repair" (or sometimes "restore") is also problematic. In most cases, the most important period to measure is the time to remediation: the time it takes to alleviate customer pain by restoring the service to acceptable levels of availability and performance. In this session, Martin will introduce an alternative to MTTR and share real-life examples and lessons learned to explain how this new way of thinking can become your new P99 of remediation time.
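To make the mean-versus-percentile point concrete, here is a minimal sketch in Python. It is not taken from the talk: the incident durations are invented for illustration, and the `percentile` helper uses the nearest-rank definition, which is only one of several common ways to compute a percentile.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical minutes-to-remediation for 20 incidents (illustrative data only).
remediation_minutes = [4, 5, 5, 6, 6, 7, 7, 8, 8, 9,
                       9, 10, 10, 11, 12, 12, 13, 15, 90, 240]

summary = {
    "mean": mean(remediation_minutes),
    "p50": percentile(remediation_minutes, 50),
    "p95": percentile(remediation_minutes, 95),
    "p99": percentile(remediation_minutes, 99),
}
print(summary)
```

With this made-up data the mean is about 24 minutes, a duration that matches none of the incidents: the typical incident (P50) is remediated in 9 minutes, while P95 and P99 are 90 and 240 minutes. The mean blends the two populations, whereas the percentiles expose the slow tail directly.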
More about Martin Mao
Martin Mao is the co-founder and CEO of Chronosphere. He was previously at Uber, where he led the development and SRE teams that created and operated M3. Prior to that, he was a technical lead on the EC2 team at AWS and has also worked for Microsoft and Google. He and his family are based in our Seattle hub and he enjoys playing soccer and eating meat pies in his spare time.