So You Want to Write an SLO
I used to work in Ruby on Rails. It’s not bad to work in if you’re looking to crank some code out quickly, are okay with a few runtime errors, love throwing money at compute resources, and don’t need the overall level of service to be that high. It was an okay time.

But how did I know the level of service wasn’t as good as it could’ve been? Am I just biased because I love Rust? While I am biased, I know the level of service was lacking because I set up service-level objectives (SLOs) for both my Ruby and Rust services. Across the board, my Rust services had stricter SLOs and consistently met them. Frequent SLO error budget violations prompted me to move away from Ruby in the first place. I wasn’t content to set an SLO based on the tech I was using. Rather, I wanted to set an SLO based on the business context and then use the tech I needed to reach that SLO.

However, this article isn’t about language choice. It’s about SLOs, so let me acknowledge that writing good SLOs is tricky. To construct a good one, we need to understand why we need SLOs and what they do, and then we’ll dive into how to set them for your service.

One unfortunate reality of software services is that things go wrong. Sometimes, things go wrong simply because of a defect in the code. But other times, you end up dealing with issues that don’t have an obvious cause or a simple fix. You might experience a transient network outage. A server might run out of CPU or memory unexpectedly due to a surge in traffic. Multi-threaded server code could have a subtle flaw that allows for a race condition.