I came across a post by Niall Murphy (one of the OGs of the SRE books) on What SRE is not a few weeks ago, and it has honestly been a game changer for me (no exaggeration here 🙂). I loved it so much that this is the 3rd time I have referred to it (I blogged about it internally and shared it on LinkedIn).
You could stop reading this post and read Niall’s post instead - it’s probably worth more of your time.
But if you are still here - shameless plug - I wrote What is SRE last month, in which I tried to clarify the differences between DevOps and SRE.
In his post, Niall describes what SRE is not (or: what can you remove from SRE and still have it be SRE?). Defining SRE this way actually helps me understand the practice of SRE better.
The TL;DR is that SRE is an engineering role.
Because it’s an engineering role, it follows that:
- If the SREs don’t have access to the source code and can’t change the code or the system design, then they aren’t doing SRE. The point here is that SREs are to be trusted to make deep engineering contributions to the systems they support.
- If the SREs don’t write code, then you aren’t doing SRE. Niall calls this kind of team an operations team with distributed systems expertise, which is a valuable practice in itself, but as with the point above, the team won’t be able to make deep engineering contributions in terms of scaling, reliability, monitoring, etc.
His point that SRE is best implemented at “large” scale [system|business|data|users] clears up my own confusion about why not many companies have implemented SRE (given the hype surrounding it).
I love his point about the SRE job title:
SRE work does not require the SRE job title to perform. Conversely, having the SRE job title but not doing SRE work creates confusion and dismay
As for Niall’s second statement above, I have seen this myself from talking to people in the industry and looking at SRE job advertisements, so I really appreciate an article like this that clearly articulates what SRE is (by defining what it is not).