Being a Devops/Platform Engineer eight months in

I have been a devops/platform engineer for about eight months now, and it’s been an exciting journey so far. I didn’t actually expect to be a full-time devops engineer. What I had in mind was being a backend engineer with deeper knowledge and skills in infra and monitoring, plus a deeper understanding of Kubernetes. Maybe that’s an SRE role, I don’t know; the devops/SRE line is still a bit blurry to me.

So how did I get here? In a 1-on-1 last year, I told my manager that I was feeling stuck in terms of personal development. I told him that I had been preparing for a Google Cloud certification for a year but had made little progress, partly because of how busy and chaotic the year was (COVID-19), and partly because I didn’t actually use GCP much in my work, even though our workloads are in GCP.

My manager then told me about an opening in the Cloud/Infrastructure team. I had actually seen it too, but I wasn’t sure that I was qualified. He mentioned me to the head of the Cloud/Infra team, we had a chat, and he encouraged me to apply. So I applied, went through the proper interview process and, thank God, got the job. I will be forever grateful to my previous manager for finding an opportunity in the organisation where I could grow - at his own cost, I may say, as he would need to backfill my position.

So eight months in, how do I feel about making the jump? I can truly say that it was a good decision. It was a no-brainer, nothing-to-lose decision for me to make. I didn’t have to give up any compensation to make the move, and if for any reason it didn’t work out, I could always go back to my old team.

Has it been easy? No, absolutely not. After years of being comfortable in a backend/fullstack engineer role, being in a situation where I don’t have mastery over my work has frustrated me a lot. But being a “beginner” is also where I find the most joy: I have so many areas to explore and so much room to grow.

Here are some of my observations on the role so far.

Networking is hard

Networking, firewalls in particular, has been the most challenging area for me so far. And it is an area that I am really keen to get better at.

Usually my issue takes the form of “why can’t this VM access this endpoint/resource?” (or vice versa), and the things to check are:

  • Are the ingress/egress firewall rules correct?
  • What’s the result of the network connectivity check?
  • Is the on-prem firewall involved?
  • Is the Squid proxy/NAT involved? (see example below)
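
Before digging into firewall rules, a quick reachability probe run from the VM itself can narrow things down. The following is only a minimal sketch in Python using the standard library; the function name `can_connect` is made up for illustration, and it only tells you whether a TCP handshake succeeds, not which firewall or proxy is in the way.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration only. A False result covers
    refused connections, timeouts and DNS failures alike, so it does not
    say whether a firewall rule, a proxy or the service itself is at fault.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts and socket.gaierror.
        return False
```

Running something like `can_connect("10.0.0.5", 5432)` (a made-up private IP and port) from the source VM, and then the same check from a machine that is known to work, gives a rough idea of whether the problem is on the path or at the destination.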

I recall one occasion where I needed to create a VM with a public IP to connect a third-party tool to our CloudSQL database. We were able to connect to the VM using its private IP, but the public IP was no bueno.

We had a long back and forth in a Google support ticket and a few Google Hangouts calls. The calls were helpful as I got to see what tools Google engineers use to diagnose issues. From the sessions, we worked out that the VM did receive traffic (ingress) but that traffic out (egress) was being blocked for some reason. The case was escalated somewhere in the Googlesphere, and Google network experts then found the issue: there is a Squid proxy sitting between GCP and the outside world, and it blocked the traffic out of the VM. I had forgotten about this proxy/NAT server, so it’s kind of my bad - sorry for wasting so many people’s time. Once we bypassed the proxy (by applying a specific network tag to the VM), all was well.

On another occasion, I was asked to create a port-forwarding proxy on GCP to connect a third-party tool to our on-prem server. This was another time-consuming thing to set up, as it involved configuring both the GCP firewall and the on-prem firewall.

After a while though, the networking started to make sense. I am still not happy with my networking knowledge and troubleshooting skills, so this is an area where I would like to grow more.

Permission is hard

I was initially part of the Cloud & Infra team when I moved across, which meant I supported every team’s cloud environment. What was good about this: I had the keys to the kingdom, with access to all the things.

A few months in, there was a bit of restructuring and a new role was introduced: the embedded devops role. I was chosen to be the embedded devops person for the Data & Analytics team, which means my permissions are now scoped to that team only. This is good and proper, following the principle of least privilege.

However, getting the right level of permissions is hard, especially with the Data & Analytics team, which owns a lot of GCP projects and also shares projects with other teams.

Slow feedback cycle

As a backend engineer, I am used to getting quick feedback on my tasks, be it through manual or programmatic testing. This is not always the case with infra. An example: yesterday I was tasked with configuring an HTTPS URL for a workload.

Rather than managing certs manually on the VM, we thought of leveraging a GCP load balancer for this. So I created managed instance groups (MIGs), then created a load balancer (LB) and provisioned a certificate through the LB. However, HTTPS didn’t work, while the HTTP connection did.

With the help of Google Support (again), I found out that the issue was the LB not being able to provision the certificate. To provision the certificate, the domain must first point to the LB IP address (A or AAAA records in DNS), which I had forgotten to set up.

Once that was set up, I had to wait for the DNS record to propagate. While I was waiting, I didn’t have 100% confidence that this would work. This is what I mean by a slow feedback cycle.
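
One way to shorten the wait a little is to check whether the domain already resolves to the load balancer’s IP. Below is a rough Python sketch using only the standard library; the helper names `resolved_addresses` and `points_to` are hypothetical, and the result only reflects what the local resolver sees, which may differ from what Google sees when it validates the domain for the certificate.

```python
import socket


def resolved_addresses(hostname: str) -> set:
    """Return the set of IP addresses (IPv4 and IPv6) that `hostname` resolves to.

    Uses the local resolver via getaddrinfo, so the answer may be cached
    and is not a view of global DNS propagation.
    """
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return {info[4][0] for info in infos}


def points_to(hostname: str, expected_ip: str) -> bool:
    """Return True if `expected_ip` is among the addresses `hostname` resolves to."""
    try:
        return expected_ip in resolved_addresses(hostname)
    except socket.gaierror:
        # The name does not resolve (yet).
        return False
```

For example, `points_to("app.example.com", "203.0.113.10")` (both values are placeholders) could be polled until it returns True before checking the certificate status again.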

Get used to raising support tickets

At first I was quite reluctant to ask our vendor for help and chose to spend a lot of time figuring things out myself. But now I am better at recognising which issues I can work through myself given time, and which issues need to be raised with the vendor.

Google Support has been the most helpful vendor support in my experience. Having access to them is a godsend.

Infra as code is slow

Coming from a software engineering background, I totally get the reasoning behind Infra as Code (IaC), but it does take time to do. I am talking about using Terraform here.

This is especially true when you are provisioning a new set of resources that we haven’t established a pattern for, for example when I had to create the HTTPS endpoint with the GCP LB above. What I did was provision the resources using the GCP console and, once it worked, import the resources into Terraform.

The number of resources to be terraformed is not insignificant: managed instance group, health check, load balancer, certificate, etc.

Growing appreciation of Kubernetes

Getting a deeper understanding of Kubernetes is one of the highlights. I managed to get the CKAD certification too.

Horizontal growth vs vertical growth

One of the reasons I felt “stuck” as a software engineer is that I felt there was little room to grow “deeper”. You could say that the rate of my horizontal growth was diminishing. Maybe I had just been doing repetitive engineering work even though I’d changed jobs and teams, e.g. creating APIs, connecting one system to another, moving data from one place to another, etc. Sure, I could learn a different programming paradigm (e.g. functional) or learn different languages and frameworks, but honestly, to me they are just slightly different ways of doing the same thing.

Being a devops engineer provides me an avenue to grow horizontally. I am hoping to be a more well-rounded engineer in the end.

Why doesn’t every engineer want to learn infra?

I am still a bit disappointed to find software engineers not really interested in learning infrastructure, which perpetuates the “not my problem” culture.

Another reason why I want to learn infra (and Kubernetes in particular) is so that I can:

  • Teach the software engineers in the team about it
  • Become a bridge between software engineers and operations engineers - but ideally bring down the silos so the bridge is not needed

Conclusion

Overall, these past eight months have been a growth journey for me. The role does take me out of my comfort zone, and that’s a good thing!

I do miss building with Ruby though. Hopefully one day I will be at the intersection of Ruby, Kubernetes and GCP.