I'm here today to talk about apprenticeship as a path to growing your SRE team. Apprenticeship is ultimately how I was able to enter Site Reliability Engineering, and I think it’s the solution to the “hiring crisis” we often feel trying to fill empty headcount. Hiring SREs can be challenging, but if your team can provide the right mix of support and opportunity for an apprentice, getting the SREs you need becomes a lot easier.
So, hi, I'm Rowan. I'm an SRE at BuzzFeed, and like a lot of folks here, I had what you could call a "non-traditional" start to my career. In fact, while I don't recommend this as a strategy, I started by telling a Vice President of Engineering he probably shouldn't hire me because I was a brand new engineer and I didn't know anything. I think that's about as non-traditional as it gets. Fortunately he didn’t listen to me.
Through this talk, I’m going to share with you what my apprenticeship with our SRE team looked like, and how to extract the principles that guided it into a framework that will help you find your own apprentices and grow them into senior engineers.
To start with, “what is apprenticeship?” is a question I had to answer to write this talk. These days it can have some interesting connotations, from Mickey Mouse as the sorcerer’s apprentice in Fantasia to vocational training or even sushi chefs. Apprenticeship has traditionally had a formal quality to its meaning, and that's why I chose to use the word. I want to emphasize that in order for apprenticeship to be successful it must be formalized and structured. Apprenticeship is an educational experience that occurs through practice of skills, guided progress, and mentorship. The apprentice benefits from the concrete connection of theoretical learning and practical application. In addition, because apprenticeship is paid employment, an apprentice is able to focus fully on developing skills and knowledge, without the struggle of having to figure out how to support themself.
I want to bring up a misperception I’ve seen in the past: that apprenticeship is not an experience remote positions can offer. I'd like to challenge this at the start. What is important to an apprenticeship is the formal structure and support, not the channels through which those things are offered. While an apprentice at a mostly remote company will have additional challenges in ensuring they're getting enough contact with their mentors and team, it's not impossible. One of my primary mentors during my apprenticeship was a remote employee who traveled the globe, often posting “no filter” Instagrams of the places he was and the experiences he was having...which inspired me to work a little harder so I could have that jetsetting life too.
At this conference, after some encouragement, I started hustling for a job by telling people my story and letting them know I was looking. That was when I met BuzzFeed, which brings me to my first point about apprentices: listen and look, and the right candidate will appear.
So, who is the right candidate? How do you find them?
When I interviewed at BuzzFeed, I had a different kind of technical interview. Rather than do a take-home assignment or a HackerRank-style online assessment, I was invited to share some code I had written earlier in my experience, and during my coding assessment we discussed why design choices had been made given my knowledge at the time, and what I would change now that I had grown some as an engineer. Not only did this provide a strong signal about how quickly I was able to learn, it allowed my interviewers to see what my strengths were, and where I might have weaknesses.
During my interviews I also participated in a design discussion where I conceptually described my approach to building a link shortener like bit.ly, starting with the problem of shortening links, and scaling out in layers to massive global traffic. This exercise demonstrated what areas of systems design I already grasped, and where I was in relation to a degree-holding candidate. Because it was held in a more high-level, abstract fashion, we were able to touch on many aspects of computer science without triggering panic about not knowing traditional whiteboard questions.
Finding the right candidate is about structuring your process to help bring out the qualities in potential apprentices that make them great for the job.
The trick is that strong technical skills don't always look like traditional "computer" skills. For example, knitting is a complex fiber art that utilizes algorithmic instructions. Cooking requires precision and systematic execution. Music and math are sibling disciplines. Candidates with hobbies, passions, and skills like these are demonstrating the same skills that will make them good engineers.
A lot of companies have a leadership value that reads something like, 'raises the bar,' and I’m going to take a moment to talk about how candidates like these raise the bar in their teams and organizations.
There are all the standard things I could say, about hard work and strong growth mindsets. But even more than that, I think apprentices have a vital new perspective. Apprentices don't know that you "can't" do something with your code, which leads to innovative approaches to old, often stubborn problems. An apprentice working in your codebase may find something that makes their brain itch, and in working it out solve a bug that's plagued you since your first release. Or they may find solutions you didn't know were possible, while researching the problem space. Sometimes a new perspective is also necessary for a product to mature and grow.
Apprentices also raise the technical skill level for the entire team, both through their own contributions, and through the process of integrating them as members. Code reviews are an opportunity for every member of the team to think about fundamentals and the principles that guide their development. Meanwhile, leading a shadow on-call is a chance to ensure runbooks and documentation are clear, alerts meaningful, and observability appropriately set up and useful.
On the other side, the right team for an apprentice has engineers who are stable in their careers and ready to mentor. The right team will need to provide enough folks with a diversity of perspectives and skills to help the apprentice develop as they begin their journey of professional software engineering. Teams with apprentices require an additional amount of empathy, communication, and willingness to teach...and learn.
Once I was hired at BuzzFeed, my apprenticeship began in earnest. As I mentioned previously, apprenticeship is a structured learning experience, and in figuring out how to help me level up and become a full member of the team, my manager and mentors developed a framework with five key areas: Sandbox, Shadow, Exposure, Inclusion, and Ritual. Each of these tied to a concept that was an important part of my growth as an engineer.
The base layer of apprenticeship in this case was 'sandbox.'
My first major project once I onboarded at BuzzFeed was helping to build a new type of environment for us to use. It involved learning a number of new technical skills (such as Terraform, and how IAM roles work), and I worked on the project with an experienced engineer. Since it was isolated from our production and staging environments, I was free to learn through experience and experimentation rather than risking breaking production.
For SREs, however, the day to day problems can include load balancing globally scaled traffic, the orchestration of hundreds of containers, or managing data storage and observability in support of a service with a "four or five nines" uptime SLA. These are not easy problems to simulate at an individual scale, even if writing and deploying real services via a personal tier on a cloud provider. In order to learn these skills, apprentice SREs who come from a software engineering background especially benefit from having environments that replicate production in which to gain experience and confidence with skills considered more traditionally part of a "sysops" role.
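To put "four or five nines" in concrete terms, here is a small sketch of the arithmetic. The 30-day window and the function name `allowed_downtime_seconds` are assumptions for illustration; a real SLA defines its own measurement window and what counts as downtime.

```python
def allowed_downtime_seconds(availability_percent: float, window_days: float = 30) -> float:
    """Return how many seconds of downtime fit inside the availability target."""
    window_seconds = window_days * 24 * 60 * 60
    return window_seconds * (1 - availability_percent / 100)


# Four nines and five nines over an assumed 30-day window.
for target in (99.99, 99.999):
    seconds = allowed_downtime_seconds(target)
    print(f"{target}% over 30 days: about {seconds:.1f} seconds of downtime")
```

Over 30 days, four nines leaves roughly 4.3 minutes of downtime and five nines leaves about 26 seconds, which is part of why these problems are hard to practice safely outside a production-like environment.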
While learning confidence and skills through the ownership of a sandbox, apprentices also need active mentorship to teach skills that aren’t easily learned through self-directed exploration. I refer to this pillar of the framework as “shadow.”
In addition to pair programming with and learning infrastructure management from more senior engineers, I shadowed one of my 'official' mentors to learn how to handle on-call shifts. Coming from a web development bootcamp and a single internship, on-call was a concept I had heard of but not experienced. To learn how to do on-call, my mentor and I shared a "pager." For the first few shifts, when my mentor was paged, I also logged onto Slack, discussed with him what was happening, and learned how to address those middle-of-the-night things that are often only "encoded" in team lore.
Once I was comfortable, when the pager went off, we would both log in but I would investigate and communicate to my mentor, who would only step in if I was in over my head or asked him to. I barely noticed as, over time, his input and nudging faded away to nothing. That was the final step of shadowing: driving on my own.
The challenge for many teams here is finding a mentor match for the apprentice. I think one of the things BuzzFeed did differently that really helped to close that gap was arranging for a variety of mentors in different areas. While one mentor might be more my team and culture mentor, another might be an operations skills mentor. Strong apprenticeships benefit from this team-based approach because it allows the apprentice to experience a greater number of opinions and methodologies and adopt the tools that will work best for them.
A note: it can be uncomfortable for both the apprentice and the mentor when the apprentice first starts being the one in the driver's seat. This discomfort is often not actually a lack of confidence in the apprentice, but more likely an indicator that a necessary guardrail is missing. Whether the mentor fears that some piece of institutional team knowledge is not recorded, or that pushing the wrong button could take production offline, that is an indicator that more automation, and thus safety, might be in order.
A perfect task, I might add, for an apprentice.
So far as an apprentice I've had a sandbox, and I've been a shadow. The next major part of the framework was exposure.
During my first few weeks at BuzzFeed, my manager invited me to every meeting he thought I might find interesting, to help me understand all of the people SRE supported: from the infrastructure engineers I worked closely with, to product engineers who worked on APIs, to folks across disciplines like creative, design, and yes, of course, News. This allowed me to understand the products and people SRE supports and works with, and the infrastructure I'd be working on. The clarity on user needs and products helped me to find ways to make improvements or solve problems in a meaningful way, rather than just writing code to write code.
Exposing the apprentice to a variety of aspects of the company, product, and team or organization gives them a concrete understanding of how their work relates to the larger organizational goals. This exposure also gives both the apprentice and their leaders the opportunity to create meaningful growth milestones based on the apprentice's interests and goals for personal career development. The more opportunities an apprentice has to experience different areas of the company, the more chances you’ll have to find a hidden talent, strength, or super power.
The challenging part of “exposure” for an apprenticeship, however, is that your apprentice may be excited about many areas or goals, or be drawn to a range of projects that have nothing in common. This is one of the places where structure is of paramount importance, as one of the skills an apprentice has to learn is how to manage their time and workload. The recurring feedback I got during my time as an apprentice was “learn to invest wisely,” because I was often tempted to try to say yes to everything.
As much as that’s a challenge, it’s a good learning opportunity. One of the biggest complications for SREs is that the event-driven, interrupt-heavy nature of the role can at times leave them running much closer to burnout than is ideal. Your whole SRE team will benefit by modeling good time management and work-life balance for the apprentice.
Exposure is the phase where the apprentice will start really picking up and acting on your company and team cultures. Whether it’s a high empathy environment of HugOps and blameless post-mortems, or a grind culture of frustration and anger, the apprentice will soak these values up during the exposure phase, and it will quickly show whether you’re in alignment with the values you espouse.
Exposure tends to lead naturally to “inclusion,” which is integrating the apprentice into the team as an engineer performing the same tasks as other engineers. They may still need more support or guidance, but they can respond to incidents, take tickets, and be expected to check in code. In my case, this started at BuzzFeed with my participation in an incident.
It was the end of my second week at BuzzFeed and I was working at onboarding and exploring the code base when I heard, from across the little desk island, “Uh, I think there’s a problem.” BuzzFeed.com wasn’t loading on some Android phones. Even as a very baby SRE I had that sudden adrenaline shot that I think most of us in the room can relate to. My first Big Thing!
I didn’t think there was much I could contribute to solving the problem. We quickly established that it was breaking because our CDN had turned off TLS 1.0, and a relatively notable population of our Android users could not use TLS 1.2 due to their Android version. Not seeing a place to dig into this problem on the engineering side, I decided it would be a good time to learn about TLS and why we were having trouble with this. So I started researching.
Having found a number of pages on TLS and Android, I kept going and quickly surfaced documentation that suggested we could manually alleviate the pain for at least our mobile app users on Android. Not entirely certain that what I was reading was accurate, I looped in my manager, who let me know I had found a longer term solution to the problem we were currently in the process of short-term fixing. Less than two weeks after I started, I was already helping to make our product better for more people.
Inclusion is often the hardest pillar because if handled poorly or without empathy it feels like condescension. When including your apprentice, make sure to value their contributions without “exceptionalizing” them. For example, the TLS solution I found was previously unknown to my team, and they brought up that it was a good and clever solution without making it seem strange or unusual that I would find it.
Frequently apprentices and junior engineers are treated as “raw” or in need of development before they’re “real engineers,” and that’s one of the tendencies that inclusion pushes back on. Apprentices have skills and talents; that’s why you hired them! Facilitating their use of those skills and talents is what allows an apprentice to accelerate your team instead of slowing it down. Trust your apprentices to bring their A game and invite them to do so at every opportunity, and you’ll have mastered the inclusion piece of this.
Tying together the pillars of sandboxing, shadowing, exposing, and including, is “ritual,” the practice I think of as glue. Every engineer has rituals, from morning stand up to the favorite swear they mutter when woken up at three AM by a misbehaving server. At BuzzFeed I developed a few with my mentors and managers. My favorite was, “Everything’s better with ice cream.”
I mentioned earlier that one of my primary mentors was a remote employee with a fabulous jetsetting life. Whenever he was in New York at BFHQ, we would take an hour to go and get ice cream and talk. While much of the talk was about career development, technical challenges, design ideas, or product knowledge...we also talked about the challenges of being a new engineer on a team of very experienced engineers, about being stressed out or upset by difficulty, and about our lives as people. And a few times when I was upset I went out for ice cream by myself, and imagined talking to him like a code duck to help myself work through it.
Another ritual I have with a manager is that when it’s warm enough, rather than sit still and chat, my manager and I take 1:1s walking around a local park and looking for dogs to pet. Managerial 1:1s can be emotionally fraught, and while for some the structure of a meeting room may give comfort, for me sitting through such meetings can make it more difficult to pay attention to them. By walking and talking, my manager and I can stay engaged in the conversation, get some sunlight, and occasionally even meet a cute puppy.
Ritual helps an apprentice to pick up culture and to feel as though there is some time that belongs to them. It provides normalcy when everything feels new and intimidating. Ritual provides anchors when an apprentice is uncertain, and anchors help the apprentice find stable ground from which to give their best work. It also allows them to realize that even if everything feels somewhat random or turbulent because they are learning, there’s still some kind of normalcy in the world.
The last aspect of structured apprenticeship I want to touch on is difficult, but extremely important: what to do when the apprentice encounters difficulty and feels discouraged. How this is handled by leaders is crucial, and will make the difference between a successful apprenticeship and a failure.
Every engineer has bad days or huge blocks. Every engineer runs into difficulty with code, or just has days they can’t focus. Most likely, however, your apprentice will beat themself up when they first start (and possibly long after) when they get into this state. They may question whether they should really be asking for help or if they’re “wasting someone’s time.” They may not know how to handle frustration productively and become sad or angry. Or they may push themselves towards burnout, trying to do it all alone even when they know they can’t.
In these situations, the apprentice’s mentor and manager need to get involved to help them learn how to manage these states. The apprentice won’t have the experience of having struggled with feature releases or bug hunting. Even if they have a semi-formal background, they won’t have worked at the scale you’re working at professionally. It can be overwhelming, and even the best apprentice engineers will sometimes find it hard to continue.
While none of this should excuse an apprentice not “finishing work”, remembering it can help keep mentors and managers mindful to check in and see if an apprentice is struggling if their productivity drops. Helping them learn early how to avoid burnout and anger with their work will also help to create the strong foundation of a good engineer.
Sandboxing teaches an apprentice that they have ownership and empowerment. It gives an apprentice a place to safely build and break code, which is the primary experience any engineer needs to get better at their craft. By making the sandbox meaningful, the apprentice SRE learns things like incident response and infrastructure management with real stakes, while not endangering the ability to meet business goals.
Shadowing teaches an apprentice confidence. In my opinion, one of the most important qualities in an SRE is the ability to confidently make decisions in a crisis or incident, and through shadowing the apprentice gets the opportunity to learn that skill from the best engineers in their organization, their fellow SREs.
Exposure shows an apprentice what kind of projects they’ll have the opportunity to work on as an engineer, and helps them find their special skills and talents, the things that make them a great SRE. In addition, it becomes a chance for the other engineers in your organization, those who aren’t involved with the SRE team daily, to become familiar with the newest addition to your team.
Inclusion is both the integration of the apprentice SRE into your team, and integrating the talents and skills they show into your work. Even before the apprentice “graduates” from apprenticeship, the work they do will bring a perspective you were missing. After all, that’s why you hired an apprentice. Inclusion also helps the other engineers in the team remember that they were once beginners too.
Ritual provides the stable base on which this all rests, building patterns for the apprentice that demonstrate they are valued as an engineer and their contributions are valued as well. It helps the apprentice to develop good patterns and to learn that it’s safe to talk about their blocks and struggles with engineering by creating a space in which that is both expected and encouraged.
And checking in on the apprentice’s emotional and mental well-being helps keep apprenticeship strong for both the apprentice and your team. It also reminds your more senior engineers that their emotional and mental health is important to the success of the team and that they should be mindful of their own emotional states.
One final story to illustrate that, from when I was working on building the sandbox environment. At one point I got very frustrated with either AWS or the code I was working with, and I made an incredibly sarcastic comment. The kind we’ve all heard someone make, along the lines of, “well, ugh. Who thought that implementation was a good idea?”
My mentor on the project turned to me and said, “At BuzzFeed we have this thing, the no-haters manifesto. I think it might help you to take a look at it...you seem frustrated, and your frustration is getting in your way.”
I read the manifesto, and it really stuck with me. Being reminded that a positive outlook would make the work go faster, and that I had the option to choose taking a break and walking away for a while when I was frustrated...was a game changer for me. It gave me a set of coping skills for difficult engineering, and even though I haven’t reviewed it as recently as I should have, it sticks with me in everything I do.
There’s a tweet out there somewhere that reads: “I only hire senior engineers...I just do it ten years before everyone else,” and if anything, that’s the attitude I hope you take from this talk. Your future SREs, senior engineers, and even principals are out there. Go hire them ten years before anyone else does! Thank you.