Platform Engineering, Part 1: WHY — The Evolution of Developer Cognitive Load
2023 is the year we’ll see the rise of Platform Engineering!
It appeared in Gartner’s 2022 Hype Cycle for Emerging Technologies, and it had its first dedicated conference this summer: PlatformCon 2022, with 7K+ online attendees.
In this three-part series, we’ll cover Platform Engineering and explain Why, What, and How we’ve been doing it at Agorapulse, a medium-sized, product-led scale-up.
Part 1 is about WHY the evolution of developer cognitive load is giving Platform Engineering so much traction
Part 2 is about WHAT Platform Engineering is, and WHAT goals a Platform Engineering team should have
Part 3 is about WHEN & HOW to build an Internal Developer Platform
During the last 30 years, with the arrival of agile methodologies, DevOps philosophy, and cloud-based infrastructure, the roles and responsibilities of a software developer have significantly evolved.
Here’s a summary of those changes.
Pre-2000s: The “traditional” development era
During this “prehistoric” era, software developers often lived in the company basement, hidden away from customers (they obviously never talked to them).
With Waterfall development, stakeholders would simply send detailed specification documents, describing features to implement, with unachievable deadlines, down to the basement. Once developed and built (and tested by a dedicated QA team), artifacts to be deployed were then “thrown over the fence” to the Operations team. There was usually a single gatekeeper, aka Sysadmin(s), firefighting to run and scale those apps and databases, on home-made on-prem systems.
The specialized roles for Dev, QA, and Ops teams created efficiencies within each development phase while creating inefficiencies across the entire life cycle.
In the end, friction and communication silos slowed down this way of working: Getting an idea or a fix delivered to customers could often take months or even years!
On the plus side, developers’ lives were pretty simple. They would focus on “just” coding and mastering a single tech ecosystem: C/C++/Java for “desktop app developers”, or PHP/ASP/CFML for “web developers”.
Cycle time: MONTHS or YEARS
Developer cognitive load: LOW 🤓
The 2000s: The agile development era
Then, the Agile movement brought programmers, testers, and business representatives together, to iteratively build software. Things got much better: Efficient collaboration meant that time to deliver value to the customer decreased to just a few weeks or months.
But there was still a remaining silo and the ‘fence’: The Operations team was still isolated, using different tools and methodologies and dealing with on-premise infrastructure.
Cloud then started to become a thing, with AWS launching in 2006 (S3, SQS, and EC2 only). But it was only used by early adopters, which were mostly startups.
Business & Development teams strived for change, whereas Operations teams strived for stability and reliability. The conflict between Ops and Devs resulted from divergent goals and incentives and it generated blame games and finger-pointing.
In terms of developer cognitive load, the organization of teams and the layer of ownership was functional and tech-based (backend, frontend, database, QA, security, infra, etc). Therefore, the required knowledge for developers was still pretty limited to a single ecosystem and functional expertise.
Cycle time: WEEKS or MONTHS
Developer cognitive load: LOW 🤓
The 2010s: The DevOps revolution and tech feature teams
The DevOps movement extended the continuous development goals of the Agile movement to continuous integration and release.
The last silo was removed, and developers and operations folk were finally able to start working together to benefit the overall business.
Cloud-based infrastructure started to become the new standard. Thanks to a fully automated CI/CD pipeline and deployment, the Build-Measure-Learn feedback loop was reduced to several days or even hours, with continuous delivery.
To improve their velocity and respond to the software architecture changes (monolith to microservices and the famous Conway’s law), organizations started to radically change how they were structured, moving from large functional departments (development, QA, operations, release) to smaller, independent tech feature teams (usually sized as a 2-pizza team).
But in practice, the famous “you build it, you run it” mantra was mostly limited to automated CI/CD, and responsibility for the RUN and the infrastructure still rested fully with the CloudOps / SRE team.
In terms of the developer environment, everything started to become more complex: From mastering fullstack development ecosystems (JS ecosystem madness) to running/debugging those shiny microservice architectures on their machines.
Gone were the days of the single backend monolith based on a single language/framework, using a single traditional relational database.
Cycle time: HOURS or DAYS
Developer cognitive load: MEDIUM 😨
The 2020s: The age of cloud-native and empowered Product teams
Product teams and DevOps responsibility
Nowadays, modern product & engineering organizations have moved away from feature development factory teams, with engineering only focusing on delivery, towards fully empowered cross-functional Product teams, which are a combination of product, design, and engineering.
These Product teams are usually led by a product trio (Engineering Manager + Product Manager + Product Designer) and cover the full lifecycle of their business domain, from customer discovery and feature roadmaps, to development, deployment, and monitoring.
But many companies are finding this full lifecycle move difficult: Where do they put the Ops team or “DevOps responsibility”?
“Fully 38% of devs said they instrument the code they’ve written for production monitoring (up from 26% in 2021 and just 18% in 2020) and 38% monitor and respond to the infrastructure their apps are running on (up 25% from last year).” (GitLab DevSecOps Survey, 2022)
Unfortunately, if the formal Ops role or team is removed, the responsibility usually falls onto the shoulders of the developers.
Infrastructure as code and cloud-native complexity
Even if most devs don’t want to do Ops, they now have to learn infrastructure as code (“IaC”, usually through Terraform) and cloud-native managed services, on top of their existing frontend and backend coding knowledge. For AWS alone, it’s dozens and dozens of services.
Micro-based architecture and deployment complexity
On the frontend side, to allow autonomous Product teams to be fully decoupled and to release independently, the monolithic frontend now has to be split into micro frontends.
This is yet another challenge for the developers, including architecture, development, and deployment.
Shift left security and QA responsibility
You can also add the impact of “shifting left” operational, quality, and security concerns to the mix. Many studies show that finding a bug or a security issue before it gets to production might save thousands or millions of dollars, but that’s, again, more responsibility on the shoulders of developers.
Full lifecycle developers
Although the “you build it, you run it” and “shifting left” mantras are solving problems across the full lifecycle, they’re also massively increasing developer responsibilities.
Before the 2010s, developers’ required knowledge was limited to:
- BUILD Frontend or Backend: languages, frameworks, testing, etc
Today, developers’ required knowledge has expanded to:
- BUILD Frontend and Backend and Infrastructure: languages, frameworks, testing, architecture, cloud services, etc
- SHIP: CI/CD delivery pipelines, feature flagging, versioning, etc
- RUN: Observability, monitoring, debugging, security, scaling, support, etc
Complexity is killing software developers.
Developer cognitive load: INSANE 😱
Meanwhile, the most important job of a developer is to deliver features and business value to end users!
Every developer should focus on “driving/developing”, and let a Platform team provide the “driver/developer” experience! That’s the goal of Platform Engineering.
At Agorapulse, we started in 2010 with a cloud-native architecture running 100% on AWS, supported by a 100% DevOps / NoOps approach.
With our move to autonomous Product teams a few years ago, and the growing complexity of our architecture, we started to face this growing cognitive load challenge, especially when introducing Infrastructure as Code and shifting left quality and security.
Reducing the ever-growing developer cognitive load is the number one reason WHY Platform Engineering is getting so much traction nowadays.
The question is, what can a Platform Engineering team do to lower this cognitive load and improve the developer experience / productivity?
Now, let’s dive into WHAT the goals of Platform Engineering are, in part 2!