Photo by Christian Erfurt on Unsplash

The conundrum of serverless lock-in & spiralling complexity: Is it all worth it?

Should ‘serverless lock-in’ and spiralling complexity be a consideration for engineering leaders in large organisations who want to protect their core business value with evolutionary architectures whilst gaining the benefits of serverless?

Serverless Advocate
18 min read · Jun 29, 2023


TL;DR — Yes, Serverless is worth the hype with the correct guardrails in place. They are listed at the bottom of the article.

Introduction

In the ever-evolving landscape of cloud computing, serverless architecture has emerged as a game-changer, empowering engineers to rapidly build and deploy applications with unprecedented speed and agility. This typically takes the form of plugging together composable managed services, using direct integrations and event-driven service integrations to model our business processes.

The speed and agility of serverless teams — photo by Cherrydeck on Unsplash

“How can we future-proof our serverless solutions, harnessing the benefits of speed and agility while avoiding the pitfalls of complexity, technical debt, and serverless lock-in?”

Indeed, the allure of serverless is undeniable. It enables quick starts, accelerated development cycles, and seamless scalability, making it an attractive choice for teams aiming to deliver value to their users swiftly. However, as organisations delve deeper into the realm of serverless at scale, they often encounter a complex web of configuration options, training needs, many opinionated patterns, and masses of best practices that can lead to unforeseen complexity, large amounts of technical debt, and high cognitive load. In reality, serverless is such a new paradigm that we are still working through these best practices.

“While serverless offers an initial burst of productivity, building applications to production-level maturity within this paradigm introduces significant cognitive load”

While serverless offers an initial burst of productivity, building applications to production-level maturity within this paradigm introduces significant cognitive load. Teams must grapple with intricate configuration setups, architectural decisions, and the need to adhere to best practices to ensure robustness, security, scalability, and maintainability. Failure to address these considerations can result in technical debt, architectural fragility, and the accumulation of engineering stress.

High cognitive load in teams — photo by Cherrydeck on Unsplash

One critical question emerges: “How can we future-proof our serverless solutions, harnessing the benefits of speed and agility while avoiding the pitfalls of complexity, technical debt, and serverless lock-in at enterprise scale?”

In this article, we will explore strategies for navigating this serverless conundrum, striving for evolutionary architectures that can adapt and scale without sacrificing long-term sustainability. We will delve into techniques to manage the cognitive load, and foster a forward-thinking approach that minimises the risk of lock-in while maximising the benefits of serverless computing.

Glossary of Terms

Before we go any further I want to explain some common terms that I regularly use with teams and will use throughout this article:

North Star. “North Star” refers to a shared vision or guiding principle that aligns teams towards a common goal across an organisation.

Two-way Door. In the AWS context, a “two-way door” represents the ability for customers to easily reverse decisions or actions if they prove to be unsatisfactory or ineffective. This principle encourages customers to make reversible choices, providing flexibility and minimising the impact of decisions by allowing for easy adjustments or reversals without significant consequences. It goes without saying, a “one-way door” (an irreversible decision with significant consequences) is not something we should be striving for.

Intrinsic Motivation. The theory of “intrinsic motivation” at an individual level talks about competence (the ability to do the work well), autonomy (the choice and independence to pursue interesting work), and purpose (why does it matter?).

Cognitive Load. In software engineering teams, cognitive load refers to the mental effort and capacity required by team members to understand, process, and manage the complexities and information associated with their tasks and responsibilities. High cognitive load can result from factors such as complex codebases, intricate system architectures, extensive documentation, and the need to juggle multiple tasks simultaneously, leading to reduced productivity, increased stress, and potential errors.

A brief history of engineering time on AWS…🕐

Around ten years ago you would typically have been working with more monolithic architectures, where the integrations between ‘domains’ were done with a shared database and at the code level, and event-driven architectures were less common. In this world it was easy to understand system boundaries, which made high-level representations of systems very easy. As we typically extended existing system interfaces, this tended to keep things uniform, increase the pace of development, and reduce cognitive load.

“This was a world before mainstream Serverless and before the introduction of Lambda functions in late 2014.”

This was a world before mainstream Serverless and before the introduction of Lambda functions in late 2014. Most of the time these monolithic solutions took the form of Express apps (or other frameworks) deployed in their entirety to EC2 as one large deployment — and they “just worked”.

Looking back, what were the main considerations and complexities that we had day to day as engineering teams on AWS from a service point of view?

  • How do we deploy the code? For us, as a ‘live services’ team, deployments were run manually out of hours using AMIs.
  • How do we roll back the full solution if we have an issue? Typically by deploying the previous AMI.
  • Where do our databases live? A lot of the time for us it was MongoDB deployed in the same VPC as the services on EC2.
photo by Brooke Cagle on Unsplash

How does that compare to the typical experience in the serverless world today?

  • Which IaC? Should I use the AWS CDK? Serverless Framework? SST? AWS SAM? Terraform?…
  • Many service configurations. Which of the vast number of configuration options do I need to set on each service? What are the cost or security implications of getting it wrong? (See the sketch below this list.)
  • Service choice. Which service should I use? For example, EventBridge vs MSK vs SNS vs SQS… they all look so similar?
  • Testing. How and when should I run integration and e2e tests? Should I integration test between two AWS services, and more importantly, how? What about e2e tests for a process flow of multiple integrated AWS services?
  • Patterns. What the hell is an ‘adjacency list’ or ‘storage-first’ pattern? Should I favour direct integrations or glue code with Lambda functions? Should I use Step Functions for orchestration — and what is a Saga pattern?
  • Service Limits. What are the service limits per serverless service that I use? Are any of these hard limits? Will this bite me on the ass at a later date?
  • Provisioned and Reserved Concurrency. What are these and when should I use them?
  • AuthZ and AuthN. Which services should I use for authentication and authorisation? IAM, which underpins the integration of serverless services, can take time to learn.
  • Logging, Tracing and Metrics. How do I use these? How do they differ?
  • Event-driven Architectures. What is event-carried state transfer? Idempotency? Eventual consistency? metadata vs data? What are targets and routing?… and more.
These considerations often lead to analysis paralysis, slow down, high cognitive load and frustration — photo by Sander Sammy on Unsplash
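
To make the configuration-sprawl point above concrete, here is a minimal hypothetical sketch in AWS CDK (TypeScript) of just one queue and one function. Every name and value is an assumption for illustration only; the point is that each commented line is a decision a team has to research and get right:

```typescript
import {
  Duration,
  Stack,
  StackProps,
  aws_lambda as lambda,
  aws_lambda_event_sources as sources,
  aws_lambda_nodejs as nodejs,
  aws_sqs as sqs,
} from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class OrdersStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // A dead letter queue for failed messages - one decision already.
    const dlq = new sqs.Queue(this, 'OrdersDlq', {
      retentionPeriod: Duration.days(14), // how long do we keep failures?
    });

    const queue = new sqs.Queue(this, 'OrdersQueue', {
      visibilityTimeout: Duration.seconds(90), // must exceed the function timeout
      encryption: sqs.QueueEncryption.SQS_MANAGED, // or KMS? cost vs compliance
      deadLetterQueue: { queue: dlq, maxReceiveCount: 3 }, // how many retries?
    });

    const fn = new nodejs.NodejsFunction(this, 'ProcessOrder', {
      entry: 'src/process-order.ts', // hypothetical handler path
      runtime: lambda.Runtime.NODEJS_18_X, // which runtime, and when do we upgrade?
      memorySize: 512, // memory drives both cost and CPU allocation
      timeout: Duration.seconds(30), // too low drops work; too high masks issues
      reservedConcurrentExecutions: 10, // protect downstream, or throttle ourselves?
      tracing: lambda.Tracing.ACTIVE, // X-Ray on or off, and at what cost?
    });

    fn.addEventSource(new sources.SqsEventSource(queue, {
      batchSize: 10, // another decision: throughput vs blast radius of a bad batch
    }));
  }
}
```

Two resources, and already around a dozen judgement calls; multiply that across the tens of resources in a real domain service and the cognitive load becomes clear.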

This is shown by the vast amount of content that serverless advocates like myself are creating to help others in the community deal with these complexities.

I think you get the picture though, and I could have gone on and on — Serverless is inherently more complex than more traditional architectures on AWS, so how do we work around this and support teams across an organisation in its adoption?

“Serverless is inherently more complex than more traditional architectures on AWS, so how do we work around this and support teams across an organisation?”

How do I make this work the right way? photo by Tim Gouw on Unsplash

Don’t believe me? Let’s look at a basic example.

OK, so perhaps you don’t believe me. Let’s have a look at a typical serverless architecture for a small domain service:

A fictitious high level example of a typical AWS serverless service

“There are almost two hundred different service configuration options alone in this one small diagram when using the AWS CDK”

At a quick glance, a few things to note:

  • It is very difficult for a person joining the team to understand the solution architecture. There is so much noise from the number of icons and lines compared to more traditional architectures. (Yes, we can add numbers, add a key, and annotate; but it is still complex!) Over time these services tend to grow bigger and more complex with new features — but very rarely in my experience will teams take the time to split them down into smaller domain services.
  • There are almost two hundred different service configuration options alone in this one small diagram when using the AWS CDK (and for brevity we have left off key services like Secrets Manager, CodePipeline, networking services, our EventBridge targets, DLQs, and Parameter Store, whose own configuration options are not included). This can cause huge cognitive load on teams and analysis paralysis.
  • This is purely infrastructure — and it hides the other complexities listed above in the previous section too.
  • There are various patterns at play here, like ‘storage-first’ and ‘change data capture’, which teams need to understand (see the storage-first sketch below).

This small example alone demonstrates the sheer amount of things to consider when building even the smallest of production serverless services.
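
To illustrate just one of those patterns, below is a minimal sketch of ‘storage-first’ in AWS CDK (TypeScript), with all names hypothetical: API Gateway writes the incoming request directly to SQS through an AWS service integration, so the payload is durably stored before any compute runs, and there is no Lambda glue code to maintain.

```typescript
import { Stack, StackProps, aws_apigateway as apigw, aws_iam as iam, aws_sqs as sqs } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class StorageFirstStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const queue = new sqs.Queue(this, 'IncomingOrdersQueue');

    // API Gateway needs an explicit role to call SQS on our behalf.
    const integrationRole = new iam.Role(this, 'ApiToSqsRole', {
      assumedBy: new iam.ServicePrincipal('apigateway.amazonaws.com'),
    });
    queue.grantSendMessages(integrationRole);

    const api = new apigw.RestApi(this, 'OrdersApi');

    // Direct AWS service integration: the request body is stored durably
    // in SQS before any compute runs (the essence of 'storage-first').
    api.root.addResource('orders').addMethod(
      'POST',
      new apigw.AwsIntegration({
        service: 'sqs',
        path: `${Stack.of(this).account}/${queue.queueName}`,
        integrationHttpMethod: 'POST',
        options: {
          credentialsRole: integrationRole,
          requestParameters: {
            'integration.request.header.Content-Type': "'application/x-www-form-urlencoded'",
          },
          requestTemplates: {
            'application/json': 'Action=SendMessage&MessageBody=$util.urlEncode($input.body)',
          },
          integrationResponses: [{ statusCode: '200' }],
        },
      }),
      { methodResponses: [{ statusCode: '200' }] },
    );
  }
}
```

If downstream processing fails, the message is still safely on the queue to be retried, which is the whole point of the pattern.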

So why even look at serverless as an option?

I think we have successfully called out that serverless architectures are vastly more complex than traditional architectures of the past in terms of engineering, and there is a steep learning curve in productionising them correctly (some say this learning never seems to stop!). So what are the benefits of using these serverless services together if they are more complex?

✔️ Shared responsibility model. AWS takes on the maintenance of the underlying infrastructure, operating systems, and runtimes; engineers are freed of this maintenance burden and can focus specifically on the application code.

✔️ Scale Out. Teams no longer need to worry about applications scaling out with spiky traffic; and can leave the service to deal with this complexity.

✔️ Scale In. Resources scale back in very quickly when not utilised, so we are not left running idle capacity.

✔️ Pay for use. We typically only pay for what we use with serverless, so there are no up-front costs to test out new features or ideas.

✔️ Agility & Speed. This is based largely on the other points above.

✔️ Direct integrations. Direct integrations connect managed services natively, requiring no code other than the IaC. The less code we write, the less chance of bugs and ongoing maintenance.

✔️ Feature evolution. AWS listen to customers and release new service features at a frightening pace! This means that the services are always evolving for the better.

In the next section, let’s look at what happens when autonomous serverless teams in an organisation get started on their journey — and what can happen fairly quickly as a result.

👇 Before we go any further — please connect with me on LinkedIn for future blog posts and Serverless news https://www.linkedin.com/in/lee-james-gilmore/

The ‘Serverless Dunning-Kruger’ effect

Many autonomous teams embark on their North Star journey with unwavering confidence, believing they already possess the necessary knowledge and skills to seamlessly transform their ideas into production-ready serverless solutions. Many have quick wins in pushing out services and new ideas to production through intrinsic motivation, and over a short period of time have multiple services out in the cloud on AWS.

“they have only scratched the surface of the intricate web of complexities that surround building and maintaining serverless architectures”

However, as time passes, a humbling realisation sets in: they have only scratched the surface of the intricate web of complexities that surround building and maintaining serverless architectures.

https://commons.wikimedia.org/wiki/File:Dunning%E2%80%93Kruger_Effect_01.svg

This phenomenon, often referred to as the “Serverless Dunning-Kruger Effect,” mirrors the cognitive bias identified by psychologists David Dunning and Justin Kruger. It describes a common occurrence where teams starting out with serverless development overestimate their level of knowledge, blissfully unaware of the vast array of well-architected pillars, configuration options, and external complexities that lie ahead.

Note: I don’t agree with the term ‘peak of “mount stupid”’ personally; this is the same for any new skill anybody picks up (whether that be DIY or surfing), and I would instead say ‘peak of unwitting unknowns & enthusiasm’.

The second stage of the Serverless Dunning-Kruger Effect often brings frustration and surprise as teams realise that they have unwittingly overlooked crucial elements of a well-architected serverless application, created many ‘one-way doors’, and they may have accumulated a lot of technical debt and security issues. They come face to face with the realisation that achieving production-level maturity demands a deep understanding of the myriad of configuration options, best practices, and patterns that shape the serverless landscape.

“But it is through this humbling experience that teams begin their ascent toward becoming high-functioning serverless teams”

But it is through this humbling experience that teams begin their ascent toward becoming high-functioning serverless teams. Over time, they gain a deeper understanding of the intricacies, overcoming complexities, and acquiring the knowledge necessary to navigate the serverless ecosystem with finesse. Do we want our teams learning through experimentation, wins and failures — absolutely! Do we want to protect our teams and business with guardrails around this autonomy — absolutely!

A quick word on the varying scale of autonomy and guardrails

When implementing serverless in a large organisation to progress to your North Star, the balance between autonomy (through intrinsic motivation) and guardrails is crucial in my opinion. They are not mutually exclusive.

“It has become almost trendy to talk only of autonomy these days and never about guardrails”

Autonomy refers to granting teams the freedom and authority to make decisions and take ownership of their serverless solutions end to end. On the other hand, guardrails establish a set of predefined policies, guidelines, and best practices to reduce duplication, and ensure consistency, compliance, and security across the organisation’s serverless estate.

“How do you stop a team from metaphorically driving at 100 mph over a cliff edge due to poor judgement? Create structure — both cultural and organisational — to guide their decisions.” — https://aws.amazon.com/blogs/enterprise-strategy/letting-go-enabling-autonomy-in-teams/

To mitigate concerns regarding potential serverless chaos, leaders can foster a culture of strategic thinking throughout the organisation. This entails ensuring that every individual, regardless of their position, possesses a clear understanding of the business model, strategic initiatives, and how their work can contribute to advancing the organisation towards the North Star within the boundaries of these best practices and patterns.

Guardrails provide engineers with the necessary framework to enhance productivity and efficiency, offering a more effective means of protecting a company’s strategic objectives compared to cumbersome bureaucratic controls and ivory tower architecture practices.

Without the right guardrails in place what can happen?

When organisations go all in on a ‘serverless-first’ approach on AWS with total autonomy, and without the right guardrails in place, you run the risk of:

Complexity at scale
  • At scale serverless solutions across an enterprise can become hard to understand and reason about if you are not careful.
  • We quite often have duplication of effort as teams work in autonomous silos, with domain logic leaking across the organisation and a lack of clear bounded contexts.
  • We have high cognitive load and potential rework cycles caused by technical debt.
  • We find a complex web of synchronous API interactions across the solutions, and asynchronous communication across a host of differing serverless event and streaming services: the ‘Lambda pinball’ architecture at scale.
  • In an architecture like the example shown earlier, where is your organisation’s core domain logic? It can become dispersed across a sea of serverless service integrations, or embedded in complex Step Function ASL which is hard to test and reason about. If this domain logic is your core differentiator from competitors, do we want it engineered in this manner, i.e. ‘serverless lock-in’?
  • Many teams talk to similar 3rd-party vendors for the same needs and tooling, increasing costs and duplicating purchasing processes.
  • Business logic can become entangled with framework- and service-specific configuration, API calls, and function handlers; making the code tiresome, slow, and error-prone to maintain, and not built to be evolutionary as we would want (the hypothetical handler sketch after this list shows this shape of code).
  • The only constant in life is change: the AWS services will change (for the better, as services are superseded; think EC2 -> ECS -> Fargate -> Lambda), and your business will evolve. So how do we ensure that your core business logic is not serverless-service-specific and locked in?
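
As a hedged illustration of that entanglement (the table, event names, and the discount rule itself are all hypothetical), consider a handler where a core pricing rule is buried between DynamoDB marshalling and EventBridge plumbing:

```typescript
import { DynamoDBClient, GetItemCommand, PutItemCommand } from '@aws-sdk/client-dynamodb';
import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';

const ddb = new DynamoDBClient({});
const events = new EventBridgeClient({});

// Anti-pattern: a core business rule (the discount calculation) is buried
// between DynamoDB marshalling and EventBridge plumbing. Changing either
// service, or even unit testing the rule, now means untangling all of it.
export const handler = async (event: { orderId: string }) => {
  const item = await ddb.send(new GetItemCommand({
    TableName: process.env.TABLE_NAME,
    Key: { pk: { S: `ORDER#${event.orderId}` } },
  }));

  const total = Number(item.Item?.total?.N ?? '0');
  const discounted = total > 100 ? total * 0.9 : total; // the business rule!

  await ddb.send(new PutItemCommand({
    TableName: process.env.TABLE_NAME,
    Item: { pk: { S: `ORDER#${event.orderId}` }, total: { N: String(discounted) } },
  }));

  await events.send(new PutEventsCommand({
    Entries: [{
      Source: 'orders',
      DetailType: 'OrderPriced',
      Detail: JSON.stringify({ orderId: event.orderId, discounted }),
    }],
  }));
};
```

Unit testing that one business rule now requires mocking two AWS SDK clients, and moving it elsewhere means untangling it from the service calls around it; the hexagonal approach later in this article is one answer.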

“If this domain logic is your core differentiator from competitors, do we want it engineered in this manner, i.e. ‘serverless lock-in’?”

So we have now discussed some of the issues which can commonly present themselves on this serverless journey, but how can we mitigate them?

The light at the end of the serverless tunnel: what can we do to mitigate these issues?

In my experience there are a number of things which we can strategically put in place across a large organisation as technical leaders to help mitigate these potential issues at scale, and to help teams up-skill and gain experience in a safe way while protecting the business. We will highlight the key areas below, giving a summary, and links for further reading material.

✔️ Serverless Architecture Layers

This opinionated approach is a conceptual model which allows us to think about the right key decisions and focus areas at the right times for larger organisations looking at ‘serverless-first’ — covering everything from platform engineering, event-driven architecture (EDA) strategies, aligning to system boundaries and domain-driven design, reducing duplication and cognitive load, considerations around cross-cutting dependencies, and how we do authentication at scale across an organisation securely on AWS.

How this helps: This conceptual model prevents domain leak and helps define system boundaries, whilst promoting platform engineering to reduce duplication and to move quicker through tackling undifferentiated heavy lifting and cross-cutting concerns across the organisation.

✔️ Clean Code: Hexagonal Architectures

Hexagonal architectures can play a crucial role in making serverless-first companies’ solutions evolutionary while safeguarding their core business logic that sets them apart from competitors. By structuring the architecture around the core business domain, with clear boundaries and separation of concerns, hexagonal architectures enable easy adaptability and scalability. The use of adapters and use cases allows for seamless integration of new serverless services and components, making it possible to evolve the solution without compromising the core logic.

How this helps: This flexibility enables companies to quickly respond to market changes, experiment with new technologies, and innovate while preserving the essence of what makes them unique (in code and not ASL or equivalent), ultimately maintaining a competitive advantage in the serverless landscape.
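
Here is a minimal hedged sketch of that idea in TypeScript, reworking the hypothetical pricing handler from earlier in this article: the business rule is pure code behind a port, DynamoDB is just one swappable adapter, and the Lambda handler shrinks to a thin primary adapter. All names are assumptions for illustration:

```typescript
import { DynamoDBClient, GetItemCommand, PutItemCommand } from '@aws-sdk/client-dynamodb';

// Port: the use case depends on this interface, never on DynamoDB directly.
export interface OrderRepository {
  getTotal(orderId: string): Promise<number>;
  saveTotal(orderId: string, total: number): Promise<void>;
}

// Core domain logic: pure, trivially unit-testable, no AWS imports at all.
export const applyDiscount = (total: number): number =>
  total > 100 ? total * 0.9 : total;

// Use case: orchestrates the domain logic through the port.
export const priceOrder =
  (repo: OrderRepository) =>
  async (orderId: string): Promise<number> => {
    const discounted = applyDiscount(await repo.getTotal(orderId));
    await repo.saveTotal(orderId, discounted);
    return discounted;
  };

// Secondary adapter: one of potentially many implementations of the port.
const ddb = new DynamoDBClient({});
export const dynamoOrderRepository: OrderRepository = {
  async getTotal(orderId) {
    const res = await ddb.send(new GetItemCommand({
      TableName: process.env.TABLE_NAME,
      Key: { pk: { S: `ORDER#${orderId}` } },
    }));
    return Number(res.Item?.total?.N ?? '0');
  },
  async saveTotal(orderId, total) {
    await ddb.send(new PutItemCommand({
      TableName: process.env.TABLE_NAME,
      Item: { pk: { S: `ORDER#${orderId}` }, total: { N: String(total) } },
    }));
  },
};

// Primary adapter: the Lambda handler is now a thin, disposable shell.
export const handler = async (event: { orderId: string }) =>
  priceOrder(dynamoOrderRepository)(event.orderId);
```

Swapping DynamoDB for another store, or testing the rule with an in-memory fake, now touches only the adapter; the core logic that differentiates the business stays untouched.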

✔️ Team Topologies

Team Topologies in the serverless world focuses on organising teams in a way that aligns with the needs of serverless application development to reach your North Star. It promotes the formation of small, autonomous domain teams responsible for end-to-end development and maintenance of specific serverless microservices. These teams can deliver value and respond to customer needs more efficiently, and are typically named ‘stream-aligned’ teams.

Additionally, Team Topologies advocates for platform teams that provide the necessary infrastructure, undifferentiated heavy lifting, tooling, and shared services to support product teams. Platform teams manage landing zones, deployment pipelines, observability tools, and other essential components required for the smooth operation of serverless applications.

To ensure effective collaboration and minimise dependencies, Team Topologies emphasises clear interactions between teams. Stream-aligned teams handle entire value streams, ‘enabling teams’ support stream-aligned teams with up-skilling and end-to-end delivery, while complicated-subsystem teams own specialised areas that require dedicated expertise.

How this helps: By adopting Team Topologies, organisations can optimise their team structures, enabling faster feedback loops, improved knowledge sharing, and enhanced delivery speed in the serverless ecosystem. One key area for me personally is ‘enabling teams’ that can support other teams as experts, reducing the chances of technical debt and security issues, whilst also aligning teams with the best practices, patterns, and chosen technologies.

✔️ Domain-Driven Design (DDD)

Domain-Driven Design (DDD) is an architectural and design approach that focuses on modelling a software system around the core business domains. In the context of the serverless world, DDD provides a valuable framework for building robust and scalable serverless applications.

At the heart of DDD is the concept of bounded contexts, which define clear boundaries around different business domains. By identifying and delineating these contexts, serverless applications can be designed and organised in a modular and decoupled manner. Each bounded context can have its own set of serverless functions, event-driven workflows, and data stores, enabling independent development and deployment.

How this helps: Leveraging DDD principles in the serverless world promotes flexibility, scalability, and maintainability, as the architecture evolves along with the evolving business needs. It also helps define the system boundaries for our domain services, reducing the chance of duplication and domain leak.
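
As a small hedged sketch (the context and event names are hypothetical), bounded contexts often surface in serverless code as independently deployed modules that share only well-defined event contracts, never a database:

```typescript
import { randomUUID } from 'node:crypto';

// The only thing bounded contexts share: a versioned event contract.
export interface DomainEvent<T> {
  source: string;     // the owning bounded context, e.g. 'ordering'
  detailType: string; // e.g. 'OrderPlaced'
  metadata: { correlationId: string; occurredAt: string };
  data: T;            // event-carried state transfer payload
}

// The 'ordering' context's own model of an order...
export interface OrderPlaced {
  orderId: string;
  customerId: string;
  totalPence: number;
}

// ...published for 'billing' and 'shipping' to consume, each of which
// keeps its own data store and its own internal view of an 'order'.
export const orderPlaced = (data: OrderPlaced): DomainEvent<OrderPlaced> => ({
  source: 'ordering',
  detailType: 'OrderPlaced',
  metadata: { correlationId: randomUUID(), occurredAt: new Date().toISOString() },
  data,
});
```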

✔️ C4 Model

The C4 model, developed by Simon Brown, is a visual approach for effectively communicating software system architecture across an enterprise at scale. It consists of four hierarchical levels: System Context, Containers, Components (typically AWS diagrams in our case), and Code (typically classes).

At the System Context level, an overview diagram provides a high-level perspective of the system landscape, identifying its external actors and their interactions. The Containers level zooms in further, illustrating the major containers that make up the system (such as web servers, databases, or microservices) and their relationships. After this point, I personally think we should be delving into AWS Serverless architecture diagrams at the ‘Component’ level.

How this helps: The C4 model gives serverless-first teams across a large organisation a way of seeing which systems talk to each other at a high level, and where the bounded contexts are, without a mass of huge, complex AWS diagrams. Zooming in further through the containers to the AWS component diagrams gives teams more context with less cognitive load than trying to understand everything in one diagram.

✔️ Tech Radar

A custom ThoughtWorks-style Tech Radar provides invaluable support and benefits to any organisation going ‘serverless-first’. By regularly publishing and updating the Tech Radar, teams can align on technologies, tools, frameworks, and techniques in line with the overall technical strategy, whilst facilitating knowledge sharing, training alignment, collaboration, and a culture of innovation.

How this helps: The Tech Radar is a set of guardrails that means teams can align on a set of technologies, allowing them to benefit from developer experience platforms, CLIs for code generation, common pipelines, composable code (see Composable Architectures below), and shared architecture patterns and practices. If teams use a myriad of languages and frameworks in a fully autonomous world, then all of the above becomes a pipe dream.
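
As a hypothetical illustration (the rings, quadrants, and entries below are all assumptions, loosely mirroring the ThoughtWorks build-your-own-radar format), a radar can be as simple as a versioned, internally published list of decisions:

```typescript
type Ring = 'adopt' | 'trial' | 'assess' | 'hold';
type Quadrant = 'languages & frameworks' | 'platforms' | 'tools' | 'techniques';

interface RadarEntry {
  name: string;
  ring: Ring;
  quadrant: Quadrant;
  rationale: string;
}

// Hypothetical entries: the value is in the shared, reviewed decisions.
export const radar: RadarEntry[] = [
  {
    name: 'AWS CDK (TypeScript)',
    ring: 'adopt',
    quadrant: 'tools',
    rationale: 'Organisation-wide IaC standard; custom constructs published internally.',
  },
  {
    name: 'Step Functions for orchestration',
    ring: 'trial',
    quadrant: 'techniques',
    rationale: 'Useful for sagas, but keep core business rules in code, not ASL.',
  },
  {
    name: 'Ad hoc EC2 deployments',
    ring: 'hold',
    quadrant: 'platforms',
    rationale: 'Superseded by the serverless-first strategy.',
  },
];
```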

✔️ Composable Architectures (AWS CDK Custom Constructs)

Organisations can leverage AWS CDK custom constructs to establish alignment on reference architectures, security measures, best practices, and patterns. By internally publishing and consuming these constructs, they can build composable solutions using Infrastructure as Code (IaC) principles. This approach reduces the cognitive load on teams, allowing them to move quickly within predefined guardrails.

How this helps: The use of CDK custom constructs enables teams to benefit from consistent and reusable building blocks, streamlining development processes, reducing cognitive load with the complexities of configuration options, and ensuring adherence to organisational standards and guidelines.
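
A minimal hedged sketch of such a construct in AWS CDK (TypeScript), with all defaults assumed purely for illustration: a platform team publishes it to an internal package registry, and consuming teams get encryption, a dead letter queue, and agreed retention without re-deciding any of it.

```typescript
import { Duration, aws_sqs as sqs } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export interface OrgQueueProps {
  /** How many processing attempts before a message lands in the DLQ. */
  readonly maxReceiveCount?: number;
}

// A hypothetical organisation-wide construct: consumers get encryption,
// a dead letter queue, and agreed retention without re-deciding each time.
export class OrgQueue extends Construct {
  public readonly queue: sqs.Queue;
  public readonly deadLetterQueue: sqs.Queue;

  constructor(scope: Construct, id: string, props: OrgQueueProps = {}) {
    super(scope, id);

    this.deadLetterQueue = new sqs.Queue(this, 'Dlq', {
      encryption: sqs.QueueEncryption.SQS_MANAGED,
      retentionPeriod: Duration.days(14),
    });

    this.queue = new sqs.Queue(this, 'Queue', {
      encryption: sqs.QueueEncryption.SQS_MANAGED,
      visibilityTimeout: Duration.seconds(90),
      deadLetterQueue: {
        queue: this.deadLetterQueue,
        maxReceiveCount: props.maxReceiveCount ?? 3,
      },
    });
  }
}
```

A stream-aligned team then writes `new OrgQueue(this, 'OrdersQueue')` and inherits the organisation's guardrails by default.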

Conclusion

OK, so this wasn’t my typical style of article; it was written from personal experiences across many large organisations adopting serverless, purely to provoke thoughts and conversations. Hopefully you have taken something from it!

Please go and subscribe to my YouTube channel for similar content!

I would love to connect with you also on any of the following:

https://www.linkedin.com/in/lee-james-gilmore/
https://twitter.com/LeeJamesGilmore

If you enjoyed the post, please follow my profile Lee James Gilmore for further posts/series, and don’t forget to connect and say Hi 👋

Please also use the ‘clap’ feature at the bottom of the post if you enjoyed it! (You can clap more than once!!)

About me

Hi, I’m Lee, an AWS Community Builder, Blogger, AWS certified cloud architect and Global Serverless Architect based in the UK; currently working for City Electrical Factors (UK) & City Electric Supply (US), having worked primarily in full-stack JavaScript on AWS for the past 6 years.

I consider myself a serverless advocate with a love of all things AWS, innovation, software architecture and technology.

*** The information provided are my own personal views and I accept no responsibility on the use of the information. ***
