Unlocking Serverless Superpowers: Mastering the 8 Crucial Design Patterns Every Engineer Should Know

Discover the game-changing potential of Serverless development as we unravel the eight essential design patterns that every developer should have in their toolkit, allowing you to compose the patterns together to create serverless architectures for any use-case.

Introduction

In this article, we’ll explore the transformative power of Serverless architecture and delve into the eight key design patterns that every aspiring Serverless developer should be well-versed in. Prepare to enhance your skills and unlock the full potential of Serverless technology in your projects.

In Part 2, we will talk through the AWS CDK TypeScript code for each of the patterns, including the composed versions.

The eight patterns we will discuss are:

✔️ The Simple Service.
✔️ Storage First Pattern.
✔️ API Proxy Pattern.
✔️ Event Gateway Pattern.
✔️ Call Me ‘Maybe’ Pattern.
✔️ Change Data Capture (CDC) Pattern.
✔️ Transactional Outbox Pattern.
✔️ Saga Pattern.

We will then go on to discuss how we can compose the patterns together to build almost any serverless use-case.

If you would prefer to watch a high-level overview of this article, feel free to watch the following video first:

👇 Before we go any further — please connect with me on LinkedIn for future blog posts and Serverless news https://www.linkedin.com/in/lee-james-gilmore/

Let’s walk through the patterns

Let’s now walk through the eight patterns and discuss the advantages and disadvantages of each.

1. The Simple Service

Let’s start with the most common and simplest service of all, the “simple service”, which is sometimes known as the “Comfortable REST” pattern. This is typically API Gateway as an API for synchronous requests, with Lambda functions as the compute layer, and DynamoDB as the data store. This pattern scales really well with high concurrency.

Most people have heard me talk about SALAD stacks in the past, which for the most part is a ‘simple service’ with a frontend. It is essentially:

S — S3 bucket for hosting a frontend application, for example, a React app.
A — API Gateway for the REST API.
LA — Lambda functions for the compute layer.
D — DynamoDB table for persisting and retrieving data.

This is the standard ‘go to’ pattern for most serverless engineers when building out scalable services, and covers 80–90% of the scenarios that people need in my opinion.
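As a flavour of the CDK code coming in Part 2, below is a minimal AWS CDK (TypeScript) sketch of the simple service. The construct names and the handler entry path are illustrative assumptions only:

```typescript
import { RemovalPolicy, Stack, StackProps } from 'aws-cdk-lib';
import * as apigw from 'aws-cdk-lib/aws-apigateway';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Construct } from 'constructs';

export class SimpleServiceStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // D - DynamoDB table for persisting and retrieving data
    const table = new dynamodb.Table(this, 'OrdersTable', {
      partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: RemovalPolicy.DESTROY, // non-prod only
    });

    // LA - Lambda function as the compute layer (hypothetical handler path)
    const createOrder = new NodejsFunction(this, 'CreateOrder', {
      entry: 'src/handlers/create-order.ts',
      runtime: lambda.Runtime.NODEJS_18_X,
      environment: { TABLE_NAME: table.tableName },
    });
    table.grantWriteData(createOrder); // least-privilege access to the table

    // A - API Gateway REST API for synchronous requests
    const api = new apigw.RestApi(this, 'OrdersApi');
    api.root
      .addResource('orders')
      .addMethod('POST', new apigw.LambdaIntegration(createOrder));
  }
}
```

The S of the SALAD stack, the S3-hosted frontend, would simply sit in front of this API.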

The benefits of the simple service pattern are:

✔️ This is super simple to code and is rinse and repeat for the most part.
✔️ It will cover most scenarios that you will hit as an engineer.

What are the potential issues with this approach?

⭕ You don’t get the potential benefits of direct integrations, and run the risk of your functions simply being glue code without any business logic.

2. Storage First Pattern

The storage-first approach can be used to increase resilience and durability in serverless solutions by persisting requests as soon as viably possible, through direct service integrations, before performing any business logic, and then processing the requests asynchronously (including re-processing any failures). In this approach the original request data is always persisted and the business logic is performed later, so if an error does occur, the original data is still available.

“This pattern removes the need for Lambda functions, and with them the need for custom code to parse, transform, process and persist the request data.”

This pattern is very different to the typical approach of processing requests as soon as possible through compute in a synchronous manner using services like AWS Lambda, and returning the processed payload in the response. Let’s have a look at a quick diagram:

It is very important to note that this is an asynchronous process, as you will be processing the request at a later point (which may well be only milliseconds later when reading from a queue, for example, but could be longer). You won’t be returning a synchronous response to the customer with any information about the processed request, other than a 202 status code to say the request was ‘accepted’ and will be processed later. There are also other services we can use with this pattern, such as SQS queues, Kinesis Data Streams, EventBridge buses and S3 buckets (but not limited to those):

Note: You can obviously replace REST APIs in this example with GraphQL using AWS AppSync (which we discuss later in the article).
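As a sketch only (queue and resource names assumed), the direct API Gateway to SQS integration at the heart of this pattern could look like the following in CDK, returning a 202 ‘accepted’ with no Lambda in the request path:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as apigw from 'aws-cdk-lib/aws-apigateway';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { Construct } from 'constructs';

export class StorageFirstStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Queue which persists the raw request before any business logic runs
    const queue = new sqs.Queue(this, 'RequestsQueue');

    // Role that allows API Gateway to call sqs:SendMessage directly
    const integrationRole = new iam.Role(this, 'ApiGatewayToSqsRole', {
      assumedBy: new iam.ServicePrincipal('apigateway.amazonaws.com'),
    });
    queue.grantSendMessages(integrationRole);

    // Direct service integration: no Lambda 'glue code' in the request path
    const sqsIntegration = new apigw.AwsIntegration({
      service: 'sqs',
      path: `${this.account}/${queue.queueName}`,
      options: {
        credentialsRole: integrationRole,
        requestParameters: {
          'integration.request.header.Content-Type':
            "'application/x-www-form-urlencoded'",
        },
        requestTemplates: {
          'application/json':
            'Action=SendMessage&MessageBody=$util.urlEncode($input.body)',
        },
        integrationResponses: [{ statusCode: '202' }],
      },
    });

    const api = new apigw.RestApi(this, 'StorageFirstApi');
    api.root.addResource('requests').addMethod('POST', sqsIntegration, {
      methodResponses: [{ statusCode: '202' }], // accepted, processed later
    });
  }
}
```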

What if we did want to persist first and also send back updates to the user as we process the requests?

This is certainly possible using either AWS Step Functions or AWS AppSync Subscriptions, but the tradeoff, of course, is the user waiting for a response.

The benefits of the storage-first pattern are:

✔️ This may be a benefit in mission-critical systems, as you persist the request immediately so it is never lost (especially when coupled with DLQs and direct integrations tested through e2e tests).
✔️ This approach scales very well as you persist the requests and then process them later at your own speed, rather than needing to think about the compute aspect at high request volumes when doing it synchronously.
✔️ You very rarely lose a request for your end users, as you are simply returning a 202 status code to say accepted.
✔️ Very quick response for your end user when it is simply a 202 response.
✔️ Resilient: with a direct integration you have no compute, such as Lambda functions acting as glue code, therefore there is less chance of bugs or configuration issues being introduced.

What are the potential issues with this approach?

⭕ You need to ensure that you can re-process the requests without needing the customer to input or action anything if this is fire and forget for them. You have previously returned a 202 response and they will have moved on to their next action. If an error occurs 2–3 seconds later, the customer has typically moved on (in a web app scenario they may be on the next page or have closed the browser altogether).

⭕ Storing data first and then processing it can introduce additional latency compared to processing data directly on compute resources, as the process is now async in nature. The data needs to be fetched from storage services, which adds an extra step before processing can begin (and you may also decide to throttle the requests under high load).

⭕ Depending on the service used for the storage-first pattern, there are different limitations, such as message/event payload size, so this is a consideration. For example, an Amazon EventBridge event payload has a max size of 256 KB, whereas Kinesis Data Streams has a max payload size of 1 MB. If ordering is vital, then Kinesis or SQS FIFO queues can meet this requirement.

Where have I used this approach where it is fire and forget?

✔️ A Communications domain service where consumers (other domains) can send a payload and a template ID (a command), and it is the responsibility of the communications service to send the SMS or email, using a Lambda function to read from the SQS queue (including any retries on errors, as no further user input is needed).

✔️ A payroll service which allowed for uploading files to S3 using a pre-signed upload URL, and a step function ultimately parsing the file, transposing for the correct government service, and sending the new payload to the correct service endpoint.

Where have I used this approach when giving the consumer regular updates?

✔️ Users downloading their payslips, where we used an async process with regular updates via AWS AppSync Subscriptions, culminating in a pre-signed download URL for the browser on completion so the file is downloaded in real time.

The following article covers this in detail with a dedicated code repo:

3. API Proxy Pattern

Next we are going to talk about the API Proxy pattern, whereby we use an API as a ‘front-door’ to one or more underlying systems and microservices, where customers only need to authenticate once.

Note — when using GraphQL this is sometimes referred to as the “Cherry Pick” pattern.

The power of this pattern is that it allows us to have one API which exposes specific functionality from multiple downstream systems, which may also orchestrate where needed (more on Saga patterns later). Below we can see what this looks like with a REST API using API gateway:

We could also have the exact same pattern with GraphQL and AWS AppSync using Lambda resolvers:
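To illustrate the idea, here is a hypothetical proxy Lambda handler which aggregates two downstream domain services. The environment variables, paths and response shape are assumptions for the sketch, and in reality you would attach an M2M token or SigV4 signature to the downstream calls:

```typescript
import { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';

// Hypothetical downstream endpoints injected via environment variables
const ORDERS_API = process.env.ORDERS_API_URL as string;
const CUSTOMERS_API = process.env.CUSTOMERS_API_URL as string;

export const handler = async (
  event: APIGatewayProxyEvent,
): Promise<APIGatewayProxyResult> => {
  const orderId = event.pathParameters?.id;

  // Fan out to the downstream systems in parallel (fetch is built in to
  // the Node.js 18+ Lambda runtimes)
  const [orderRes, customerRes] = await Promise.all([
    fetch(`${ORDERS_API}/orders/${orderId}`),
    fetch(`${CUSTOMERS_API}/customers/for-order/${orderId}`),
  ]);

  const order = (await orderRes.json()) as Record<string, unknown>;
  const customer = (await customerRes.json()) as { name: string };

  // 'Cherry pick': expose only the aggregated fields consumers need
  return {
    statusCode: 200,
    body: JSON.stringify({ order, customer: { name: customer.name } }),
  };
};
```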

The benefits of the API Proxy pattern are:

✔️ We can have one entry point for external customers to consume our downstream systems in one safe and aggregated way, without exposing all functionality of the downstream systems.

✔️ We can use the proxy Lambda function as an orchestrator, or perhaps utilise Step Functions to create a Saga pattern.

✔️ We can add a level of caching at the API proxy layer where required.

✔️ It allows us to manage authentication and authorisation once, close to the customer, allowing the communication between the proxy API and downstream systems to be an M2M flow (Client Credentials flow), or IAM with SigV4.

✔️ This is an ideal pattern when basing your architecture on Serverless Architecture Layers (SAL Architecture):

What are the potential issues with this approach?

⭕ You really need to think about ownership of this API from an engineering perspective, as it sits across many downstream systems; we always want to limit hand-offs and dependencies between teams.

⭕ Your customers may experience additional latency with invoking the proxy Lambda function.

The following articles discuss the two patterns at length, and have the accompanying code repos:

4. Event Gateway Pattern

The event gateway pattern is an innovative approach that allows legacy or 3rd party systems to raise events into your enterprise service bus (ESB), therefore allowing other domain services in your enterprise to consume the published events. It does this by putting an API in front of the ESB, which has a direct integration:

As you can see from the diagram above, legacy applications or 3rd party services consume our API gateway REST API to publish events. These events go straight to our central EventBridge bus using a direct service integration. This means other domain services (or 3rd parties if we use Web-Hooks) can consume the published events.
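A minimal CDK sketch of the direct integration is shown below. The bus name, event source and payload mapping are assumptions, following the generic metadata/data envelope discussed later in this section:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as apigw from 'aws-cdk-lib/aws-apigateway';
import * as events from 'aws-cdk-lib/aws-events';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

export class EventGatewayStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const bus = new events.EventBus(this, 'SharedBus', {
      eventBusName: 'shared-esb', // hypothetical central bus
    });

    // Role assumed by API Gateway to call events:PutEvents directly
    const integrationRole = new iam.Role(this, 'ApiToEventsRole', {
      assumedBy: new iam.ServicePrincipal('apigateway.amazonaws.com'),
    });
    bus.grantPutEventsTo(integrationRole);

    // Map the generic { metadata, data } envelope onto a PutEvents call
    const putEvents = new apigw.AwsIntegration({
      service: 'events',
      action: 'PutEvents',
      options: {
        credentialsRole: integrationRole,
        requestParameters: {
          'integration.request.header.X-Amz-Target': "'AWSEvents.PutEvents'",
          'integration.request.header.Content-Type':
            "'application/x-amz-json-1.1'",
        },
        requestTemplates: {
          'application/json': `{"Entries": [{"EventBusName": "${bus.eventBusName}", "Source": "$input.path('$.metadata.source')", "DetailType": "$input.path('$.metadata.eventName')", "Detail": "$util.escapeJavaScript($input.json('$.data'))"}]}`,
        },
        integrationResponses: [{ statusCode: '200' }],
      },
    });

    const api = new apigw.RestApi(this, 'EventGatewayApi');
    api.root.addResource('events').addMethod('POST', putEvents, {
      methodResponses: [{ statusCode: '200' }],
    });
  }
}
```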

The benefits of the Event Gateway pattern are:

✔️ We can allow legacy or 3rd party systems in your eco-system to publish events to your central EventBridge bus (as long as they can make HTTPS calls, that is).

✔️ This is a powerful pattern when combined with the Change Data Capture (CDC), Outbox, and Web-hook patterns (more on those later).

✔️ We can add authentication and authorisation to the API Gateway, as well as basic schema validation.

✔️ The publishing service can be written in any language, and can be both cloud based or on-premise, meaning we have huge flexibility with this approach.

What are the potential issues with this approach?

⭕ One issue is that we can really only use basic API Gateway request validation for the generic payload, which should be made up of metadata and data properties like we would with a typical EventBridge event. One way of working around this is having a Lambda function in between the API and the EventBridge bus, so we can use JSON schema validation based on the source and event name.

You can view a detailed article and code repository for this pattern below:

5. Call Me ‘Maybe’ Pattern

The next pattern we will look at is often referred to as ‘Web-hooks’ in the wider industry, but with the use of Amazon EventBridge API Destinations it is much, much more than that. There are different interpretations of the ‘Call me “maybe”’ pattern, but this is mine.

Amazon EventBridge API destinations are HTTP endpoints that you can invoke as the target of a rule, similar to how you invoke an AWS service or resource as a target.

Using API destinations, you can route events between AWS services, integrated software as a service (SaaS) applications, and your applications outside of AWS by using API calls.

When you specify an API destination as the target of a rule, EventBridge invokes the HTTP endpoint for any event that “maybe” matches the event pattern specified in the rule and then delivers the event information with the request.

We can see from the diagram above that we have an event rule on the custom bus which has an API Destination as the target. This means based on a particular event we can target an external API or Web-hook, which is a very powerful pattern.

This essentially means when we combine this pattern with the ‘Event Gateway pattern’ we have a way of interacting two-way asynchronously between any systems (including legacy systems or COTS products).
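A minimal CDK sketch of an API Destination target is shown below; the secret name, endpoint URL and event pattern are assumptions for illustration:

```typescript
import { Duration, SecretValue, Stack, StackProps } from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import { Construct } from 'constructs';

export class CallMeMaybeStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const bus = new events.EventBus(this, 'CustomBus');

    // 'Connection' handles auth to the external API (API key in this sketch;
    // OAuth and Basic Auth are also supported)
    const connection = new events.Connection(this, 'ExternalApiConnection', {
      authorization: events.Authorization.apiKey(
        'x-api-key',
        SecretValue.secretsManager('external-api-key'), // hypothetical secret
      ),
    });

    // The external web-hook / API we want to 'maybe' call
    const destination = new events.ApiDestination(this, 'ExternalApi', {
      connection,
      endpoint: 'https://example.com/webhooks/orders', // hypothetical URL
      rateLimitPerSecond: 10,
    });

    // Rule: when an OrderCreated event matches, invoke the external endpoint
    new events.Rule(this, 'OrderCreatedRule', {
      eventBus: bus,
      eventPattern: { source: ['com.orders'], detailType: ['OrderCreated'] },
      targets: [
        new targets.ApiDestination(destination, {
          retryAttempts: 8, // EventBridge handles retries/back-off for us
          maxEventAge: Duration.hours(2),
        }),
      ],
    });
  }
}
```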

The benefits of the Call Me ‘Maybe’ pattern are:

✔️ We can now invoke external APIs based on events within our AWS ecosystem, meaning when we combine this with the ‘Event Gateway Pattern’ we can have powerful two way asynchronous communication between almost any systems.

✔️ Amazon EventBridge API Destinations will automatically handle the retries and back-offs when the external downstream API is unavailable.

✔️ This is a seamless integration without the need of writing custom code.

✔️ This flexibility enables us to integrate with a wide range of services, both within and outside the AWS ecosystem.

✔️ You can configure a DLQ for the API Destination to capture events that repeatedly fail to be delivered. This helps you handle failure scenarios and ensures that events are not lost.

✔️ Authentication is handled seamlessly using ‘connections’, allowing for OAuth, API Key and Basic Auth flows.

What are the potential issues with this approach?

⭕ One of the biggest issues I have come across so far is that we can only invoke public APIs with API Destinations. This becomes an issue if we are using Private API Gateways; and there are currently no workarounds (even internal ALBs are not supported).

The following article has an associated code repo talking through EventBridge API Destinations in detail:

6. Change Data Capture (CDC) Pattern

Change Data Capture (CDC) in a serverless context refers to the process of capturing changes made to data in a database or data store (most commonly DynamoDB using DynamoDB streams), and then reacting to those changes.

As we are reacting to the changes of the committed data, we can then use the emitted data change to publish events, build materialised views, or to store the changes locally to further process. The latter is typically known as a ‘Transactional Outbox’ pattern which we will discuss in the next section.

As you can see from the diagram above, we utilise DynamoDB streams to inspect data changes within the DynamoDB table, and in this scenario, use a Lambda function to further process the inspected changes or to publish the events that something has changed.
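In CDK, the stream plus Lambda wiring could look like the following sketch (handler path and batch settings assumed), with a DLQ so failed batches are not lost:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import {
  DynamoEventSource,
  SqsDlq,
} from 'aws-cdk-lib/aws-lambda-event-sources';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { Construct } from 'constructs';

export class CdcStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Table with streams enabled so we can capture data changes
    const table = new dynamodb.Table(this, 'OrdersTable', {
      partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
      stream: dynamodb.StreamViewType.NEW_AND_OLD_IMAGES,
    });

    // Stream processor which inspects the changes and publishes events
    const streamProcessor = new NodejsFunction(this, 'StreamProcessor', {
      entry: 'src/handlers/stream-processor.ts', // hypothetical handler
      runtime: lambda.Runtime.NODEJS_18_X,
    });

    streamProcessor.addEventSource(
      new DynamoEventSource(table, {
        startingPosition: lambda.StartingPosition.LATEST,
        batchSize: 10,
        retryAttempts: 3,
        onFailure: new SqsDlq(new sqs.Queue(this, 'StreamDlq')),
      }),
    );
  }
}
```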

The benefits of the Change Data Capture pattern are:

✔️ CDC enables real-time or near-real-time data processing by capturing and reacting to data changes as they happen. This ensures that the serverless functions (or other consumers) are triggered promptly whenever data is inserted, updated, or deleted, allowing for faster and more responsive application behavior.

✔️ CDC fits naturally into an event-driven architecture, where we publish events off the back of specific data changes.

What are the potential issues with this approach?

⭕ One key thing to understand with this approach is that it is eventually consistent and asynchronous in nature, and this needs to be taken into account as a design consideration.

Where could we use this pattern?

✔️ A popular mobile app modifies data in a DynamoDB table, at the rate of thousands of updates per second. Another application captures and stores data about these updates, providing near-real-time usage metrics for the mobile app.

✔️ An application automatically sends notifications to the mobile devices of all friends in a group as soon as one friend uploads a new picture.

✔️ A new customer adds data to a DynamoDB table. This event invokes another application that sends a welcome email to the new customer.

7. Transactional Outbox Pattern

The Transactional Outbox Pattern is a design pattern used to maintain data consistency between services in a distributed system, which is typically an extension to the serverless “change data capture” pattern (but doesn’t need to be).

The primary goal of the Transactional Outbox Pattern is to address the following issue: when a service performs an operation that modifies its own database and needs to notify other services about the change, it should do so in a reliable and consistent manner.

However, directly calling external services during the same transaction can introduce coupling and increase the risk of failure or performance bottlenecks; as well as issues with retries on failure.

The Transactional Outbox pattern is shown below which mitigates these issues:

We can see from the diagram above that we utilise Amazon EventBridge Pipes to poll data from DynamoDB streams (i.e. changes to the data in the table) and we have an Amazon EventBridge bus as the ‘outbox’ for further processing of these ‘domain events’. We could also utilise a FIFO SQS queue instead:

We could also apply this pattern without using Pipes, as shown below, which also works when we are using a ‘single bus, multi-account’ pattern:

Note: We don’t need to use streams here of course, and could opt for a relational database which writes to two tables within a given database transaction (one of which is the outbox table), and poll for the changes.

In this example we utilise a Lambda function which reads from the DynamoDB stream and publishes events to the central EDA bus in a dedicated AWS account.
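A minimal CDK sketch of the Pipes variant is shown below, using the CfnPipe L1 construct (the event source, detail type and INSERT-only filter are assumptions):

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as events from 'aws-cdk-lib/aws-events';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as pipes from 'aws-cdk-lib/aws-pipes';
import { Construct } from 'constructs';

export class OutboxStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const table = new dynamodb.Table(this, 'OrdersTable', {
      partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
      stream: dynamodb.StreamViewType.NEW_AND_OLD_IMAGES,
    });

    const bus = new events.EventBus(this, 'OutboxBus');

    // Role the pipe assumes to read the stream and put events on the bus
    const pipeRole = new iam.Role(this, 'PipeRole', {
      assumedBy: new iam.ServicePrincipal('pipes.amazonaws.com'),
    });
    table.grantStreamRead(pipeRole);
    bus.grantPutEventsTo(pipeRole);

    // L1 construct: DynamoDB stream -> EventBridge bus, no Lambda required
    new pipes.CfnPipe(this, 'OutboxPipe', {
      roleArn: pipeRole.roleArn,
      source: table.tableStreamArn!, // defined because streams are enabled
      sourceParameters: {
        dynamoDbStreamParameters: { startingPosition: 'LATEST', batchSize: 1 },
        filterCriteria: {
          filters: [{ pattern: '{"eventName":["INSERT"]}' }], // new rows only
        },
      },
      target: bus.eventBusArn,
      targetParameters: {
        eventBridgeEventBusParameters: {
          source: 'com.orders',
          detailType: 'OrderCreated',
        },
      },
    });
  }
}
```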

Using this approach with legacy systems

Note that this pattern is not just for Serverless services, and can be used in combination with on-premise systems too, in a strangler fig fashion. In the diagram below, we add a trigger to an existing legacy database to populate an outbox table, and an outbox processor (perhaps a .NET service) reads the next row from the outbox and calls a cloud service for new functionality.

The benefits of the Transactional Outbox pattern are:

✔️ When you have multiple services that need to update their own local data within a single transaction scope, the Transactional Outbox pattern ensures that all updates succeed eventually. This helps maintain data consistency across services.

✔️ The Transactional Outbox pattern allows you to handle failures gracefully. If an update to a downstream service fails, the message in the outbox can be retried later until the operation succeeds, ensuring data consistency is eventually achieved.

✔️ With the Transactional Outbox pattern, the actual processing of the events can be performed asynchronously, which can improve overall system responsiveness and performance.

✔️ The Transactional Outbox pattern is often used in event-driven architectures, where events represent facts that have occurred in the system. Storing the events in an outbox aligns well with event sourcing principles and can be used as a foundation for building event-driven systems.

What are the potential issues with this approach?

⭕ Die-hard DDD (Domain-Driven Design) fans may not be comfortable with raising the domain events through CDC and an Outbox pattern, as opposed to raising the events in the aggregates as part of a database transaction which writes to the main table and the outbox at the same time.

8. Saga Pattern

The Saga pattern is a design pattern commonly used in microservices architectures to manage distributed transactions and ensure data consistency across multiple services; which can come in two flavours:

  • Choreography (event-based) — each local transaction publishes domain events that trigger local transactions in other external domain services.
  • Orchestration (workflow-based) — an orchestrator tells the participants what local transactions to execute and in what order.

In a microservices architecture, the main goal is to build decoupled and independent components to promote agility, flexibility, and faster time to market for your applications. As a result of decoupling, each microservice component has its own data persistence layer. In a distributed architecture, business transactions can span multiple microservices. Because these microservices cannot use a single atomicity, consistency, isolation, durability (ACID) transaction, you might end up with partial transactions. In this case, some control logic is needed to undo the transactions that have already been processed. The distributed saga pattern is typically used for this purpose. — https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/implement-the-serverless-saga-pattern-by-using-aws-step-functions.html

Choreography (event-based)

In the choreography-based Saga pattern, each domain service is responsible for handling its part of the transaction and emitting events to inform other external domain services about the state changes or actions it has performed.

When combined with Amazon EventBridge, each service knows how to initiate its part of the transaction based on events; and responds to events from other services to progress the transaction.

Services emit events to notify other external domain services about their actions, and other external domain services listen for these events to determine their own actions. These events are typically published onto a shared Amazon EventBridge bus, facilitating the communication between services (single bus, multi-account pattern). The simple diagram above shows:

  1. A customer creates a new order which is in draft status to start with.
  2. An ‘Order Created’ event is published onto the main shared event bus from the Order domain.
  3. The ‘Order Created’ event is routed from the shared event bus to the Customer domain event bus as a target through an event rule.
  4. The Customer domain service validates the Customer credit, and either publishes a ‘Customer Credit Verified’ or ‘Customer Credit Denied’ event based on its own domain logic.
  5. The Customer based events are routed from the main shared event bus to the Order domain event bus as a target based on an event rule.
  6. The draft order is either completed or cancelled in the Order domain service based on the Customer event.

“If a local transaction fails because it violates a business rule then the saga executes a series of compensating transactions that undo the changes that were made by the preceding local transactions, all through publishing events.”
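To make the routing concrete, here is a minimal CDK sketch of step 3 above, forwarding ‘Order Created’ events from the shared bus to the Customer domain bus (the account IDs, bus names and event pattern are purely illustrative):

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import { Construct } from 'constructs';

export class SharedBusRoutingStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Hypothetical ARNs for the shared ESB and the Customer domain bus
    const sharedBus = events.EventBus.fromEventBusArn(
      this,
      'SharedBus',
      'arn:aws:events:eu-west-1:111111111111:event-bus/shared-esb',
    );
    const customerDomainBus = events.EventBus.fromEventBusArn(
      this,
      'CustomerDomainBus',
      'arn:aws:events:eu-west-1:222222222222:event-bus/customer-domain',
    );

    // Route 'OrderCreated' events from the shared bus to the Customer domain
    new events.Rule(this, 'OrderCreatedToCustomerDomain', {
      eventBus: sharedBus,
      eventPattern: { source: ['com.orders'], detailType: ['OrderCreated'] },
      targets: [new targets.EventBus(customerDomainBus)],
    });
  }
}
```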

The benefits of the choreography saga pattern are:

✔️ This is the most decoupled way of creating distributed transactions across domains (bounded contexts).
✔️ This allows us to choreograph 1..n local orchestration sagas across many domains.

What are the potential issues with this approach?

⭕ For more advanced distributed transactions this can become fairly complex to organise across many domains.

⭕ It is harder to debug and visualise the flow of the saga as it is across multiple domains and based on events.

Orchestration (workflow based)

In the orchestration-based Saga pattern, a central orchestrator is responsible for coordinating the entire distributed transaction within a single bounded context. The orchestrator communicates with individual services within that domain, instructing them on the actions they need to perform as part of the transaction.

When combined with AWS Step Functions, the Saga pattern can provide several benefits for building scalable, reliable, and maintainable microservices applications.

We can see in the diagram above that:

  1. A customer can create a new order via Amazon API Gateway.
  2. There is a direct integration from API Gateway to the Step Functions workflow.
  3. The Place Order Lambda function creates a draft status order in the Orders DynamoDB Table.
  4. The Check Customer Credit Status Lambda function checks the customer credit status against a read only materialised view of customer data which is built based on events from the customer domain.
  5. If the customer credit status is valid we complete the order using the Complete Order function, and if it is invalid, we cancel the order using the Cancel Order Lambda function.
  6. We raise an Order Complete event onto the shared event bus for other domains.

Note — there are currently over 200 direct service integrations that can be performed within a Step Function workflow, meaning we can greatly reduce the need for compute such as Lambda functions.
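As a sketch of the workflow above in CDK (the function entry paths and the creditStatus field are assumptions), the Choice state routes the order to completion or cancellation:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';
import { Construct } from 'constructs';

export class OrderSagaStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Helper for the hypothetical task functions, one per local transaction
    const taskFn = (name: string) =>
      new NodejsFunction(this, name, { entry: `src/handlers/${name}.ts` });

    const placeOrder = new tasks.LambdaInvoke(this, 'Place Order', {
      lambdaFunction: taskFn('place-order'),
      outputPath: '$.Payload',
    });
    // Assumes the handler returns { creditStatus: 'VALID' | 'DENIED' }
    const checkCredit = new tasks.LambdaInvoke(this, 'Check Credit Status', {
      lambdaFunction: taskFn('check-credit-status'),
      outputPath: '$.Payload',
    });
    const completeOrder = new tasks.LambdaInvoke(this, 'Complete Order', {
      lambdaFunction: taskFn('complete-order'),
    });
    const cancelOrder = new tasks.LambdaInvoke(this, 'Cancel Order', {
      lambdaFunction: taskFn('cancel-order'),
    });

    // Choice state: complete the order on valid credit, cancel otherwise
    const definition = placeOrder.next(checkCredit).next(
      new sfn.Choice(this, 'Credit Valid?')
        .when(
          sfn.Condition.stringEquals('$.creditStatus', 'VALID'),
          completeOrder,
        )
        .otherwise(cancelOrder),
    );

    new sfn.StateMachine(this, 'OrderSaga', {
      definitionBody: sfn.DefinitionBody.fromChainable(definition),
    });
  }
}
```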

The benefits of the orchestration saga pattern are:

✔️ With the orchestrator explicitly controlling the flow of the transaction, the sequence of steps and their dependencies are clear and well-defined. This makes it easier to understand and manage the transaction’s logic and behavior.

✔️ The orchestration-based Saga pattern provides centralised visibility into the state and progress of the distributed transaction.

What are the potential issues with this approach?

⭕ Ideally we would only use these within a given bounded context, and not across multiple (except where this is an experience layer, such as a website BFF which needs to perform tasks synchronously). The reason for this is that it is common for a BFF to need to interact with multiple downstream systems across one API.

Composable Architecture Patterns 🏅

OK, so we have talked at a high level about the eight different architecture patterns we can use in our serverless workloads, but now let’s look at a couple of composable architectures we can use for varying scenarios.

✔️ Communication with legacy or 3rd party systems

We can combine the ‘event gateway’ and ‘call me “maybe”’ patterns to seamlessly integrate two-way between any legacy or 3rd party systems using events as shown below:

In the combined patterns we can see that:

  1. An event target rule makes an API Destinations request to an external application on the ‘OrderCreated’ event. This is the implementation of the ‘Call Me “Maybe”’ pattern.
  2. The external app processes the event request and publishes a ‘CustomerVerified’ event to the shared ESB through an API Gateway which sits in front of it. This is the implementation of the ‘Event Gateway’ pattern.
  3. An event rule targets the Orders domain bus for the ‘CustomerVerified’ event, and the order is completed.

✔️ Workflow based on critical request payload

We can combine the ‘storage first’ and ‘saga’ patterns to create a powerful architectural approach where we persist a request instantly and then perform some detailed orchestration in an asynchronous manner:

In the combined patterns we can see that:

  1. Other services interact with our API Gateway on the Payslip domain service (auth removed for brevity).
  2. We create a direct service integration between API Gateway and an SQS queue, whereby we persist the payload instantly before any compute processes it. This is the ‘Storage First Pattern’.
  3. We use Amazon EventBridge Pipes to invoke a Step Function asynchronously to process the JSON payload of the payslip stored in the SQS message.
  4. We validate the payslip and perform any business logic. We then persist to a DynamoDB table.
  5. We then generate the PDF copy of the payslip and store it in an Amazon S3 bucket, updating the database with the file path to the PDF in the bucket.
  6. We generate an email to notify the customer of their new payslip.
  7. We publish a ‘PayslipPublished’ event to the main event bus.

Note — in this example we have removed any dead-letter queues, and it only shows the happy path for brevity, i.e. no distributed rollback.
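As a sketch of steps 2 and 3 above (the queue, role and a placeholder workflow are assumed), EventBridge Pipes can poll the storage-first queue and start the Step Function asynchronously:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as pipes from 'aws-cdk-lib/aws-pipes';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { Construct } from 'constructs';

export class PayslipPipeStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // The storage-first queue which API Gateway writes to directly
    const queue = new sqs.Queue(this, 'PayslipQueue');

    // Placeholder workflow standing in for the full payslip saga
    const stateMachine = new sfn.StateMachine(this, 'PayslipSaga', {
      definitionBody: sfn.DefinitionBody.fromChainable(
        new sfn.Pass(this, 'Process Payslip'),
      ),
    });

    const pipeRole = new iam.Role(this, 'PipeRole', {
      assumedBy: new iam.ServicePrincipal('pipes.amazonaws.com'),
    });
    queue.grantConsumeMessages(pipeRole);
    stateMachine.grantStartExecution(pipeRole);

    // SQS -> Step Functions, invoked asynchronously
    new pipes.CfnPipe(this, 'PayslipPipe', {
      roleArn: pipeRole.roleArn,
      source: queue.queueArn,
      sourceParameters: { sqsQueueParameters: { batchSize: 1 } },
      target: stateMachine.stateMachineArn,
      targetParameters: {
        stepFunctionStateMachineParameters: {
          invocationType: 'FIRE_AND_FORGET',
        },
      },
    });
  }
}
```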

✔️ Experience Layer BFF where downstream services raise events

We can combine the ‘api proxy’, ‘simple service’, ‘change data capture’ and ‘outbox’ patterns to create a BFF for a website which proxies downstream domain services that raise public domain events:

In this example we can see that:

  1. Our users interact with a Vue.JS web application which is hosted using S3 and CloudFront (AWS Services removed for brevity).
  2. The Vue.JS application utilises our AppSync GraphQL API (backend-for-frontend API) which is a proxy to the underlying domain services. This is a combination of the ‘API Proxy’ and ‘Simple Service’ patterns, as the BFF stores its own state in its own DynamoDB table, for example user-specific information.
  3. The orders GraphQL resolvers hit the Orders domain’s API Gateway API.
  4. The Orders domain service has its own simple service pattern, which stores the new orders in a DynamoDB table.
  5. We use a CDC pattern to stream the database changes through DynamoDB streams to Amazon EventBridge Pipes.
  6. The pipe targets an SQS queue as an ‘Outbox’ pattern so we can persist the domain events for further processing.
  7. A Lambda function reads the domain events from the SQS queue and processes them.
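A minimal sketch of the BFF proxy side in CDK is shown below; the schema file, endpoint and resolver field are assumptions, with an AppSync HTTP data source fronting the Orders API:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import * as appsync from 'aws-cdk-lib/aws-appsync';
import { Construct } from 'constructs';

export class WebsiteBffStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // BFF GraphQL API consumed by the Vue.JS web application
    const api = new appsync.GraphqlApi(this, 'WebsiteBffApi', {
      name: 'website-bff',
      definition: appsync.Definition.fromFile('schema/schema.graphql'), // assumed schema
    });

    // HTTP data source proxying the downstream Orders API Gateway API
    const ordersApi = api.addHttpDataSource(
      'OrdersApi',
      'https://orders.example.com', // hypothetical Orders domain endpoint
    );

    // 'Cherry pick' resolver: Query.getOrder proxies GET /orders/{id}
    ordersApi.createResolver('GetOrderResolver', {
      typeName: 'Query',
      fieldName: 'getOrder',
      requestMappingTemplate: appsync.MappingTemplate.fromString(
        JSON.stringify({
          version: '2018-05-29',
          method: 'GET',
          resourcePath: '/prod/orders/$ctx.args.id',
          params: { headers: { 'Content-Type': 'application/json' } },
        }),
      ),
      responseMappingTemplate: appsync.MappingTemplate.fromString(
        '$util.toJson($util.parseJson($ctx.result.body))',
      ),
    });
  }
}
```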

Conclusion

In conclusion, mastering these design patterns opens up a world of possibilities for developers to build scalable, efficient, and cost-effective applications that cover almost any use-case or requirement.

By leveraging these patterns, developers can harness the true power of Serverless and revolutionise the way they create and deploy applications in the modern digital landscape.

In Part 2, we will talk through the AWS CDK TypeScript code for each of the patterns, including the composed versions, and more of the fine-grained service limitations we need to be thinking of.

Wrapping up

I hope you enjoyed this article, and if you did then please feel free to share and feedback!

Please go and subscribe on my YouTube channel for similar content!

I would love to connect with you also on any of the following:

https://www.linkedin.com/in/lee-james-gilmore/
https://twitter.com/LeeJamesGilmore

If you enjoyed the posts please follow my profile Lee James Gilmore for further posts/series, and don’t forget to connect and say Hi 👋

Please also use the ‘clap’ feature at the bottom of the post if you enjoyed it! (You can clap more than once!!)

About me

Hi, I’m Lee, an AWS Community Builder, Blogger, AWS certified cloud architect and Global Serverless Architect based in the UK; currently working for City Electrical Factors (UK) & City Electric Supply (US), having worked primarily in full-stack JavaScript on AWS for the past 6 years.

I consider myself a serverless advocate with a love of all things AWS, innovation, software architecture and technology.

*** The information provided are my own personal views and I accept no responsibility on the use of the information. ***
