Monitoring AWS CloudWatch Logs with CDK: Creating Alarms for Specific Error Conditions

An example of creating AWS CloudWatch alarms on filtered CloudWatch logs using the AWS CDK and TypeScript, with full code repository to support the article.

9 min read · Jun 24, 2023

Introduction

In the world of cloud computing, effective monitoring and alerting are crucial for maintaining the health and stability of your serverless applications and services. AWS CloudWatch is a powerful monitoring and observability service provided by AWS which we are going to use in this article.

One essential aspect of monitoring is keeping a close eye on logs generated by your applications and services. CloudWatch Logs allows you to gain insights from log data generated by AWS resources and your own applications.

In this blog post, we will explore how to leverage the AWS CDK to create CloudWatch Alarms based on specific error conditions in CloudWatch Logs. We will walk through the process of configuring the necessary resources such as Log Groups and Metric Filters, and setting up CloudWatch Alarms to trigger actions when these error conditions are detected.

By the end of this post, you will have a clear understanding of how to harness the power of CDK to implement proactive monitoring and receive timely alerts for critical error scenarios in your applications and services.

Let’s dive in and learn how to enhance your monitoring capabilities with CloudWatch Logs and CDK.

You can view the full code here:

GitHub - leegilmorecode/serverless-detailed-alarms: An example of creating a CloudWatch Alarm based off detailed properties in CloudWatch logs using filters (github.com)

What are we building?

We are going to build a basic serverless solution for our fictitious company ‘Gilmore Candles’, who supply industrial-grade candles in bulk to large companies.

As part of this solution, 3rd parties can raise purchase orders to a max value through our website, but if they try to order quantities over that max threshold we need to be alerted. Let’s look at the architecture below:

We can see that:

Customers place large orders of candles where they state the quantity, product ID and price.
We utilise Amazon API Gateway as the entry point through which customers place orders.
We have a Lambda function which checks whether the order value is over our hard-coded $100.00 limit per order. If it is, we throw an ‘OverAgreedLimit’ error which can be tracked in our logs.
Successful orders placed under the limit are stored in Amazon DynamoDB.
All logs from the Lambda function are pushed to Amazon CloudWatch.
We utilise a CloudWatch Logs metric filter to create a metric from log entries that have a status code of 400 and an error name of ‘OverAgreedLimit’.
We raise a CloudWatch Alarm when the metric shows that we have had one or more of these errors in a given 5 minute period.
Any alarms push messages to our SNS Topic.
We have a subscription to the SNS Topic which emails our engineering team to alert them of issues.

Now let’s talk through the code!

👇 Before we go any further — please connect with me on LinkedIn for future blog posts and Serverless news https://www.linkedin.com/in/lee-james-gilmore/

Talking through the key code 🎤

Now let’s talk through the key code to make this happen!

We first have some constants at the top which are used for our metrics namespace and our service name which will be used to create our CloudWatch Log Metrics, as well as the email address that we want alarm alerts to go to.

We also have a Lambda function which creates a new order, and has the service name and metrics namespace passed in for Lambda Powertools.

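// imports assumed for the snippets in this article (AWS CDK v2 module paths)
import * as cdk from 'aws-cdk-lib';
import * as apigw from 'aws-cdk-lib/aws-apigateway';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import { SnsAction } from 'aws-cdk-lib/aws-cloudwatch-actions';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as nodeLambda from 'aws-cdk-lib/aws-lambda-nodejs';
import * as logs from 'aws-cdk-lib/aws-logs';
import * as sns from 'aws-cdk-lib/aws-sns';
import * as snsSubs from 'aws-cdk-lib/aws-sns-subscriptions';
import * as path from 'path';

// constants for now since this is just a demo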
const serviceName = 'OrderService';
const metricNamespace = 'GilmoreCandles';
const emailAddress = 'your.email@email.com';

// create order lambda handler
const createOrderLambda: nodeLambda.NodejsFunction =
  new nodeLambda.NodejsFunction(this, 'CreateOrderLambda', {
    runtime: lambda.Runtime.NODEJS_18_X,
    entry: path.join(
      __dirname,
      '../stateless/src/adapters/primary/create-order/create-order.adapter.ts'
    ),
    memorySize: 1024,
    handler: 'handler',
    bundling: {
      minify: true,
    },
    environment: {
      TABLE_NAME: props.table.tableName,
      POWERTOOLS_SERVICE_NAME: serviceName,
      POWERTOOLS_METRICS_NAMESPACE: metricNamespace,
    },
  });
createOrderLambda.applyRemovalPolicy(cdk.RemovalPolicy.DESTROY);

// allow the lambda to write to the table
props.table.grantWriteData(createOrderLambda);

If we look at the Lambda function use case itself, we can see that we check that the overall order being placed is not over $100.00, and if it is we throw an ‘OverAgreedLimitError’:

import { getISOString, logger, schemaValidator } from '@shared/index';

import { CreateOrderDto } from '@dto/create-order';
import { OrderDto } from '@dto/order';
import { OverAgreedLimitError } from '@errors/over-agreed-limit-error';
import { createOrder } from '@adapters/secondary/database-adapter';
import { schema } from '@schemas/order';
import { v4 as uuid } from 'uuid';

// primary adapter --> (use case) --> secondary adapter(s)
export async function createOrderUseCase(
  createOrderDto: CreateOrderDto
): Promise<OrderDto> {
  const createdDate = getISOString();

  const newOrderDto: OrderDto = {
    id: uuid(),
    created: createdDate,
    ...createOrderDto,
  };

  // this is our check that the price * quantity is not over our
  // made up thresholds for this article - static @ 100.00

  if (newOrderDto.price * newOrderDto.quantity > 100.0)
    throw new OverAgreedLimitError('over agreed threshold');

  schemaValidator(schema, newOrderDto);

  const createdOrder = await createOrder(newOrderDto);

  logger.info(`order saved`);

  return createdOrder;
}
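The OverAgreedLimitError class itself is not shown in this article. As a minimal sketch, assuming it simply sets the error name to ‘OverAgreedLimit’ (the value we filter on in the logs), it could look something like the following. The helper assertWithinAgreedLimit is hypothetical and only mirrors the check in the use case above; the real class in the repo may differ.

```typescript
// Hypothetical sketch of the custom error; the real class lives in the repo
// under '@errors/over-agreed-limit-error' and may be implemented differently.
class OverAgreedLimitError extends Error {
  constructor(message: string) {
    super(message);
    // Assumption: the error name is the string we later filter on in the logs
    this.name = 'OverAgreedLimit';
  }
}

// Hypothetical helper mirroring the use case check:
// price * quantity must not be over the static limit of 100.00
function assertWithinAgreedLimit(
  price: number,
  quantity: number,
  limit = 100.0
): void {
  if (price * quantity > limit) {
    throw new OverAgreedLimitError('over agreed threshold');
  }
}
```

Because the check uses a strict greater-than comparison, an order totalling exactly $100.00 is still accepted.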

In the Lambda handler we have utilised a shared errorHandler which ensures that we check for the known errors that we are throwing, and ensuring that the log is raised containing the error message, the error name and the status code which allows us to then search for these values in the CloudWatch logs:

import { APIGatewayProxyResult } from 'aws-lambda';
import { logger } from '@shared/logger';

// we would typically use middy - but to keep this simple to read
// without multiple additional packages let's build it ourselves
export function errorHandler(error: Error | unknown): APIGatewayProxyResult {
  console.error(error);

  let errorMessage: string;
  let statusCode: number;

  if (error instanceof Error) {
    switch (error.name) {
      case 'OverAgreedLimit': // note: this is our error type we want to alert on
      case 'ValidationError':
        errorMessage = error.message;
        statusCode = 400;
        break;
      case 'ResourceNotFound':
        errorMessage = error.message;
        statusCode = 404;
        break;
      default:
        errorMessage = 'An error has occurred';
        statusCode = 500;
        break;
    }
    logger.error(errorMessage, {
      errorName: error.name, // these additional props in the logs allow us to filter them
      statusCode,
    });
  } else {
    errorMessage = 'An error has occurred';
    statusCode = 500;

    logger.error(errorMessage, {
      errorName: 'UnknownError',
      statusCode,
    });
  }

  return {
    statusCode: statusCode,
    body: JSON.stringify({
      message: errorMessage,
    }),
  };
}

We then create our basic API Gateway which has one POST method on the /v1/orders resource, and the Lambda function above is integrated with it:

// create the rest api
const api: apigw.RestApi = new apigw.RestApi(this, 'CandlesApi', {
  description: 'An API for purchasing candles',
  endpointTypes: [apigw.EndpointType.REGIONAL],
  deploy: true,
  deployOptions: {
    stageName: 'api',
    loggingLevel: apigw.MethodLoggingLevel.INFO,
  },
});

// add our prod service resources
const apiRoot: apigw.Resource = api.root.addResource('v1');
const ordersResource: apigw.Resource = apiRoot.addResource('orders');

// add the lambda proxy integration to the api resource (post on orders)
ordersResource.addMethod(
  'POST',
  new apigw.LambdaIntegration(createOrderLambda, {
    proxy: true,
  })
);

Now the interesting part! We create a CloudWatch Logs metric filter on the log group of this Lambda function, which matches log entries with a statusCode of 400 and an errorName of ‘OverAgreedLimit’, and publishes the resulting metric under the namespace defined in the constants above:

// Create the Metric Filter for the lambda function logs specifically
// i.e. for status code 400 and error type of 'OverAgreedLimit'
const metricFilter = createOrderLambda.logGroup.addMetricFilter(
  'OverAgreedLimitErrorFilter',
  {
    filterPattern: logs.FilterPattern.literal(
      '{ $.statusCode = 400 && $.errorName = "OverAgreedLimit" }'
    ),
    metricName: 'OverAgreedLimitErrorMetric',
    metricNamespace: metricNamespace,
  }
);
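To make the filter pattern a little more concrete, the sketch below emulates in plain TypeScript which log lines the pattern would select. This is only an illustration: CloudWatch evaluates the pattern itself, and both the matchesOverAgreedLimit function and the sample log lines are made up for this example, assuming the logger writes statusCode and errorName as top-level keys of a JSON log entry.

```typescript
// Shape of the properties we care about in a structured log entry (assumed)
type LogEntry = { statusCode?: number; errorName?: string };

// Emulates: { $.statusCode = 400 && $.errorName = "OverAgreedLimit" }
function matchesOverAgreedLimit(rawLine: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawLine);
  } catch {
    // lines that are not JSON cannot match a JSON filter pattern
    return false;
  }
  if (typeof parsed !== 'object' || parsed === null) return false;

  const entry = parsed as LogEntry;
  return entry.statusCode === 400 && entry.errorName === 'OverAgreedLimit';
}

// Made-up sample lines for illustration only
const sampleLines: string[] = [
  '{"level":"ERROR","message":"over agreed threshold","statusCode":400,"errorName":"OverAgreedLimit"}',
  '{"level":"ERROR","message":"invalid order","statusCode":400,"errorName":"ValidationError"}',
  '{"level":"INFO","message":"order saved"}',
  'plain text line that is not JSON',
];

// Only the first sample line satisfies both conditions
const matchingLineCount = sampleLines.filter((line) =>
  matchesOverAgreedLimit(line)
).length;
```

Both conditions must hold for a match, so a 400 response caused by a ‘ValidationError’ does not count towards the metric.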

We now need to create a CloudWatch Alarm which uses the metric filter above:

// Create the CloudWatch Alarm based on the metric filter above
const alarm = new cloudwatch.Alarm(this, 'CloudWatchAlarm', {
  alarmName: 'OverAgreedLimitErrorAlarm',
  alarmDescription: 'Error 400 with OverAgreedLimit Error',
  metric: metricFilter.metric(),
  threshold: 1,
  comparisonOperator:
    cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
  evaluationPeriods: 1,
  treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
});

// create our sns topic for our alarm
const topic = new sns.Topic(this, 'AlarmTopic', {
  displayName: 'OverAgreedLimitErrorAlarmTopic',
  topicName: 'OverAgreedLimitErrorAlarmTopic',
});

We can see that the alarm will trigger when our metric is equal to or higher than the threshold of 1 during a default period of 5 minutes. We also create an SNS Topic specifically for this alarm.
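As a rough mental model of these alarm settings, the sketch below evaluates a series of datapoints (one per 5 minute period) against the threshold. It is a deliberately simplified illustration and not how CloudWatch is implemented; the evaluateAlarm function and its types are hypothetical.

```typescript
// One entry per 5 minute period; undefined means no datapoint was reported
type Datapoint = number | undefined;
type AlarmState = 'ALARM' | 'OK';

function evaluateAlarm(
  datapoints: Datapoint[],
  threshold = 1,
  evaluationPeriods = 1
): AlarmState {
  // only the most recent period(s) are considered
  const recent = datapoints.slice(-evaluationPeriods);

  // GREATER_THAN_OR_EQUAL_TO_THRESHOLD, with missing data treated as
  // not breaching: an undefined datapoint never counts towards the alarm
  const allBreaching =
    recent.length === evaluationPeriods &&
    recent.every((value) => value !== undefined && value >= threshold);

  return allBreaching ? 'ALARM' : 'OK';
}
```

With these settings a quiet period with no matching log entries keeps the alarm in the OK state, because missing data is treated as not breaching.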

With the SNS Topic and alarm in place, we can now add the topic as an alarm action, and create an email subscription to the topic for the email address in the constants at the top of the file.

// Add an action for the alarm which sends to our sns topic
alarm.addAlarmAction(new SnsAction(topic));

// send an email when a message drops into the topic
topic.addSubscription(new snsSubs.EmailSubscription(emailAddress));

This now means that whenever the alarm is breached we will get an email alerting us of this.

Seeing this in action!

OK, now we have talked through the key code, let’s see this in action!

In the code repo we can utilise the Postman collection to start testing our solution (postman/Candles API.postman_collection.json)

Causing errors in our solution using Postman

We can see that we have caused a 400 error on our POST request, as our quantity of 24564 at a price of $1.20 comes to far more than the agreed limit of $100.00 that a single customer can purchase in one go.
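The arithmetic behind this request can be sketched as follows. Note that working in whole cents is a choice made for this illustration only, to avoid floating point rounding; the use case shown earlier multiplies the price and quantity directly. The function and constant names are hypothetical.

```typescript
// Total order value in cents for a given quantity and unit price
function orderTotalInCents(quantity: number, unitPriceInCents: number): number {
  return quantity * unitPriceInCents;
}

const agreedLimitInCents = 100 * 100; // $100.00

// 24564 candles at $1.20 each = 2,947,680 cents, i.e. $29,476.80
const postmanOrderTotalInCents = orderTotalInCents(24564, 120);

// true, so the request is rejected with a 400
const postmanOrderIsOverLimit = postmanOrderTotalInCents > agreedLimitInCents;
```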

If we now look in the CloudWatch logs, we can see these errors being raised:

This allows us to set up our CloudWatch Logs metric filter for the specific filter pattern in the Lambda function logs of:

{ $.statusCode = 400 && $.errorName = "OverAgreedLimit" }

Our metric filter is created using this pattern, as shown below, and is attached to our alarm:

An example of our CloudWatch Logs Metrics Filter and Associated alarm

We can see that this triggers our CloudWatch Alarm, as it is set up to alarm if we have one or more of these types of errors in our logs in a default period of 5 minutes:

An example of our alarm based on CloudWatch metrics filters

Wrapping Up

By implementing the techniques described in this article, you will be able to establish a robust monitoring system that proactively notifies you about specific error conditions, allowing you to take timely actions and ensure the smooth operation of your applications and services.

Remember, monitoring is an ongoing process that requires continuous refinement and adaptation to changing requirements. Regularly review your monitoring setup, evaluate the effectiveness of your alarms, and fine-tune as necessary to ensure optimal performance and reliability.

Please go and subscribe on my YouTube channel for similar content!

I would love to connect with you also on any of the following:

https://www.linkedin.com/in/lee-james-gilmore/
https://twitter.com/LeeJamesGilmore

If you enjoyed the posts please follow my profile Lee James Gilmore for further posts/series, and don’t forget to connect and say Hi 👋

Please also use the ‘clap’ feature at the bottom of the post if you enjoyed it! (You can clap more than once!!)

About me

“Hi, I’m Lee, an AWS Community Builder, Blogger, AWS certified cloud architect and Global Serverless Architect based in the UK; currently working for City Electrical Factors (UK) & City Electric Supply (US), having worked primarily in full-stack JavaScript on AWS for the past 6 years.

I consider myself a serverless advocate with a love of all things AWS, innovation, software architecture and technology.”

*** The information provided represents my own personal views and I accept no responsibility for the use of the information. ***
