Monitoring AWS Resources on Your Sandbox and Demo Accounts Using Lambda, EventBridge, and MS Teams

3/30/2023 • 5 min read

The problem

We have all done it.

We have all launched an EC2 instance or created another pricey AWS resource for testing or demo purposes and forgotten to delete it when we no longer needed it. Often you only notice this when the monthly invoice arrives.

That’s why monitoring all your resources is important. However, across multiple AWS accounts, this can become tedious. Especially on sandbox or demo accounts, where different people regularly create new resources, it is easy to lose track.

The solution

So to prevent unpleasant surprises, we will implement a system that keeps track of your resources for you.

The solution consists of four parts:

  1. Introduce a tagging convention to record who is responsible for a resource and on which date it should be deleted

  2. Implement a Lambda function to gather all that information and determine which resources require attention

  3. Create an EventBridge schedule to trigger your Lambda function at regular intervals

  4. Add an MS Teams channel to receive the information and display it to you and the other team members

The tagging convention

Start by introducing a tagging convention (and get your colleagues to follow it).

To keep it simple, you will only need 2–4 tags per resource.

  • <your custom prefix>:owner:name contains the name of the person responsible for this resource

  • <your custom prefix>:owner:email contains the email address of the person responsible for this resource

  • <your custom prefix>:lifetime:restricted contains a boolean value (true or false) and tells you whether this resource can stay indefinitely or has to be deleted at some point

  • <your custom prefix>:lifetime:end contains an optional date value (mm/dd/yyyy) and tells you when to delete this resource

For more information on tagging, refer to the AWS whitepaper Tagging Best Practices.
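To make the convention concrete, here is a minimal sketch of a helper that builds the four tags for one resource. The function name build_lifetime_tags, the prefix acme, and the owner details are made up for illustration; only the tag keys and the date format come from the convention above.

```python
from datetime import date
from typing import Optional


def build_lifetime_tags(prefix: str, owner_name: str, owner_email: str,
                        restricted: bool, end: Optional[date] = None) -> dict:
    """Return the convention's tags as a plain key/value dict."""
    # an empty prefix means no leading colon
    key_prefix = f'{prefix}:' if prefix else ''
    tags = {
        f'{key_prefix}owner:name': owner_name,
        f'{key_prefix}owner:email': owner_email,
        # stored as the lowercase strings 'true' / 'false'
        f'{key_prefix}lifetime:restricted': 'true' if restricted else 'false',
    }
    if end is not None:
        # mm/dd/yyyy, as required by the convention
        tags[f'{key_prefix}lifetime:end'] = end.strftime('%m/%d/%Y')
    return tags


# hypothetical prefix and owner
tags = build_lifetime_tags('acme', 'Jane Doe', 'jane.doe@example.com',
                           True, date(2023, 4, 30))
print(tags)
```

A dict like this could, for example, be passed as the Tags argument of the tagging API's tag_resources call, or simply be typed into the console by hand.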

The Lambda function

Now that you have a tagging convention, set up a Lambda function to use those tags for automation.

Here is the code I used:

import os
import json
import logging

from dataclasses import dataclass
from datetime import datetime
from typing import Any

import requests
import boto3

logger = logging.getLogger()

if 'AWS_EXECUTION_ENV' in os.environ and 'LOG_LVL' in os.environ:
    LOG_LVL = os.environ['LOG_LVL']
    logger.setLevel(level=LOG_LVL)
else:
    # for local debugging
    logger.setLevel(level='DEBUG')

if 'TAG_PREFIX' in os.environ:
    TAG_PREFIX = os.environ['TAG_PREFIX'] + ':'
else:
    TAG_PREFIX = ''


@dataclass
class BotoClients:
    tagging: Any = boto3.client('resourcegroupstaggingapi')


clients = BotoClients()


def init_clients():
    global clients
    if not clients:
        clients = BotoClients()


def main(event, context):
    # get all resources with restricted lifetime
    resources = []
    tag_filters = [
        {
            'Key': f'{TAG_PREFIX}lifetime:restricted',
            'Values': [
                'true',
            ]
        },
    ]

    # first page
    response = clients.tagging.get_resources(
        TagFilters=tag_filters
    )
    pagination_token = response.get('PaginationToken', '')
    resources = resources + response['ResourceTagMappingList']
    # following pages: pass the token, otherwise the first page is fetched forever
    while pagination_token:
        response = clients.tagging.get_resources(
            TagFilters=tag_filters,
            PaginationToken=pagination_token
        )
        pagination_token = response.get('PaginationToken', '')
        resources = resources + response['ResourceTagMappingList']

    logger.debug('### resources with limited lifetime')
    logger.debug(resources)

    # filter critical resources
    critical = []
    for resource in resources:
        # resources without an end date default to "now" and are always reported
        lifetime_end = datetime.now()
        present = datetime.now()
        for tag in resource['Tags']:
            if tag['Key'] == f'{TAG_PREFIX}lifetime:end':
                lifetime_end = datetime.strptime(tag['Value'], '%m/%d/%Y')

        if lifetime_end <= present:
            critical.append(resource)

    logger.debug('### resources that exceeded lifetime')
    logger.debug(critical)

    if not critical:
        logger.debug('No resources that exceed lifetime')
        return

    # format resources
    message_text = ''
    for resource in critical:
        message_text += resource_to_string(resource) + '\n'

    # send to MS Teams
    url = os.getenv('WEBHOOK_URL')
    message = {
        'text': message_text
    }
    response = requests.post(url, json=message)
    logger.debug('### teams response')
    logger.debug(response.status_code)
    # log the raw body; it is not guaranteed to be JSON
    logger.debug(response.text)


def resource_to_string(resource):
    result = resource['ResourceARN']
    for tag in resource['Tags']:
        if tag['Key'] == f'{TAG_PREFIX}owner:name':
            result = f"{result} - owned by {tag['Value']}"
        if tag['Key'] == f'{TAG_PREFIX}lifetime:end':
            result = f"{result} - lifetime ended on {tag['Value']}"
    return result


def lambda_handler(event, context):
    logger.debug('## ENVIRONMENT VARIABLES')
    logger.debug(json.dumps(dict(**os.environ), indent=4))
    logger.debug('## EVENT')
    logger.debug(json.dumps(event, indent=4))

    try:
        logger.debug('## MAIN FUNCTION START')
        init_clients()
        main(event, context)
    except Exception as e:
        logger.error(f'Main function raised exception {type(e).__name__} because {e}.')
        logger.exception(e)
        # re-raise so the invocation is marked as failed
        raise
    logger.debug('## MAIN FUNCTION END')
    return 'finished'

You have to change a few things in the configuration:

  • Your Lambda function will need a Lambda layer containing the Python Requests library. Check out the documentation to learn more about layers.

  • It needs permission to access the AWS Resource Groups Tagging API. You have to allow it to use the tag:GetResources action on all of your account's resources.

  • You have to add an environment variable containing your MS Teams webhook URL (more on this later). It is called WEBHOOK_URL.

  • There is another variable you can add to support a custom prefix. It is called TAG_PREFIX.
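The permission from the list above could be expressed as an IAM policy roughly like the following. This is a minimal sketch, written as a Python dict so it can be printed as JSON; it only covers the tagging permission and assumes the function's role already has the usual permissions for writing logs.

```python
import json

# Sketch of a policy for the function's execution role.
# The tagging API's GetResources action is granted on all resources ('*').
policy = {
    'Version': '2012-10-17',
    'Statement': [
        {
            'Effect': 'Allow',
            'Action': 'tag:GetResources',
            'Resource': '*',
        },
    ],
}

policy_json = json.dumps(policy, indent=4)
print(policy_json)
```

The printed JSON can be pasted into the policy editor of the IAM console.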

The EventBridge schedule

Go to EventBridge and create a new schedule.

You can create a recurring schedule by entering a cron expression. A valid cron expression is 0 13 ? * 2-6 *. This expression triggers your function every workday (Monday to Friday) at 1 p.m. Next, select your Lambda function as the target and create a new role for it. For the other settings, you can keep the default configuration.
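If you prefer to script this step, the same schedule could be described as the arguments for EventBridge Scheduler's create_schedule call in boto3. This is only a sketch: the helper build_schedule_request, the schedule name, the ARNs, and the time zone are placeholders I made up, and the role has to allow the scheduler to invoke your function.

```python
def build_schedule_request(name, lambda_arn, role_arn, timezone='UTC'):
    """Build keyword arguments for EventBridge Scheduler's create_schedule call."""
    return {
        'Name': name,
        # fields: minute hour day-of-month month day-of-week year
        # 2-6 means Monday to Friday, because day-of-week 1 is Sunday
        'ScheduleExpression': 'cron(0 13 ? * 2-6 *)',
        'ScheduleExpressionTimezone': timezone,
        'FlexibleTimeWindow': {'Mode': 'OFF'},
        'Target': {'Arn': lambda_arn, 'RoleArn': role_arn},
    }


# all values below are hypothetical placeholders
request = build_schedule_request(
    name='resource-monitor',
    lambda_arn='arn:aws:lambda:eu-central-1:123456789012:function:resource-monitor',
    role_arn='arn:aws:iam::123456789012:role/resource-monitor-schedule',
    timezone='Europe/Berlin',
)
print(request)
# With credentials configured, this could be sent with:
# boto3.client('scheduler').create_schedule(**request)
```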

The MS Teams integration

Now, go to the Teams channel you want to write the messages to. Create a new incoming webhook. Teams will then give you a URL. This is the URL you have to set as the WEBHOOK_URL environment variable in the Lambda function. You can also set a fitting icon for the webhook's Teams user.
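To check the webhook before wiring it into Lambda, you could send a test message from your machine. The sketch below uses only the Python standard library and prepares the same kind of payload as the Lambda function (a JSON object with a text field). The helper name build_teams_request and the URL are placeholders.

```python
import json
import urllib.request


def build_teams_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request with a JSON 'text' payload."""
    body = json.dumps({'text': text}).encode('utf-8')
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


# placeholder URL; use the one Teams generated for your channel
request = build_teams_request('https://example.com/webhook',
                              'Test message from the resource monitor')
print(request.get_method(), request.full_url)
# To actually send it:
# urllib.request.urlopen(request, timeout=10)
```

Since this variant does not depend on Requests, a function written this way would not need the extra Lambda layer.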

Limitation

This will only work if you and your team members set the tags. Also, this will only show you resources in the region where you deployed the function. If you use multiple regions, you’ll also have to deploy the function in each of them.
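As an alternative to deploying the function several times, one function could query the tagging API once per region. The following is a sketch of that idea, not what the function above does. The names collect_resources, FakeTaggingClient, and fake_factory are made up; the fake client only exists so the example runs without AWS credentials, and in a real function you would pass boto3.client as the factory.

```python
def collect_resources(regions, tag_filters, client_factory):
    """Query the tagging API once per region and merge the results.

    client_factory is expected to behave like boto3.client.
    """
    resources = []
    for region in regions:
        client = client_factory('resourcegroupstaggingapi', region_name=region)
        token = ''
        while True:
            kwargs = {'TagFilters': tag_filters}
            if token:
                kwargs['PaginationToken'] = token
            response = client.get_resources(**kwargs)
            resources.extend(response['ResourceTagMappingList'])
            token = response.get('PaginationToken', '')
            if not token:
                break
    return resources


class FakeTaggingClient:
    """Stand-in for a boto3 client that returns one made-up resource."""

    def __init__(self, region):
        self.region = region

    def get_resources(self, **kwargs):
        arn = f'arn:aws:ec2:{self.region}:123456789012:instance/i-0123456789abcdef0'
        return {'ResourceTagMappingList': [{'ResourceARN': arn, 'Tags': []}],
                'PaginationToken': ''}


def fake_factory(service_name, region_name):
    return FakeTaggingClient(region_name)


found = collect_resources(['eu-central-1', 'us-east-1'], [], fake_factory)
print([r['ResourceARN'] for r in found])
```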


If you are interested in more content on AWS, you may want to check out my last Medium story about Granted:

https://krimphove.site/blog/how-to-use-granted-to-log-in-to-multiple-aws-accounts-at-the-same-time