How We Built RDP-Aware Autoscaling on AWS (When CPU Metrics Weren’t Enough)

CPU was not the problem. People were. We needed autoscaling based on RDP sessions, not load. Here is how a small Lambda, some EventBridge glue, and scale-in protection let us shut down the right EC2 host without kicking anyone off.


A practical walkthrough of using Lambda, EventBridge, and scale-in protection to control exactly which EC2 hosts get terminated and slash costs without kicking users offline.

EC2 autoscaling is great… until it isn’t. The built-in CloudWatch metrics (CPU, network, disk; memory isn’t even there without the agent) don’t always map to what you’re actually trying to scale. In our case, the metric that mattered wasn’t CPU load: it was “how many people are logged in through RDP?” (Yes, we’re hosting an RDP-based app. Don’t judge me.)

We run a fleet of Windows session hosts. They’re easy enough to scale out: boot an EC2 instance, wait for a specific service to come online, register it into the sessionhost pool through an API, done. The problem was the opposite direction. Nights, weekends… nobody connected, but machines kept spinning. My manager looked at the bill, turned to me, and went: “Kill the idle ones.”

AWS gives you scaling policies out of the box, but none of them say “scale in when zero humans are RDP’d into a box”. So I asked our resident PowerShell wizard for a quick script that reports active sessions. They delivered. That gave us the signal; now we just needed to automate the reaction.
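
The PowerShell side boils down to counting the rows that Windows’ `quser` command prints. For the Lambda, one way to fetch that count is SSM Run Command; here’s a sketch under that assumption (the SSM transport and the single-poll wait are simplifications — any channel that returns the session count works, and real code should poll until the invocation finishes):

```python
import time


def parse_quser_output(output: str) -> int:
    """Count active sessions in the output of Windows' `quser` command.

    `quser` prints a header row plus one row per session; when nobody is
    logged in it prints an error message instead, which counts as zero.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines or "USERNAME" not in lines[0].upper():
        return 0
    return len(lines) - 1  # everything after the header row is a session


def get_session_count(instance_id: str) -> int:
    """Run `quser` on a host via SSM Run Command and count its sessions."""
    import boto3  # available by default in the Lambda runtime

    ssm = boto3.client("ssm")
    cmd = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunPowerShellScript",
        Parameters={"commands": ["quser"]},
    )
    time.sleep(2)  # sketch only: poll get_command_invocation properly in real code
    result = ssm.get_command_invocation(
        CommandId=cmd["Command"]["CommandId"],
        InstanceId=instance_id,
    )
    return parse_quser_output(result.get("StandardOutputContent", ""))
```

This needs the SSM agent running on the hosts plus `ssm:SendCommand` on the Lambda role; the upside is no inbound ports or WinRM plumbing.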

We built a Lambda that runs every hour via EventBridge. It queries each host in the Auto Scaling Group, checks the session count, and if it finds a node with zero users, it:

  1. Deregisters it from our sessionhost pool (via the app API).
  2. Shuts it down → but only after convincing the ASG to remove the right instance.
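
The “every hour via EventBridge” part is just a scheduled rule pointed at the function. One way to wire it up with boto3 (the rule and target names are placeholders, and in practice this belongs in your IaC rather than a one-off script):

```python
SCHEDULE_EXPRESSION = "rate(1 hour)"


def create_hourly_trigger(lambda_arn: str, rule_name: str = "scale-in-idle-hosts"):
    """Create a scheduled EventBridge rule that invokes the Lambda hourly."""
    import boto3  # available by default in the Lambda runtime

    events = boto3.client("events")
    lambda_client = boto3.client("lambda")

    # The scheduled rule itself.
    rule = events.put_rule(Name=rule_name, ScheduleExpression=SCHEDULE_EXPRESSION)

    # Allow EventBridge to invoke the function...
    lambda_client.add_permission(
        FunctionName=lambda_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )

    # ...and point the rule at it.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "scale-in-lambda", "Arn": lambda_arn}],
    )
```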

This is where AWS gets slightly in the way. A scaling policy can’t say “terminate this specific instance” inside an ASG. If you just reduce desired capacity by 1, the ASG’s termination policy picks whichever instance it prefers, including one that might still have users on it. Not ideal.

So we went with a controlled workaround:

The trick: temporary scale-in protection

When instances launch, our automation immediately enables scale-in protection, making them immune to ASG-driven terminations. That gives us full control. Then, when Lambda identifies an idle node, it simply:

  • disables scale-in protection for that instance only,
  • decrements desired capacity by 1, and
  • AWS goes: “Welp… that’s the only terminable one,” and removes exactly the instance we wanted.

It’s simple, reliable, and avoids booting users out of active sessions.
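
The “immune at launch” part doesn’t even need per-instance automation: Auto Scaling has a group-level flag that launches every new instance with scale-in protection already enabled. A minimal sketch, assuming the same group name as in the Lambda below (the backfill helper is only needed for instances that were running before the flag was set):

```python
def protect_new_instances(asg_name: str = "MY_SESSIONHOST_ASG") -> None:
    """Make every instance the ASG launches start out scale-in protected."""
    import boto3  # available by default in the Lambda runtime

    autoscaling = boto3.client("autoscaling")
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        NewInstancesProtectedFromScaleIn=True,
    )


def protect_existing_instances(asg_name: str = "MY_SESSIONHOST_ASG") -> None:
    """Backfill protection onto instances that were already running."""
    import boto3

    autoscaling = boto3.client("autoscaling")
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"][0]

    instance_ids = [i["InstanceId"] for i in group["Instances"]]
    if instance_ids:
        autoscaling.set_instance_protection(
            AutoScalingGroupName=asg_name,
            InstanceIds=instance_ids,
            ProtectedFromScaleIn=True,
        )
```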

```python
import logging
import time
import traceback

import boto3

ASG_NAME = "MY_SESSIONHOST_ASG"

autoscaling = boto3.client("autoscaling")

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    try:
        # Discover instances + their session counts.
        # (get_asg_instances / get_session_count / deregister_from_app_pool
        # are our own helpers around the ASG API and the app's API.)
        instances = get_asg_instances(ASG_NAME)

        for instance_id in instances:
            session_count = get_session_count(instance_id)

            if session_count != 0:
                logger.info(f"{instance_id} has {session_count} sessions — skipping.")
                continue

            logger.info(f"{instance_id} has 0 active sessions — scaling in.")

            # Check capacity *before* touching the instance: at MinSize the
            # ASG won't terminate anything, and we'd have deregistered and
            # unprotected a host for nothing.
            group = autoscaling.describe_auto_scaling_groups(
                AutoScalingGroupNames=[ASG_NAME]
            )["AutoScalingGroups"][0]

            current = group["DesiredCapacity"]
            if current <= group["MinSize"]:
                logger.info(f"Already at MinSize — leaving {instance_id} running.")
                continue

            # Step 1: remove from application pool
            deregister_from_app_pool(instance_id)

            # Step 2: allow the ASG to terminate this instance (every other
            # instance in the group is still scale-in protected)
            autoscaling.set_instance_protection(
                AutoScalingGroupName=ASG_NAME,
                InstanceIds=[instance_id],
                ProtectedFromScaleIn=False,
            )

            time.sleep(5)  # small buffer for AWS to register the change

            # Step 3: reduce desired capacity by exactly one; the only
            # terminable instance is the one we just unprotected.
            new_capacity = current - 1
            autoscaling.update_auto_scaling_group(
                AutoScalingGroupName=ASG_NAME,
                DesiredCapacity=new_capacity,
            )

            logger.info(f"Decreased desired capacity: {current} → {new_capacity}")

        return {"status": "completed"}

    except Exception as e:
        logger.error(f"Unhandled exception: {e}")
        logger.debug(traceback.format_exc())
        return {"status": "error", "message": str(e)}
```
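
For completeness, the Lambda’s execution role needs a handful of Auto Scaling permissions on top of the usual logging ones. A minimal policy sketch (scoping `Resource` down to the one ASG’s ARN is recommended where the API supports it; `DescribeAutoScalingGroups` does not take resource-level permissions):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:SetInstanceProtection",
        "autoscaling:UpdateAutoScalingGroup"
      ],
      "Resource": "*"
    }
  ]
}
```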

It’s not fancy. It’s not “cloud-native magic”. But it’s predictable, keeps users happy, and most importantly: keeps us from paying for a bunch of idle Windows servers just because AWS didn’t think “RDP session count” would make a good scaling metric.

This is one of those little pieces of glue code that quietly saves money while making your infrastructure behave exactly the way you need it to, not the way some AWS product manager assumed it would.