AWS Lambda's Auto Scaling and Runtime Explained

AWS Lambda runs code in plain old Linux containers, which are instantiated on demand for each Lambda function.

In fact, the same Linux environment is used for AWS EC2 and other AWS services as well. AWS has released a publicly available Docker image that recreates this environment. Thanks to this move, code can be run and tested locally in the same environment that AWS provisions, faster and in a more reliable sandbox, before it is deployed to the AWS cloud as production code. And if a customer later decides to move away from Lambda to Docker Swarm, AWS EC2 or similar, the switch should be much easier because the runtime environment can stay the same. This removes some of the vendor lock-in that comes with using AWS.

Whenever an AWS Lambda function is requested, AWS first checks whether a "hot" instance of it already exists and is free. If there is one, AWS sends the request to it instead of creating another instance of the same function. When separate requests for the same function arrive within a few minutes of each other and each execution finishes in less time than that, instances are reused often and the function scales well. This mechanism allows Lambdas to scale really well out of the box. Because instances are kept alive for a while, they can also benefit from in-memory caching. There's no charge for idling functions, and AWS takes care of instantiating the right number of them based on load.

It's also possible to warm up functions on a schedule using AWS CloudWatch Events, so that an idle, "hot" Lambda instance is likely to be waiting for a job. But even a "cold" start usually takes a few seconds at most, and most of that time is spent in the code customers run rather than in the Linux system itself. AWS retains the "hot" state for about 45 minutes after the last execution (this figure is not officially documented and may change).
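The reuse of "hot" instances described above can be illustrated with a minimal Python sketch. It assumes that module-level variables survive between invocations on the same instance, which is what makes in-memory caching possible. The names `handler`, `load_config`, `_CACHE` and the `key` event field are hypothetical and chosen only for illustration. The check on `source` relies on scheduled CloudWatch Events setting that field to "aws.events".

```python
import time

# Module-level state is created once per instance (on a "cold" start) and
# stays in memory for as long as the instance remains "hot".
_CACHE = {}
_COLD_START = True


def load_config(key):
    """Hypothetical slow lookup, standing in for a database or API call."""
    time.sleep(0.01)
    return {"key": key, "loaded_at": time.time()}


def handler(event, context):
    """Hypothetical Lambda entry point that reuses cached data when warm."""
    global _COLD_START
    was_cold = _COLD_START
    _COLD_START = False

    # Scheduled CloudWatch Events arrive with source "aws.events"; treat
    # them as warm-up pings and return before doing any real work.
    if event.get("source") == "aws.events":
        return {"warmup": True, "cold_start": was_cold}

    key = event.get("key", "default")
    cache_hit = key in _CACHE
    if not cache_hit:
        # Only the first request for this key on this instance pays the cost.
        _CACHE[key] = load_config(key)

    return {
        "warmup": False,
        "cold_start": was_cold,
        "cache_hit": cache_hit,
        "value": _CACHE[key],
    }
```

Calling `handler({"key": "a"}, None)` twice in the same process mimics two requests served by one hot instance: the second call reports a cache hit. On a fresh instance the cache starts empty again, so it should be treated as an optimisation and not as durable storage.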