SAND Towards High-Performance Serverless Computing

Posted on June 11, 2022   4 minute read ∼ Filed in  : 

Introduction

Motivation

In serverless computing, the unit of computation is a function. When a service request is received, the serverless platform allocates an ephemeral execution environment for the associated function to handle the request.

For more complex applications, an Action sequence is used to connect multiple functions into a single service.

Overall, serverless can separate the function into modules, each function should start up as quickly as possible.

But there are high start-up delays for function execution and inefficient resource usage.

image-20220613184210071

There are two reasons

  1. Each application is executed in a separate container instance.
    • Cold-start: High start-up latency, it starts a container, and installs libraries ( 80% taken). ( few seconds in AWS Lambda. )
    • Warm-start:
      • It uses a pre-warming technique to speed up the first call, but it occupies resources for the idle period.
      • It relaxes the original isolation guarantee, since many requests may be handled inside the same container.
  2. Internal function call also passes from the controller, which incurs extra latency.

Contribution

The paper presents a system to provide:

  1. Lower latency: Function runs as a process inside the same container. (fast allocate/delete)
  2. High resource efficiency: Process takes less memory compared with a new container.
  3. More elasticity: Shared message bus at each host for function communication. (fast triggering of functions running on the same host)

The idea is

  1. Two levels of isolation to enable quickly allocate/release of resources: between applications / between functions of the same application.
  2. Use hierarchical message queuing and storage mechanism to leverage locality for interacting functions of the same application.
    • Functions in the same application are orchestrated locally.
    • Use a local message bus on each host, such that functions executing in a sequence can start instantly.

Experiments show it achieves 43% speed-up compared with OpenWhisk.

Limitations

  1. Functions run in the same container at the same host may compete for the same resources and interfere with each other’s performance. And the scheduling policy could lead to sub-optimal load balancing.
  2. In this system, the CPU time could be used for billing purposes. But the process contention could increase the latency of the application. And the function cannot get a fair share of resources.
  3. The system use fork to run a new function, so it cannot support language runtime without native forking.

Background

Functions in serverless

Deploy: Functions are mapped into containers to achieve the goal.

Call: When the cloud receives users’ requests, it can cold start, or warm start to launch a function ( with container ) to run users’ requests.

Concurrency: Cloud providers allow only one execution at a time in a container for performance isolation ( run a new container in cold start, wait in a queue in warm start ). While OpenFaaS allow concurrent executions of the same function in a single container.

Chaining: Events that trigger function can be external or internal, Most existing systems treat those events the same.

System Design

image-20220613211120475

image-20220613211135593

image-20220613211147247

Evaluation

The paper evaluates SAND with OpenWhisk and AWS Greengrass.

image-20220613212359956

Start latency

spawning processes with binaries (exec C, exec Go) are faster than interpreted languages (exec Python, exec NodeJS) and Java, and forking processes (fork C, fork Python) is fastest among all。

Hierarchical Message pass

Local message bus on every host for fast function interaction, local message bus 2.90 faster than via the global message bus.

Function interaction latency

Measure the time between the first function’s finish and the second function’s start. All functions communicate with the local host’s shared memory. faster 9 ms.

Memory Usage

50 concurrent calls to a single Python function.

Each call adds to the memory footprint of about 14.61MB and 13.96MB in OpenWhisk and Greengrass, respectively.

In SAND, each call only adds 1.1MB on top of the 16.35MB consumed by the grain worker.

This difference is because SAND forks a new process inside the same sandbox for each function call, whereas OpenWhisk and Greengrass use separate containers for concurrent calls.

Idle Memory Cost vs latency

The container will keep warm for a while if there are requests that come until timeout.

image-20220613221335876

Cases

image-20220613220624696





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.