Faastlane: Accelerating Function-as-a-Service Workflows
Introduction
Motivation
An application can be expressed as a workflow composed of many functions. AWS Step Functions (ASF), IBM Action Sequences, and OpenWhisk Composer enable developers to create and execute such workflows.
In contemporary FaaS platforms, each function, even one that belongs to the same workflow instance, executes in a separate container. As a result, functions can only communicate over the network or through shared storage.
However,
- The state that functions can pass to each other directly is limited in size (32 KB in ASF).
- Larger data must go through slow storage such as S3.
These two limitations cause the following problems:
- High interaction latency, which accounts for up to 95% of the execution time of a workflow instance on ASF and OpenWhisk.
- High cost: the user pays for execution time, state-transfer time, and storage.
The paper minimizes function interaction latency by executing the functions of a workflow as threads within a single process of a container instance. For Python and Node.js, where threads cannot run in parallel, the paper uses forked processes instead.
In this way, functions communicate through the shared virtual address space of the process, and cache coherence ensures strong consistency.
However, threaded execution has two challenges:
- Isolated execution for sensitive data:
Some data must not be accessible to untrusted functions, such as analytics functions from a library.
But threads in a process share the same address space. The paper provides lightweight thread-level isolation of sensitive data with Intel Memory Protection Keys (MPK). MPK allows a group of virtual memory pages (i.e., parts of the process address space) to be assigned a specific protection key. Threads can have different protection keys and, thus, different access rights to the same region of the process’s address space.
With MPK, functions in a workflow instance, executing on separate threads, have different access rights to different parts of the address space. Simultaneously, it enables the efficient sharing of non-sensitive data by placing it in pages shared across the functions.
- MPK partitions the pages of the virtual address space into up to 16 sets, one per protection key.
- MPK uses a 32-bit register (PKRU) to specify the access rights (read/write) for each key.
- Each key maps to 2 bits in the register.
- A thread can access a page only if its PKRU value grants the needed read/write rights for that page's key.
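The register layout described above can be made concrete with a small model. This is only a sketch in plain Python, not code from the paper: it assumes the layout documented by Intel, where key i owns bit 2i (access-disable) and bit 2i+1 (write-disable), and the helper names set_rights, can_read, and can_write are made up for illustration.

```python
NUM_KEYS = 16  # 16 protection keys, 2 bits each, fill the 32-bit PKRU

AD = 0b01  # access-disable bit: blocks both reads and writes
WD = 0b10  # write-disable bit: blocks writes only


def set_rights(pkru, key, *, read, write):
    """Return a new PKRU value with the rights for `key` updated (hypothetical helper)."""
    if not 0 <= key < NUM_KEYS:
        raise ValueError("protection key must be in 0..15")
    if write and not read:
        raise ValueError("MPK cannot express write-only access")
    bits = 0
    if not read:
        bits |= AD
    if not write:
        bits |= WD
    shift = 2 * key
    cleared = pkru & ~(0b11 << shift)  # drop the old 2 bits for this key
    return (cleared | (bits << shift)) & 0xFFFFFFFF


def can_read(pkru, key):
    """Reads are allowed unless the access-disable bit is set."""
    return ((pkru >> (2 * key)) & AD) == 0


def can_write(pkru, key):
    """Writes are allowed only if neither disable bit is set."""
    return ((pkru >> (2 * key)) & 0b11) == 0
```

For example, starting from a PKRU of 0 (everything allowed), set_rights(0, 3, read=True, write=False) yields a value under which a thread can still read pages tagged with key 3 but can no longer write to them.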
- Concurrent function executions:
A workflow instance may contain many functions that execute in parallel.
But threaded execution does not provide real concurrency in Python or Node.js.
The paper uses an adaptive workflow composer, which:
- runs parallel functions as separate processes (they communicate through Python pipes, which use shared-memory communication)
- runs sequential functions as threads in a container
- falls back to the traditional method of running multiple containers when a workflow has hundreds or thousands of parallel functions.
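The composer's three-way choice can be sketched as a small decision function. This is an illustration under stated assumptions, not the paper's composer: the Stage type, the plan_stage name, and in particular the threshold "more parallel functions than vCPUs" for spilling over to multiple containers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One step of a workflow: a list of function names (hypothetical type)."""
    functions: list
    parallel: bool  # True if the functions may run concurrently


def plan_stage(stage, vcpus):
    """Pick an execution strategy for one stage.

    Assumption: a stage spills over to multiple containers as soon as it
    has more parallel functions than the container has vCPUs.
    """
    n = len(stage.functions)
    if not stage.parallel or n <= 1:
        return "threads"      # sequential functions share one process
    if n > vcpus:
        return "containers"   # too wide for one container
    return "processes"        # parallel functions, one forked process each
```

With vcpus=4, a two-step sequential stage maps to "threads", a three-way parallel stage maps to "processes", and a 200-way parallel stage maps to "containers".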
Contributions
- Reduces function interaction latency over OpenWhisk by up to 2307× by running functions as threads.
- Provides thread-level isolation using Intel MPK.
- Exploits parallelism by adaptively executing functions as processes.
FaaS use cases
Cases
The paper presents three FaaS use cases, each with a distinct workflow property.
- Financial trade validation: trades are validated against market data using about 200 pre-determined rules, which requires high parallelism.
- ML serving: pre-processing is followed by prediction, which is a sequential pattern.
- Healthcare analytics: functions are executed conditionally, based on the input and output of the previous function.
Threat Model
Even functions belonging to the same application should not access each other’s state.
Most applications, however, use functions from third-party repositories alongside trusted functions and services that can access sensitive data.
The paper uses MPK to protect the data, but it cannot protect the code logic.
Design & Implementation
Design
- Minimize interaction latency without sacrificing parallelism.
- Strive to run functions as threads/processes in a single container.
- The composer periodically (e.g., once a day) profiles containers on the FaaS platform to ascertain the available parallelism. It deploys a simple compute-bound micro-benchmark on the FaaS platform and observes its scalability to infer the number of vCPUs in the container deployed by the platform.
- Data sharing and function isolation: Faastlane uses Intel MPK to provide thread-granularity memory isolation for FaaS functions that share a virtual address space.
- Easy to use, with no new API: the paper designs a static client-side tool, the workflow composer.
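The profiling step in the design can be sketched as follows. This is an assumption-laden illustration rather than the paper's benchmark: burn, time_workers, infer_vcpus, and the 1.5× tolerance are all hypothetical. The idea is that running k copies of a compute-bound task takes roughly as long as one copy while k does not exceed the number of vCPUs, and noticeably longer beyond that.

```python
import time
from multiprocessing import Pool


def burn(n):
    """Compute-bound busy loop used as the micro-benchmark."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_workers(workers, work=2_000_000):
    """Time `workers` copies of burn() running in parallel processes."""
    start = time.perf_counter()
    with Pool(workers) as pool:
        pool.map(burn, [work] * workers)
    return time.perf_counter() - start


def infer_vcpus(timings, tolerance=1.5):
    """Infer the vCPU count from {workers: elapsed_seconds}.

    Returns the largest worker count, scanning upward, whose elapsed time
    stays within `tolerance` times the single-worker time (an assumed
    heuristic, not taken from the paper).
    """
    base = timings[1]
    best = 1
    for k in sorted(timings):
        if timings[k] <= tolerance * base:
            best = k
        else:
            break
    return best
```

A profiling run would collect timings such as {k: time_workers(k) for k in (1, 2, 4, 8)} inside the container and pass them to infer_vcpus. Process start-up time is included in each measurement, so the workload should be large enough to dominate it.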
Implementation
Isolation
- When a process starts, all pages in its address space are assigned a default protection key (through a system call), and the PKRU value allows both read and write access to all pages. The protection key of a page is stored in its page-table entry. The WRPKRU instruction lets a process modify the PKRU register from user space.
- Because WRPKRU can be executed from user space, it must only run in trusted code (the thread gates).
- A thread gate is a sequence of instructions that contains MPK-specific instructions to modify the PKRU register or to assign protection keys to pages. The system has two gates:
- Entry gate: attaches a protection key to a thread, passes the key to the memory manager so that all subsequent memory requests from the thread are satisfied from pages tagged with that key, and updates the PKRU register so that the thread has read/write rights to those pages.
- Exit gate: zeroes out the thread's memory region and frees the key for further use.
- The parent thread maintains a shared heap that is accessible to all threads and serves as the shared memory region to which all threads have unfettered read/write access.
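The gate protocol above can be mimicked with a pure-Python simulation. It does not touch real MPK hardware or the PKRU register; the ThreadGates class, its method names, and the choice of key 0 as the shared-heap key are assumptions made only to show the order of operations (allocate a key on entry, check the key on every access, zero the region and recycle the key on exit).

```python
class ThreadGates:
    """Toy model of entry/exit gates; bytearrays stand in for tagged pages."""

    SHARED_KEY = 0  # assumption: key 0 tags the shared heap

    def __init__(self, num_keys=16, region_size=32):
        self.free_keys = list(range(1, num_keys))
        self.key_of = {}  # thread name -> protection key
        self.region_size = region_size
        self.regions = {self.SHARED_KEY: bytearray(region_size)}

    def entry_gate(self, thread):
        """Attach a free key to the thread and give it a private region."""
        if not self.free_keys:
            raise RuntimeError("no free protection key")
        key = self.free_keys.pop(0)
        self.key_of[thread] = key
        self.regions[key] = bytearray(self.region_size)
        return key

    def region(self, thread, key):
        """Return the region for `key` if the thread may access it."""
        if key != self.SHARED_KEY and self.key_of.get(thread) != key:
            raise PermissionError(f"{thread} has no rights for key {key}")
        return self.regions[key]

    def exit_gate(self, thread):
        """Zero the thread's private region and recycle its key."""
        key = self.key_of.pop(thread)
        region = self.regions.pop(key)
        region[:] = bytes(len(region))  # zero out before the key is reused
        self.free_keys.append(key)
```

In this model, a function that entered through entry_gate can read and write its own region and the shared one, while a call such as region("f2", key_of_f1) raises PermissionError.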
Thread memory manager
Faastlane modifies Python's memory manager to ensure that memory requests from a thread are always served from that thread's private arenas.
Evaluation
With four applications, the paper:
- Measures the improvement in function interaction latency, end-to-end latency, throughput, and dollar cost on a single server.
- Measures scalability across many servers and containers.
It compares Faastlane with ASF (run on the AWS cloud) and with OpenWhisk and SAND (run on a local hardware platform).
Function Interaction Latency
Each application is run at least 100 times, and the median is reported.
SAND’s hierarchical messaging queues serialize communication from concurrently executing functions.
For ML prediction, since the data is large, ASF and OpenWhisk (OW) transfer it through slow storage.
End-to-End Latency
The end-to-end execution latency of an application’s workflow is the time that elapses between the start of the first function of the workflow and the completion of the final function in the workflow. It doesn’t include the initialization time.
Each application is executed at least 100 times, and both the median and the tail (99th percentile) values are reported.
End-to-end latency = external service request time + compute time + interaction latency
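The reporting method above (median plus 99th-percentile tail over at least 100 runs) can be sketched as follows. The function names and the nearest-rank percentile definition are assumptions for illustration; the paper does not state which percentile method it uses.

```python
import statistics


def end_to_end(external_ms, compute_ms, interaction_ms):
    """Sum the three latency components listed above."""
    return external_ms + compute_ms + interaction_ms


def summarize(samples_ms):
    """Return (median, 99th percentile) of per-run latencies.

    The tail uses the nearest-rank definition: the value at rank
    ceil(0.99 * n), computed with integer arithmetic.
    """
    ordered = sorted(samples_ms)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # ceil(99 * n / 100)
    return statistics.median(ordered), ordered[rank - 1]
```

For 100 runs with latencies of 1, 2, ..., 100 ms, summarize returns a median of 50.5 ms and a tail of 99 ms.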
Throughput
Throughput is measured as the number of application requests serviced per minute on each FaaS platform.
The throughput measurement also accounts for initialization (spawning containers) and post-processing time.
Cost of Isolation
The experiments show that the cost of MPK-based isolation is reasonable: it adds 1.9-14.9% to the end-to-end latency in Faastlane.
Dollar Cost
Faastlane incurs no cost for transferring data between functions; the user pays only the Lambda cost.
Scalability
The number of parallel functions is scaled up, and the number of vCPUs in each container is varied from 4 to 50.
OW baseline: runs each function in a separate container.
As the number of vCPUs per container increases, Faastlane launches fewer containers and achieves a higher speedup.