Spangle A Distributed In-Memory Processing System for Large-Scale Arrays

Posted on May 26, 2023   1 minute read ∼ Filed in  : 

Summary

the paper mainly use bitmask to handle the null values and optimize the computation.

  • Reduce the storage size.
  • Reduce the query latency.

Introduction

Background/motivation

RDD: a distributed collection of objects partitioned acorss a cluster.

  • Operators: transformation (map, filter), action (reduce, collect)
  • Lazy execution: only action trigger the computation,
  • Fault tolerant: each stage form a pipeline and the results are persistent in memory/disk, which enable failure tolerant.

There is a strong need for algorithms dealing with large-scale matrices that do not fit into a single machine memory.

  • tasks could be conduct in distributed way.
  • scalable and efficient programming model.

Goal

The paper present array processing system.

  • It extends RDD to ArrayRDD, which can represent multi-dimensional matrics, arrays and spatio-temporal data.
  • It provides declarative interface to manipulate arrays.
  • it uses bitmask to represent the null value
  • It uses compress to compress the data.

Data Model-ArrayRDD

image-20230613115750150

ArrayRDD is corresponding to one attribute of data and can be split into mulitple chunks by size. Each chunk can be stored in three way: dense, sparse, super-sparse. For sparse, super-sparse, it reduce the storage cost and also reduce the look up complexity.

  • For sparse and super-sparse: it uses bitmask to handle the null values. In procesisng, it firstly scan bitmask, and counting ones in bitmask from the beginning to a given position to derive the original position in a chunk
  • For super-sparse chunk, the bitmask could larger then paylod, it then uses hierachical bitmask model to reduce the bitmask size.

image-20230613121146468

Programming Interface

It supports multiple operations such as subarray, filter, join, matrix operators. Each of them are optimized using the bitmask.





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.