The MADlib Analytics Library or MAD Skills, the SQL

Posted on March 1, 2023   1 minute read ∼ Filed in  : 

Summary

This paper proposes a way to integrate the MCMC and Gibbs sampling algorithm using SQL language. Further, it analyzes those models with some characterizations and proposes a rule to choose the different algorithms for different documents in a single query.

Introduction

This paper introduces Madlib, a library of analytics that can be installed and executed within a relational database engine that supports SQL.

Architecture

There are two challenges in architectural design:

  • In macro: How to partition data into chunks that can fit in single node memory and can be orchestrated in movement between nodes.
  • In micro: How to invoke efficient linear algebra routines on the data it got.

Programming model

Macro program:

It provides UDA to do the aggregation like map-reduce. It also inloves the chunk level aggregation.

Micro program:

It invokes single-node code to perform arithmetic, such as dense matrix operations, on single-node chunks.

The paper writes its own sparse matrix library since the standard math library doesn’t handle math lib well.

Implementation

It implements an abstract layer that provides three functionality.

  • Type bridging

    The mapping of C++ types and methods => database types and methods.

  • Resource management shims

    The abstract layer’s the memory allocation/deallocation, exception handling, and system signal passing.

  • Math library integration

    The abstract layer incorporating third-party libs makes writing correct and performant code easy.





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.