The MADlib Analytics Library or MAD Skills, the SQL
1 minute read ∼ Filed in : A paper noteSummary
This paper proposes a way to integrate the MCMC and Gibbs sampling algorithm using SQL language. Further, it analyzes those models with some characterizations and proposes a rule to choose the different algorithms for different documents in a single query.
Introduction
This paper introduces Madlib, a library of analytics that can be installed and executed within a relational database engine that supports SQL.
Architecture
There are two challenges in architectural design:
- In macro: How to partition data into chunks that can fit in single node memory and can be orchestrated in movement between nodes.
- In micro: How to invoke efficient linear algebra routines on the data it got.
Programming model
Macro program:
It provides UDA to do the aggregation like map-reduce. It also inloves the chunk level aggregation.
Micro program:
It invokes single-node code to perform arithmetic, such as dense matrix operations, on single-node chunks.
The paper writes its own sparse matrix library since the standard math library doesn’t handle math lib well.
Implementation
It implements an abstract layer that provides three functionality.
-
Type bridging
The mapping of C++ types and methods => database types and methods.
-
Resource management shims
The abstract layer’s the memory allocation/deallocation, exception handling, and system signal passing.
-
Math library integration
The abstract layer incorporating third-party libs makes writing correct and performant code easy.