Vertica ML Distributed Machine Learning in Vertica Database

Posted on February 25, 2023   2 minute read ∼ Filed in  : 

Summary

The functionality paper

  • Motivations => others cannot achieve it => we aim to achieve it => challenges.

Questions:

The paper doesn’t explain much about why it has good scalability than a spark.

Introduction

Background & Motivation

Support ML-based applications from DBMS has some advantages:

  • eliminating big volumes of data transfer.
  • avoiding maintenance overhead of separate analytical system
  • addressing concerns of data security and provenance.

Gap

MADlib => only supports PostgreSQL and Greenplum.

Oracle’s ORE and Microsoft’s SQL MLS => do not match the scalability and performance of natively in-database ML.

Spark-MLLIB, H2O => only compute engine, not full-fledged DBMS.

Goal

  • run high performant ML algorithms inside the database.
  • with many integration options.
  • model storage and management system.

Challenge

  • To make the DBMS supports iterative queries for ML training.

    => solutions: use an internal distributed cache.

  • Archiving and managing ML models.

    => solutions: not limit ML models to tabular data structures.

Details

Two building Funs

UDF and MetaFunctions

UDF can be called a part of a SQL statement, and it receives one pass of its input data.

Metafunction is a thin procedure that mainly controls the logic and flow of a complicated task.

Integrated ML functions and model management APIs

Architecture

Model => catalog object for storing metadata, trained parameters stored in distributed File System

Parallelly execute UDTFs, and each one uses only the party of data. One leader thread reads the model to buffer and share the pointer with other threads.

Reading data is composed of reading from files, decompressed, decoded, and constructed in blocks. This process is costly. The paper uses distributed in-memory storage blob to store the user data and clear the memory after model training is finished.

Evaluation

measure scalability compared with spark.





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.