Distributed Deep Learning on Data Systems A Comparative Analysis of Approaches

Posted on July 26, 2023   1 minute read ∼ Filed in  : 

Introduction

Motivation

In-RDBMS ML (Greenplum as exmaple) is used ofen, and provide good usability. In-RDBMS ML requires scalable execution, and is compatible with DL toos.

And there are four metrics to evaluate performance of in-db-ms: 1.Runtime efficiency. 2. Easy-of-governace. 3. Implementation difficulty. 4. Portability.

Challenges

Integrating parallely model selection alg into RDBMS:

  1. Gettting partial results from the UDF is not trivial.
  2. UDFs communication requires pipes provided by RDMBS
  3. Only one query is permitted at anytime in one DB session, Mulitple session + split big query into small => enable parallelism. But it may not make full use of query optimization.
  4. Data is usally compressed and stored on disk as pagefiles. Frequently access involve decompression and accessing.

The paper introduce 4 design if integrating AI to DB: Fully in-RDBMS, Partially in-DBMS, Access data in DB with Direct Access, and out-of-RDBMS

Goal

The paper then integrating MOP based model selection into RDBMS since MOP requires no communications and workds on sharded data.





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.