Towards a Unified Architecture for in-RDBMS Analytics
1 minute read ∼ Filed in : A paper noteSummary
Introduction
Integrating analytics into DBMS may introduce some deployment overhead due to lacking unified architecture in-database analytics, which enables generically performance optimizations. For example, two factors influence performance:
- data ordering
- the parallelization of computations in single-node multicore RDMBs.
This paper then tries to investigate those two factors theoretically and empirically and then proposes a feasible novel unified architecture for general in-database analysis.
- Integrating many data analytics tasks which can be formulated as Incremental Gradient Descent.
- Develop a novel strategy to improve the performance, which is limited by the bad ordering in RDMS.
- Adapt existing approaches to make them run in parallel.
Architecture
Data order
The paper shows that some data orderings allow the IGD algorithm to converge in fewer epochs than others.
Parallelism
IGD can be run in parallel in nature.
Shard memory outperforms the shared-nothing architecture in distributed training in accuracy, while shared memory can also be lockless because of the nature of the IGD algorithms.