ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation
1 minute read ∼ Filed in: A paper note
Takeaways
Some words
Linear throughput scalability; on-the-fly data transformation.
How can we integrate ML into a column-store DBMS without disrupting either DBMS efficiency or ML quality and performance?
System
ColumnML integrates ML into a column-store DBMS without disrupting either DBMS efficiency or ML quality and performance.
- The paper proposes pSCD (partitioned stochastic coordinate descent) to achieve cache-efficient training on a column store.
- For a batch of data, reading every column before performing a training step requires a cache of size Batch × Feature.
- Partitioned SCD achieves cache-efficient training: coordinate-descent-based algorithms access the samples one feature at a time, which naturally corresponds to column-wise access.
- The paper uses an FPGA for both the data preprocessing and the ML computations.
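
To make the column-wise access pattern concrete, here is a minimal NumPy sketch of partitioned coordinate descent for ridge regression. It is not the paper's implementation: the function name `pscd_ridge`, the ridge objective, the exact per-coordinate step, and averaging the per-partition models once per epoch are all assumptions chosen for illustration. The point is that each update touches a single column slice plus a residual vector of `partition_size` entries, rather than a Batch × Feature block.

```python
import numpy as np


def pscd_ridge(columns, y, lam=0.1, partition_size=16, epochs=50, seed=0):
    """Illustrative partitioned coordinate descent for ridge regression.

    columns: list of 1-D arrays, one per feature (column-store layout).
    y:       1-D array of labels.
    Approximately minimizes (1/2n)*||Xw - y||^2 + (lam/2)*||w||^2.
    All names and the averaging scheme are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    n, d = len(y), len(columns)
    w = np.zeros(d)
    starts = range(0, n, partition_size)
    for _ in range(epochs):
        local_models = []
        for s in starts:
            rows = slice(s, min(s + partition_size, n))
            m = rows.stop - rows.start
            w_local = w.copy()
            # Residual for this partition only: the small vector that is
            # meant to stay in cache while the columns stream past.
            r = sum(w_local[j] * columns[j][rows] for j in range(d)) - y[rows]
            # Visit features in random order, one column slice at a time.
            for j in rng.permutation(d):
                x = columns[j][rows]
                grad = x @ r / m + lam * w_local[j]
                curv = x @ x / m + lam
                delta = -grad / curv  # exact minimizer along coordinate j
                w_local[j] += delta
                r += delta * x        # keep the residual consistent
            local_models.append(w_local)
        # Assumption: combine per-partition models by plain averaging.
        w = np.mean(local_models, axis=0)
    return w
```

With `partition_size` equal to the number of rows, this reduces to ordinary coordinate descent over whole columns. Smaller partitions shrink the cached residual, but the averaged model is then only an approximation of the single-partition solution.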