Query Processing on Tensor Computation Runtimes

Posted on February 22, 2023 · 3 minute read

Summary

This paper argues that databases should use GPUs to boost their computational power by translating SQL queries into tensor computations.

Questions:

  1. The experiments show that the system is less efficient than traditional engines on TPC-H workloads when running on the CPU, so is there a switch between CPU and GPU for different workloads? How should the system decide when to use the CPU versus the GPU?
  2. How does the system provide other database properties? Does it only optimize reads but not inserts?
  3. How are the data stored and loaded? Since the overall process translates the data into tensors, how are those tensors maintained? Are they rebuilt each time a query executes?

Introduction

Background & Motivation

Tensor Computation Runtimes

The paper refers to ML runtimes, compilers, and frameworks collectively as tensor computation runtimes (TCRs). These mainly expose a set of critical tensor operators such as creation, indexing, and slicing.
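
For concreteness, here is a minimal PyTorch sketch of those core operators; the column values are made up for illustration:

```python
import torch

# Creation: materialize a column of data as a tensor.
col = torch.tensor([10, 25, 3, 42, 7])

# Indexing: gather specific rows by position.
rows = col[torch.tensor([0, 3])]   # tensor([10, 42])

# Slicing: take a contiguous range of rows.
window = col[1:4]                  # tensor([25,  3, 42])

# Boolean masking: the building block for relational selection.
filtered = col[col > 5]            # tensor([10, 25, 42,  7])
```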

Query processing on TCR

This means supporting SQL queries by expressing relational operators with TCR tensor operators.
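
As a toy illustration (not the paper's actual compiled output), a query like `SELECT SUM(price) FROM sales WHERE qty > 5` could be expressed with tensor operators as follows:

```python
import torch

# Columnar table: each column is a tensor (values are made up).
qty = torch.tensor([3, 8, 6, 1, 9])
price = torch.tensor([5.0, 2.5, 7.0, 1.0, 4.0])

# WHERE qty > 5  ->  boolean mask over the column
mask = qty > 5

# SELECT SUM(price)  ->  reduction over the surviving rows
result = price[mask].sum()   # tensor(13.5000)
```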

Gap

Traditional databases can exploit multi-core CPUs and SIMD instructions, but not additional computational hardware such as GPUs.

Challenge

Implementing a query processor on TCRs requires overcoming several challenges.

  • Expressivity: Many SQL expressions, such as LIKE and IN, are hard to express with TCR tensor operators (see the sketch after this list).
  • Performance: A naive tensor implementation of a relational operator can be slower than a hand-tuned database operator, so implementations must be chosen carefully.
  • Data Representation: Translating tables into tensor representation is complex, and TCRs do not natively support strings or other non-numeric data types.
  • Extensibility: The design should make it possible to run relational queries over different TCRs and hardware backends, e.g., running a query on a GPU.
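
To make the expressivity and data-representation challenges concrete, here is a hedged sketch of how an IN predicate over a string column might be handled; the dictionary encoding of strings is my assumption (a common columnar technique), not necessarily the paper's exact scheme:

```python
import torch

# Strings cannot live in tensors directly, so dictionary-encode them:
# 'US' -> 0, 'UK' -> 1, 'DE' -> 2 (the dictionary itself stays host-side).
country_dict = {"US": 0, "UK": 1, "DE": 2}
country = torch.tensor([0, 1, 2, 0, 1])   # encoded column

# WHERE country IN ('US', 'DE')  ->  torch.isin over the encoded codes
wanted = torch.tensor([country_dict["US"], country_dict["DE"]])
mask = torch.isin(country, wanted)  # tensor([ True, False,  True,  True, False])
```

A pattern like `LIKE '%abc%'` has no equally direct mapping and would presumably need per-character tensors or a host-side pass, which matches the paper's point that such expressions are hard to express.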

Goal

The paper addresses the above challenges and proposes a system that runs relational queries on TCRs such as PyTorch, TVM, and ONNX Runtime. It aims to:

  • Deliver significant performance improvements over CPU-based data systems
  • Provide portability across various hardware and software stacks
  • Require only parsimonious engineering effort

Details

Tensor query processor

(Figure: architecture of the tensor query processor.)

Using a unified infrastructure, the system compiles both relational operators and ML models into tensor programs.

Compilation:

  • Parsing: input query => intermediate representation (IR) graph, where each node is an operator and each edge is data (a tensor).
  • Optimization: eliminate database-specific constructs in the IR graph, then rewrite the graph for better performance.
  • Planning: the planning layer transforms the IR graph into an operator plan backed by PyTorch tensor programs.
  • Execution layer: generate the program and compile it into various target formats (see the TorchScript sketch below).
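
As a rough illustration of that last step, a tensor program can be compiled into TorchScript, one target format the PyTorch ecosystem actually provides; the filter_sum function is my own toy stand-in for a compiled query operator:

```python
import torch

def filter_sum(qty: torch.Tensor, price: torch.Tensor) -> torch.Tensor:
    # Toy tensor program standing in for a compiled query operator.
    return price[qty > 5].sum()

# Compile the tensor program into TorchScript, a deployable target format.
compiled = torch.jit.script(filter_sum)

qty = torch.tensor([3, 8, 6])
price = torch.tensor([5.0, 2.5, 7.0])
print(compiled(qty, price))   # tensor(9.5000)
```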

Execution:

  • Convert the data into tensor format.
  • Move data to/from device memory.
  • Schedule the operators on the device.
  • Execute the operators sequentially (sketched below).
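
A minimal sketch of these execution steps, assuming a CUDA device is available (the query itself is the same toy filter-and-sum used above):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Data => tensor format (values are made up).
qty = torch.tensor([3, 8, 6, 1, 9])
price = torch.tensor([5.0, 2.5, 7.0, 1.0, 4.0])

# 2. Move the input tensors into device memory.
qty, price = qty.to(device), price.to(device)

# 3./4. Run the planned operators one after another on the device.
mask = qty > 5            # selection
selected = price[mask]    # gather surviving rows
result = selected.sum()   # aggregation

# Copy the (small) result back to host memory.
print(result.cpu())       # tensor(13.5000)
```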

Implementation

  • Provides tensor-based implementations for many SQL operators, such as selection, sort, and group-by.
  • Implements both sort-based and hash-based join algorithms (a simplified sort-based sketch follows this list).
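
As a flavor of how a join turns into tensor operators, here is a simplified sort-based equi-join sketch using torch.sort and torch.searchsorted; it assumes unique keys on the right table and is far simpler than the paper's actual algorithms:

```python
import torch

# Toy tables (values made up); right-side join keys are assumed unique.
left_key = torch.tensor([2, 5, 3, 2])
right_key = torch.tensor([3, 2, 7])
right_val = torch.tensor([30.0, 20.0, 70.0])

# Sort the right side so it can be binary-searched.
sorted_key, order = torch.sort(right_key)
sorted_val = right_val[order]

# For each left key, find its candidate position among the sorted right keys.
pos = torch.searchsorted(sorted_key, left_key).clamp(max=len(sorted_key) - 1)

# Keep only the left rows whose key actually matches.
match = sorted_key[pos] == left_key
print(left_key[match])         # tensor([2, 3, 2])
print(sorted_val[pos[match]])  # tensor([20., 30., 20.])
```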

Evaluation

Datasets: TPC-H

Measurement metrics:

  • Compare latency with:
    • CPU database on a single core
    • GPU database
  • Scalability: compare the time used when
    • increasing the number of cores
    • increasing the dataset size
  • Cost/performance trade-off on GPU
  • Which operator is most costly?

Baseline:

  • CPU: Spark, DuckDB
  • GPU: BlazingSQL and OmniSciDB



