Query Processing on Tensor Computation Runtimes
3 minute read ∼ Filed in : A paper noteSummary
This paper claims that the DB should use GPU to power thier computation ability, and translate the SQL into tensor computations.
Questions:
- The experiments show that the database is less-efficient for the traditional TPC-H workloads in the CPU, so is there a switch between CPU and GPU for various workloads? How to decide when to use CPU or GPU?
- How to provide other properties in this database? Only optimize on reading but not insert?
- How to store the load the data? Since the overall progress will translate the data into a tensor, how to maintain such a tensor? Solve each time when the query executes.
Introduction
Background & Motivation
Tensor Computation Runtimes
The paper refers to the runtimes, compilers, and ML frameworks as tensor computation runtimes. And the form mainly uses some critical operators such as creating, indexing and slicing, etc.
Query processing on TCR
This is about supporting SQL queries using TCR operators.
Gap
The database can use multi-core and SIMD instructions but not more computational hardware such as GPU.
Challenge
Implementing a query processor on TCRs requires overcoming several challenges.
- Expressivity: Many DB expressions, such as LIKE and IN, are too complex for the TCR.
- Performance: Using a tensor to implement a relational operator could improve performance.
- Data Representation: Translating the tables into tensor representation is complex, and TCRs cannot support strings or data types.
- Extensibility: Run relation queries over TCR, make running a query on GPU, etc., possible.
Goal
The paper solves the above challenges and proposes a system that runs relational queries on TCRs b, such as Pytorch, TVM, and ONNX. It could:
- Deliver significant performance improvements over CPU-based data systems
- portability over various hardware and software
- Parsimonious engineer efforts.
Details
Tensor query processor
Using a unified infrastructure, the system mainly compiled relational operators and ML models into the tensor programsompilation**:
- Input query => intermediate representation (IR), where each node is an operator, and each edge is data (tensor)
- Optimization: eliminate any database-related in the IR graph, then rewrite the IR graph for better performance.
- The planning layer transforms the IR graph into an operator plan with PyTorch tensor programs.
- Execution Layer: Generate the program and compile it into various target formats .
Execution:
- data => tensor format
- data movements to/from device memory
- Scheduling of the operators in the device.
- Execute the operator sequentially.
Implementation
- Provides tensor-based operators for many SQL operators, such as select and sort group-by.
- Implement the sorted-join and hase-based join algorithms.
Evaluation
Datasets: TPC-H
Measurement metrics:
- Compare latency with:
- CPU database on a single core
- GPU database
- Scalability: compare the time used when
- increasing cores
- I am increasing the dataset size.
- Cost/performance trade-off on GPU
- Which operator is most costly?
Baseline:
- CPU: spark, DuckDB
- GPU: BlazingSQL and OmnisciDB