YESQL
Motivation
- in-db analysis
- OpenAIRE
- Text mining of full text: link projects to people.
- Many languages and tools help design such pipelines, but they lead to complicated systems and unscalable processing.
- Pandas API
- In-DB processing is efficient but has limited expressive power.
- SQL + UDF =?
- Example execution tools
- Pandas, DuckDB, PostgreSQL (psql), Vertica: compare their file storage, loading time (read all), and query time.
- SQL + UDF efficiency? It is slow and thus needs to improve.
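To make "SQL + UDF" concrete, here is a minimal sketch using Python's standard `sqlite3` module rather than YeSQL itself. The table `papers`, the function `find_grant_id`, and the six-digit ID format are assumptions made up for illustration; they only mimic the "link full text to projects" use case.

```python
import re
import sqlite3

# Assumed ID format for illustration only: a standalone six-digit number.
GRANT_RE = re.compile(r"\b(\d{6})\b")

def find_grant_id(text):
    """Scalar UDF: return the first six-digit token in text, or None."""
    if text is None:
        return None
    match = GRANT_RE.search(text)
    return match.group(1) if match else None

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL queries can call it by name.
conn.create_function("find_grant_id", 1, find_grant_id)

conn.execute("CREATE TABLE papers (id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?)",
    [(1, "Funded by EU grant 123456."), (2, "No funding statement.")],
)

# The text mining runs inside the query, next to the data.
rows = conn.execute(
    "SELECT id, find_grant_id(body) FROM papers ORDER BY id"
).fetchall()
print(rows)
```

The engine calls back into Python once per row here, which is exactly the kind of overhead the notes below list under challenges.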
Challenges
Mismatch between SQL and Python operations.
- Context switches, data copies, function calls, inefficient compilation, limited query optimization, long UDF pipelines.
- Architecture
- UDF In DB
UDF Translation
- Py compilers: JIT compilation
- Py transpilers: Py -> C++
- UDF -> SQL
- UDF -> Intermediate Representation (IR)
- UDF -> Engine.
- Solutions
- UDF optimization: parallelization, vectorization, function inlining, in-process/out-of-process execution, tracing JIT
- ?
- Solutions
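As a rough illustration of why vectorization helps, the sketch below counts how often a simulated engine crosses into UDF code when it calls per row versus per batch. It is plain Python with hypothetical names (`clean`, `clean_batch`, `run_per_row`, `run_vectorized`) and does not model any particular engine.

```python
def clean(value):
    """Hypothetical scalar UDF body: trim and lowercase one value."""
    return value.strip().lower()

def run_per_row(column):
    """Call the UDF once per row: one boundary crossing per value."""
    crossings = 0
    out = []
    for value in column:
        out.append(clean(value))
        crossings += 1
    return out, crossings

def clean_batch(values):
    """Vectorized UDF body: handle a whole batch in one call."""
    return [v.strip().lower() for v in values]

def run_vectorized(column, batch_size):
    """Call the UDF once per batch: one boundary crossing per batch."""
    crossings = 0
    out = []
    for start in range(0, len(column), batch_size):
        out.extend(clean_batch(column[start:start + batch_size]))
        crossings += 1
    return out, crossings

column = [" A ", "b", " C", "d ", "E"]
per_row_out, per_row_crossings = run_per_row(column)
batched_out, batched_crossings = run_vectorized(column, batch_size=2)
print(per_row_crossings, batched_crossings)
```

Both variants produce the same output; only the number of engine-to-UDF calls changes (5 versus 3 for this input).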
YESQL
Characteristics
- Usable, expressive, performant Python UDFs.
- Expressiveness: stateful, dynamically typed; scalar, aggregate, and table UDFs
- Performance: JIT compilation, parallelization, statefulness, fusion.
- Usability: parametric polymorphic UDFs, functional syntax for UDFs
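A stateful, dynamically typed aggregate UDF can be sketched with the standard `sqlite3` module, again as a stand-in and not YeSQL's own UDF syntax. The class `MostCommon` and the table `tags` are hypothetical; the same aggregate is applied to a TEXT column and an INTEGER column without any type declaration.

```python
import sqlite3
from collections import Counter

class MostCommon:
    """Aggregate UDF: keeps per-group state and returns the most frequent value."""

    def __init__(self):
        # State lives across step() calls for one aggregate invocation.
        self.counts = Counter()

    def step(self, value):
        if value is not None:
            self.counts[value] += 1

    def finalize(self):
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

conn = sqlite3.connect(":memory:")
conn.create_aggregate("most_common", 1, MostCommon)

conn.execute("CREATE TABLE tags (tag TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [("db", 2020), ("ml", 2021), ("db", 2021)],
)

# The same UDF handles strings and integers.
result = conn.execute(
    "SELECT most_common(tag), most_common(year) FROM tags"
).fetchone()
print(result)
```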
Architecture
Supports both server-based and embedded DBMSs.
Compile
- UDF -> code generated by YeSQL, either statically or at runtime.
- Python -> Python + C
Example with Experiment
Usability: users were asked to test it and managed to do so -> good usability.
Performance:
Fusion
- Is CFFI conversion eliminated? -> Merge two Python UDFs at the C-function level.
- Relational operator + UDF operator?
- Fusible or not -> if the output of one UDF pipelines into the input of the next, they are fusible.
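The idea of fusing two pipelined UDFs can be illustrated in plain Python with `sqlite3`. This is only an analogy: per the notes, YeSQL merges at the C-function level to avoid CFFI conversions, whereas this sketch composes two Python functions and counts engine-to-Python calls. All names (`normalize`, `count_tokens`, `fuse`, `counted`, table `docs`) are hypothetical.

```python
import sqlite3

calls = {"boundary": 0}

def counted(fn):
    """Wrap a UDF so each engine-to-Python call is counted."""
    def wrapper(*args):
        calls["boundary"] += 1
        return fn(*args)
    return wrapper

def normalize(text):
    return text.strip().lower()

def count_tokens(text):
    return len(text.split())

def fuse(first, second):
    """Fuse two pipelined UDFs into one: x -> second(first(x))."""
    def fused(value):
        return second(first(value))
    return fused

conn = sqlite3.connect(":memory:")
conn.create_function("normalize", 1, counted(normalize))
conn.create_function("count_tokens", 1, counted(count_tokens))
conn.create_function("normalize_count", 1, counted(fuse(normalize, count_tokens)))

conn.execute("CREATE TABLE docs (body TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?)",
    [("  Hello World ",), ("YeSQL fuses UDFs",)],
)

# Unfused: the intermediate string passes through the engine for every row.
calls["boundary"] = 0
unfused = conn.execute(
    "SELECT count_tokens(normalize(body)) FROM docs ORDER BY rowid"
).fetchall()
unfused_calls = calls["boundary"]

# Fused: one call per row, the intermediate value never leaves Python.
calls["boundary"] = 0
fused = conn.execute(
    "SELECT normalize_count(body) FROM docs ORDER BY rowid"
).fetchall()
fused_calls = calls["boundary"]

print(unfused, unfused_calls, fused, fused_calls)
```

The results are identical, but the fused query makes half as many calls across the engine boundary for this two-stage pipeline.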
Evaluation
Future work
The SQL/Python mismatch is still challenging
- Push computation into the DB for scalability
- High expressiveness, usability, performance
- Fusion of UDFs and relational operators
- Ongoing work
- Deeper fusion-based optimization
- Provably correct Python-to-YeSQL translation
- Federated YeSQL query processing.