YESQL
Motivation
- In-DB analysis
 - OpenAIRE
 - Text mining of full text: link projects to people.
    
- Many languages exist to help design such pipelines, but they make the system complicated and the processing unscalable.
        
- Pandas API
 
 - In-DB processing is efficient but has limited expressive power.
 - SQL + UDF = ?
 - Example execution tools
        
- Pandas, DuckDB, Psql, and Vertica DBs: compare their file storage, loading time (read all), and query time.
 - UDF + SQL efficiency? It is slow, so it needs to be improved.
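
The "SQL + UDF" idea in the notes above can be illustrated with a minimal sketch. It uses Python's standard `sqlite3` module rather than YeSQL itself, and the table, the `extract_project` function, and the grant-number pattern are all hypothetical names chosen for illustration:

```python
import re
import sqlite3

# Hypothetical pattern for a project identifier mentioned in full text.
PROJECT_ID = re.compile(r"\bgrant\s+(\d{6})\b", re.IGNORECASE)

def extract_project(text):
    """Scalar UDF: return the first project id found in the text, or None."""
    if text is None:
        return None
    match = PROJECT_ID.search(text)
    return match.group(1) if match else None

conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL queries can call it by name.
conn.create_function("extract_project", 1, extract_project)

conn.execute("CREATE TABLE papers (author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?)",
    [
        ("alice", "Funded under grant 123456 of the EC."),
        ("bob", "No funding statement."),
        ("carol", "Supported by Grant 654321."),
    ],
)

# The text mining runs inside the query: link people to projects.
rows = conn.execute(
    "SELECT author, extract_project(body) AS project "
    "FROM papers WHERE extract_project(body) IS NOT NULL "
    "ORDER BY author"
).fetchall()
print(rows)
```

The query keeps the relational part (filtering, ordering) in SQL and delegates only the text extraction to Python, which is the expressiveness gain the notes refer to.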
 
 
        
 
Challenges
Mismatch between SQL and Python operations.
- Context switches, data copies, function calls, inefficient compilation, limited query optimization, long UDF pipelines.
 - Architecture
    
- UDF In DB
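
One way to observe the per-row function call and context switch cost listed above is to time the same computation as a native SQL expression and as a Python UDF. This is a rough sketch on `sqlite3`, not the paper's benchmark; `py_add_one` and the table `t` are made-up names, and the timings will vary by machine:

```python
import sqlite3
import time

def add_one(x):
    """Trivial scalar UDF; the engine calls into Python once per row."""
    return x + 1

conn = sqlite3.connect(":memory:")
conn.create_function("py_add_one", 1, add_one)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(100_000)))

def timed(sql):
    """Run a single-value query and return (value, elapsed seconds)."""
    start = time.perf_counter()
    value = conn.execute(sql).fetchone()[0]
    return value, time.perf_counter() - start

# Same result, computed entirely in the engine vs. through the UDF.
native_sum, native_s = timed("SELECT SUM(x + 1) FROM t")
udf_sum, udf_s = timed("SELECT SUM(py_add_one(x)) FROM t")
print(f"native: {native_s:.4f}s  udf: {udf_s:.4f}s")
```

Both queries return the same sum; only the elapsed times differ, and the gap is the overhead that the techniques in these notes try to reduce.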
 
 
UDF Translation
- Py compilers: JIT compilation
 - Py transpilers: Py -> C++
    
- UDF -> SQL
 - UDF -> Intermediate Representation (IR)
 - UDF -> Engine.
        
- Solutions
            
- UDF optimizations: parallelization, vectorization, function inlining, in-/out-of-process execution, tracing JIT
 - ?
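
The "UDF -> SQL" translation route can be sketched for a very small case: turning the body of an arithmetic Python expression into an equivalent SQL expression with the standard `ast` module. This is a toy illustration, not any system's actual translator; `to_sql` and `udf_body_to_sql` are hypothetical helpers, and only `+`, `-`, and `*` on columns and numeric constants are handled:

```python
import ast
import sqlite3

# Python AST operator types mapped to SQL operators (deliberately tiny subset).
BIN_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

def to_sql(node):
    """Recursively translate a small arithmetic subset of Python into SQL."""
    if isinstance(node, ast.Expression):
        return to_sql(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        op = BIN_OPS[type(node.op)]
        return f"({to_sql(node.left)} {op} {to_sql(node.right)})"
    if isinstance(node, ast.Name):
        return node.id  # a free variable is treated as a column name
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return repr(node.value)
    raise ValueError(f"unsupported construct: {ast.dump(node)}")

def udf_body_to_sql(expr):
    """Parse a Python expression string and return the SQL expression text."""
    return to_sql(ast.parse(expr, mode="eval"))

sql_expr = udf_body_to_sql("price * qty + 1")
print(sql_expr)

# Check that the engine can evaluate the translated expression natively.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (price INTEGER, qty INTEGER)")
conn.execute("INSERT INTO orders VALUES (3, 4)")
result = conn.execute(f"SELECT {sql_expr} FROM orders").fetchone()[0]
print(result)
```

Division is left out on purpose: Python's `/` is true division while SQL integer division truncates, a small instance of the SQL/Python mismatch discussed above.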
 
 
            
 
 
YESQL
Characteristics
- Usable, expressive, performant Python UDFs.
 - Expressiveness: stateful
    
- dynamically typed
 - scalar, aggregate, and table UDFs
 
 - Performance: JIT compilation, parallelization, statefulness, fusion.
 - Usability: parametric polymorphic UDFs, functional syntax for UDFs
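
To make "stateful", "dynamically typed", and the scalar/aggregate distinction concrete, here is a small sketch using `sqlite3` from the Python standard library. It does not use YeSQL's own UDF syntax; `py_median`, `py_typename`, and the table `m` are hypothetical names:

```python
import sqlite3
import statistics

class Median:
    """Aggregate UDF: state (the values seen so far) lives on the instance."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per input row of the group.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Called once per group to produce the aggregate result.
        return statistics.median(self.values) if self.values else None

def typename(value):
    """Scalar UDF with dynamically typed input: accepts any SQL value."""
    return type(value).__name__

conn = sqlite3.connect(":memory:")
conn.create_aggregate("py_median", 1, Median)
conn.create_function("py_typename", 1, typename)

conn.execute("CREATE TABLE m (grp TEXT, v INTEGER)")
conn.executemany(
    "INSERT INTO m VALUES (?, ?)",
    [("a", 1), ("a", 5), ("a", 3), ("b", 2), ("b", 10)],
)

medians = conn.execute(
    "SELECT grp, py_median(v) FROM m GROUP BY grp ORDER BY grp"
).fetchall()
type_names = [
    conn.execute("SELECT py_typename(?)", (value,)).fetchone()[0]
    for value in (1, "x", 2.5, None)
]
print(medians)
print(type_names)
```

Table UDFs (functions returning rows) are not shown because the `sqlite3` module has no registration call for them.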
 
Architecture
Supports both server-based and embedded DBMSs.
Compile
- UDF -> code generated by YeSQL, either statically or at runtime.
 - Python -> Python + C
 
Example with Experiment
Usability: people were asked to try it, and they could do it -> good usability.
Performance:
Fusion
- Is the CFFI conversion eliminated? -> Merge two Python functions at the C function level.
 - Relational operator + UDF operator?
 - Fusible or not -> if the operators pipeline their input/output, they are fusible.
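
The fusion idea above can be sketched at the Python level: two scalar UDFs whose output feeds directly into the next one are composed into a single registered function, so the engine crosses into Python once per row instead of twice. This is only an illustration on `sqlite3`; YeSQL's fusion works on its generated code and removes CFFI conversions, which this sketch does not model. The `fuse` helper and the `py_*` names are hypothetical:

```python
import sqlite3

def strip_ws(s):
    """Scalar UDF: remove leading and trailing whitespace."""
    return s.strip()

def lower(s):
    """Scalar UDF: lowercase the string."""
    return s.lower()

def fuse(*funcs):
    """Compose scalar UDFs, innermost first, into one Python callable."""
    def fused(value):
        for f in funcs:
            value = f(value)  # the intermediate value never leaves Python
        return value
    return fused

conn = sqlite3.connect(":memory:")
conn.create_function("py_strip", 1, strip_ws)
conn.create_function("py_lower", 1, lower)
conn.create_function("py_strip_lower", 1, fuse(strip_ws, lower))

conn.execute("CREATE TABLE names (raw TEXT)")
conn.executemany("INSERT INTO names VALUES (?)", [("  Alice ",), ("BOB",)])

# Unfused: two engine-to-Python calls per row, with a value conversion between.
unfused = conn.execute(
    "SELECT py_lower(py_strip(raw)) FROM names ORDER BY rowid"
).fetchall()
# Fused: one call per row.
fused = conn.execute(
    "SELECT py_strip_lower(raw) FROM names ORDER BY rowid"
).fetchall()
print(unfused, fused)
```

The two queries return identical rows; the fused form simply avoids handing the intermediate string back to the engine.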
 
Evaluation
Future work
The SQL mismatch is still challenging
- Push computation into the DB for scalability
 - High expressiveness, usability, and performance
 - Fusion of UDFs and relational operators
 - Ongoing work
    
- Deeper fusion-based optimization
 - Provably correct Python-to-YeSQL translation
 - Federated YeSQL query processing.