YESQL

Posted on November 9, 2023 · 1 minute read

Motivation

  1. In-database analysis.
  2. OpenAIRE.
  3. Text mining of full texts: linking projects to people.
    1. Many languages and libraries help design such pipelines, but they lead to complicated systems and unscalable processing.
      1. Pandas API.
    2. In-database processing is efficient but has limited expressive power.
    3. SQL + UDF = ?
    4. Example execution tools
      1. Pandas, DuckDB, PostgreSQL (psql), Vertica: compare their file storage, loading time (reading all data), and query time.
      2. How efficient is UDF + SQL? It is slow, so it needs to be improved.
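
To make the "SQL + UDF" idea concrete, here is a minimal sketch using Python's standard `sqlite3` module rather than YeSQL itself. The table `papers`, the column `body`, and the function `extract_project_id` (including its `GA-` prefix convention) are hypothetical names invented for illustration; the point is only that a Python function can be called from inside a SQL query, so the text-mining step runs next to the data.

```python
import sqlite3

def extract_project_id(text):
    """Hypothetical text-mining UDF: return the first token that looks like a grant ID."""
    for token in text.split():
        if token.startswith("GA-"):
            return token.strip(".,;")
    return None  # becomes SQL NULL

conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument scalar SQL function.
conn.create_function("extract_project_id", 1, extract_project_id)

conn.execute("CREATE TABLE papers (id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?)",
    [(1, "Funded by grant GA-101, EU."), (2, "No funding statement.")],
)

# The UDF is invoked by the engine once per row of the query.
rows = conn.execute(
    "SELECT id, extract_project_id(body) FROM papers ORDER BY id"
).fetchall()
print(rows)
```

Each row triggers a call from the engine into the Python interpreter, which is exactly the per-row overhead the notes flag as slow.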

Challenges

Mismatch between SQL and Python operations.

  1. Context switches, data copies, function calls, inefficient compilation, limited query optimization, long UDF pipelines.
  2. Architecture
    1. UDF in DB

UDF Translation

  1. Python compilers: JIT compilation
  2. Python transpilers: Python -> C++
    1. UDF -> SQL
    2. UDF -> intermediate representation (IR)
    3. UDF -> engine
      1. Solutions
        1. UDF optimizations: parallelization, vectorization, function inlining, in-process/out-of-process execution, tracing JIT
        2. ?

YESQL

Characteristics

  1. Usable, expressive, performant Python UDFs.
  2. Expressiveness: stateful,
    1. dynamically typed
    2. scalar, aggregate, and table UDFs
  3. Performance: JIT compilation, parallelization, statefulness, fusion.
  4. Usability: parametric polymorphic UDFs, functional syntax for UDFs.
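
The difference between scalar and aggregate UDFs can be sketched with the standard `sqlite3` module (again, not YeSQL's own API). The names `normalize`, `Longest`, and the table `words` are hypothetical. A scalar UDF maps one input value to one output value; an aggregate UDF keeps state across rows and returns a single value at the end. Table UDFs, which return multiple rows, are not expressible through the standard `sqlite3` module, so they are left out here.

```python
import sqlite3

def normalize(word):
    """Scalar UDF: one value in, one value out."""
    return word.strip().lower()

class Longest:
    """Aggregate UDF: accumulates state over all rows of a group."""
    def __init__(self):
        self.best = ""

    def step(self, value):
        # Called once per row; NULLs arrive as None and are skipped.
        if value is not None and len(value) > len(self.best):
            self.best = value

    def finalize(self):
        # Called once, after the last row.
        return self.best

conn = sqlite3.connect(":memory:")
conn.create_function("normalize", 1, normalize)
conn.create_aggregate("longest", 1, Longest)

conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany("INSERT INTO words VALUES (?)", [(" Data ",), ("SQL",), ("Python",)])

# The scalar UDF feeds the aggregate UDF inside a single query.
result = conn.execute("SELECT longest(normalize(w)) FROM words").fetchone()[0]
print(result)
```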

Architecture

Supports both server-based and embedded DBMSs.

Compile

  1. UDF -> code generated by YeSQL, either statically or at runtime.
  2. Python -> Python + C

Example with Experiment

Usability: users were asked to try it and were able to complete the tasks, which suggests good usability.

Performance:

Fusion

  1. Is the CFFI conversion eliminated? -> Two Python UDFs are merged at the C function level.
  2. Relational operator + UDF operator?
  3. Fusible or not -> if the UDFs form a pipeline (the output of one is the input of the next), they are fusible.
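
The pipeline condition above can be illustrated with a small sketch. It composes two Python functions at the Python level, which is only an analogy for the C-level fusion in the notes, and it uses the standard `sqlite3` module. The helpers `counted` and `fuse`, the UDF names, and the table `t` are all hypothetical. The counter records how often the engine crosses into Python: the unfused pipeline crosses twice per row, the fused one once per row.

```python
import sqlite3

calls = {"n": 0}  # number of engine -> Python calls

def counted(fn):
    """Wrap a UDF so every invocation by the engine is counted."""
    def wrapper(x):
        calls["n"] += 1
        return fn(x)
    return wrapper

def lower(s):
    return s.lower()

def strip_punct(s):
    return s.strip(".,;!")

def fuse(f, g):
    """Fuse a two-stage pipeline: the output of f is the input of g."""
    return lambda x: g(f(x))

conn = sqlite3.connect(":memory:")
conn.create_function("lower_udf", 1, counted(lower))
conn.create_function("strip_udf", 1, counted(strip_punct))
conn.create_function("fused_udf", 1, counted(fuse(lower, strip_punct)))

conn.execute("CREATE TABLE t (s TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("Hello!",), ("World.",), ("YeSQL",)])

# Unfused: two separate UDF calls per row.
unfused = conn.execute("SELECT strip_udf(lower_udf(s)) FROM t ORDER BY rowid").fetchall()
unfused_calls = calls["n"]

# Fused: one UDF call per row, same result.
calls["n"] = 0
fused = conn.execute("SELECT fused_udf(s) FROM t ORDER BY rowid").fetchall()
fused_calls = calls["n"]

print(unfused == fused, unfused_calls, fused_calls)
```

Besides halving the number of calls, fusing also avoids converting the intermediate value to an engine type and back, which is the kind of conversion cost the first bullet refers to.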

Evaluation

Future work

The mismatch between SQL and Python remains challenging.

  1. Push computation into the DB for scalability.
  2. High expressiveness, usability, and performance.
  3. Fusion of UDFs and relational operators.
  4. Ongoing work
    1. Deeper fusion-based optimization
    2. Provably correct Python-to-YeSQL translation
    3. Federated YeSQL query processing



