Distribution Shifts

Posted on June 3, 2024 1 minute read ∼ Filed in : A paper note

Covariate Shift: P(X) changes, but P(Y∣X) stays the same.

Label Shift: P(Y) changes, but P(X∣Y) stays the same.

Concept Shift: P(Y∣X) changes.
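
To make the first two definitions concrete, here is a minimal simulation sketch. The Gaussian inputs, the sigmoid form of P(Y=1∣X), and the function names are assumptions chosen only for illustration; they do not come from any paper.

```python
import math
import random

def sample_covariate_shift(rng, n, x_mean):
    """Only P(X) depends on x_mean; P(Y=1|X=x) is a fixed sigmoid."""
    data = []
    for _ in range(n):
        x = rng.gauss(x_mean, 1.0)
        p_y1 = 1.0 / (1.0 + math.exp(-x))  # same conditional everywhere
        y = 1 if rng.random() < p_y1 else 0
        data.append((x, y))
    return data

def sample_label_shift(rng, n, p_y1):
    """Only P(Y) depends on p_y1; P(X|Y) is a fixed Gaussian per class."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < p_y1 else 0
        x = rng.gauss(1.0 if y == 1 else -1.0, 1.0)  # same class-conditional everywhere
        data.append((x, y))
    return data

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rng = random.Random(0)

# Covariate shift: the input distribution moves from mean 0 to mean 2.
train = sample_covariate_shift(rng, 10_000, x_mean=0.0)
shifted = sample_covariate_shift(rng, 10_000, x_mean=2.0)
print("mean of X:", mean(x for x, _ in train), "->", mean(x for x, _ in shifted))

# Label shift: the class prior moves from 0.5 to 0.9.
train_ls = sample_label_shift(rng, 10_000, p_y1=0.5)
shifted_ls = sample_label_shift(rng, 10_000, p_y1=0.9)
print("P(Y=1):", mean(y for _, y in train_ls), "->", mean(y for _, y in shifted_ls))
```

Concept shift would correspond to changing the sigmoid inside `sample_covariate_shift` itself, so that the same x maps to a different label distribution.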

I’m currently working on a research project in which I need to use machine learning to enhance a query optimizer.

My current plan has three steps:

1. I follow the paper “Bao: Learning to Steer Query Optimizers”, which uses a set of hints to generate a set of candidate query plans. From a reinforcement learning perspective, each query plan here is one action.
2. I then try to use a new model to predict which plan is better, so I first need to featurize the input.
3. If we model this with RL, each hint in the hint set (equivalently, the plan it produces) is one action, the state can be the input SQL, CPU/memory, etc., and the reward can be 1/latency.
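
The RL formulation above can be sketched as a small data structure. This is only an illustration: the `Experience` class, the state fields, the SQL string, and the latency numbers are all made up, and the hint indices are placeholders rather than real optimizer hints.

```python
from dataclasses import dataclass

@dataclass
class Experience:
    state: dict       # e.g. the input SQL plus CPU/memory readings
    action: int       # index of the chosen hint in the hint set
    latency_s: float  # measured execution latency in seconds

def reward(latency_s: float) -> float:
    """Reward is 1/latency, so faster plans get a larger reward."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / latency_s

def best_action(experiences):
    """Return the action (hint index) with the highest reward."""
    return max(experiences, key=lambda e: reward(e.latency_s)).action

# Toy example: one query executed under three hypothetical hints.
state = {"sql": "SELECT count(*) FROM t", "cpu_load": 0.4, "memory_free_gb": 12.0}
experiences = [
    Experience(state, action=0, latency_s=2.0),
    Experience(state, action=1, latency_s=0.5),
    Experience(state, action=2, latency_s=1.0),
]
print(best_action(experiences))  # hint 1 has the lowest latency
```

Because each query is a single decision, every episode has one step, so the return-to-go that a decision transformer conditions on is just this one reward.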

In this setting we can simply train a decision transformer. However, that basically does not use the query plan structure, which the database optimizer generates from the hint, does it?

As for the query plan,

I first use a paper’s method (QueryFormer) to convert each query plan tree into a fixed-size vector encoding; a batch has shape (batch_size=1024, dimension=329).

Then I want to use the decision transformer to predict which action is the best.

Predicting the best hint basically means finding the query plan with the minimum execution latency,

but not using anything from the query plan is a bad idea.
How can I use the query plan in my current setting? Or do you have a better idea for adding this information?
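
One possible direction, sketched under assumptions: describe each action by the embedding of the plan it produces instead of by a bare hint index, and let a model score those embeddings. Bao itself predicts the performance of each candidate plan and picks the best one, so this stays close to that idea. In the sketch, `choose_hint` and `toy_predict_latency` are hypothetical names, the hint labels are placeholders, and the 3-dimensional vectors stand in for the 329-dimensional plan encodings.

```python
from typing import Callable, Dict, Sequence

Vector = Sequence[float]

def choose_hint(
    plan_embeddings: Dict[str, Vector],
    predict_latency: Callable[[Vector], float],
) -> str:
    """Pick the hint whose plan embedding has the lowest predicted latency."""
    return min(plan_embeddings, key=lambda hint: predict_latency(plan_embeddings[hint]))

def toy_predict_latency(embedding: Vector) -> float:
    """Stand-in for a learned model: a fixed linear score over the embedding."""
    weights = [0.5, 2.0, 1.0]  # made-up weights, not learned from data
    return sum(w * x for w, x in zip(weights, embedding))

# Toy plan embeddings, one per hypothetical hint.
plan_embeddings = {
    "hint_a": [1.0, 0.2, 0.0],
    "hint_b": [0.1, 0.1, 0.3],
    "hint_c": [0.0, 1.0, 1.0],
}
print(choose_hint(plan_embeddings, toy_predict_latency))
```

With this layout the plan structure enters through the action representation, and the state can still carry the SQL and CPU/memory features.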