Learned Cardinality Estimation for Similarity Queries
1 minute read ∼ Filed in : A paper noteIntroduction
Objective: DNN for estimating the cardinality of similarity queries (similarity search).
- similarity search: provide an estimation card for the number of objects in D whose distance to a query q are not greater than a distance threshold pie.
- similarity join: takes Q as inputs, and provides an estimation card for a total number of pairs (q, p), whose distance between q of Q and p of D is not greater than pie.
Problems/Insights:
- big modules is hard to well capture the distribution of distances between data and arbitrary query.
- how to design small modules is challenging.
Solutions: This paper improves the accuracy and reduces the size of training data:
- query segmentation: divides a query into multiple segments, and trains a module E1 to produce an embedding zq of xq.
- data segmentation: divides the data D into n segments, and trains a model for each segment, each model has three DNNs, each DNN for embedding of query, distance, and a segment of D.