Learned Cardinality Estimation for Similarity Queries

Posted on December 11, 2023. Filed in: A paper note

Introduction

Objective: use deep neural networks (DNNs) to estimate the cardinality of similarity queries (similarity search and similarity join).

Similarity search: given a query q, estimate the number of objects in D whose distance to q is not greater than a distance threshold.
Similarity join: given a set of queries Q, estimate the total number of pairs (q, p), with q in Q and p in D, whose distance is not greater than the distance threshold.
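
To make the two definitions concrete, here is a minimal brute-force sketch of the exact cardinalities that a learned estimator would try to approximate. It is not the paper's method: the function names, the choice of Euclidean distance, and the toy data are assumptions made only for illustration.

```python
import math


def search_cardinality(data, q, threshold):
    """Exact similarity-search cardinality: objects within `threshold` of q.

    Euclidean distance is an assumption; any distance function would do.
    """
    return sum(1 for p in data if math.dist(p, q) <= threshold)


def join_cardinality(data, queries, threshold):
    """Exact similarity-join cardinality: number of qualifying (q, p) pairs."""
    return sum(search_cardinality(data, q, threshold) for q in queries)


# Toy dataset D and query set Q (hypothetical values).
data = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
queries = [(0.0, 0.0), (3.0, 4.0)]

print(search_cardinality(data, (0.0, 0.0), 1.0))  # 2
print(join_cardinality(data, queries, 1.0))       # 3
```

The join count is simply the sum of the per-query search counts, which is why the two problems are usually treated together.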

Problems/Insights:

A single large model struggles to capture the distribution of distances between the data and an arbitrary query.
Designing small models that work well is itself challenging.

Solutions: the paper improves accuracy and reduces the amount of training data needed, using two techniques:

Query segmentation: divides a query into multiple segments and trains a module E1 to produce an embedding zq of xq.
Data segmentation: divides the data D into n segments and trains one model per segment. Each model has three DNNs, one each for embedding the query, the distance, and the segment of D.
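
The two segmentation ideas can be sketched structurally as follows. This is only an illustration under stated assumptions, not the paper's implementation: the round-robin partitioning, the contiguous query split, the summing of per-segment results, and all function names are hypothetical, and an exact counter stands in for each per-segment learned model.

```python
import math


def split_query(q, num_segments):
    """Split a query vector into at most `num_segments` contiguous pieces."""
    size = math.ceil(len(q) / num_segments)
    return [q[i:i + size] for i in range(0, len(q), size)]


def split_data(data, n):
    """Partition D into n segments (round-robin here, purely a placeholder)."""
    return [data[i::n] for i in range(n)]


def segment_estimate(segment, q, threshold):
    """Stand-in for one per-segment model: an exact count, not a DNN."""
    return sum(1 for p in segment if math.dist(p, q) <= threshold)


def estimate_cardinality(segments, q, threshold):
    """Assumption: the overall estimate is the sum of per-segment estimates."""
    return sum(segment_estimate(s, q, threshold) for s in segments)


# Toy data (hypothetical values).
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
segments = split_data(data, 2)

print(split_query((1, 2, 3, 4, 5, 6), 3))               # [(1, 2), (3, 4), (5, 6)]
print(estimate_cardinality(segments, (0.0, 0.0), 1.0))  # 3
```

Because the segments partition D, summing exact per-segment counts reproduces the true cardinality; with learned per-segment models the sum would only be an estimate, but each model would have a smaller portion of the distance distribution to capture.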
