IntegriDB Verifiable SQL for Outsourced Databases

Posted on February 25, 2022   7 minute read ∼ Filed in  : 

IntegriDB: Verifiable SQL for Outsourced Databases

INTRODUCTION

Problems

An authenticated data structure (ADS) could be used to verify queried data.

Prior work used to construct ADS handling some subset of SQL queries can be classified into two categories:

  1. Generic approaches:
    • Data owners first compile database into a program (circuit / RAM-model of computation.) which accepts query as input and return result.
    • Size of circuit is as large as database.
    • Circuit-based approach not allow efficient updates.
  2. Specific approaches:
    • Tree-based (Merkel tree) / Signature-based (data in server is pre-signed by data owner)
    • Both of them only support single-dimensional range queries.
    • Both of them don’t support updates.

Generic approach taks large memory space, Specific approach is limited in dimension. Both of them don't support short proofs for (join, sum, max, min, count)

Contribution

The paper designing an ADS and an associated system that supports native SQL queries over a relational database, Such a system would be suitable for integration into the most prevalent applications running in the cloud today and could be offered as a software layer on top of any SQL implementation.

  1. expressive: supports multidimensional range queries and nesting queries.
  2. efficient: small proofs(few kb), low verification time, feasible server computation.
  3. scalable: can execute on DB table with 6 million rows.
  4. cryptographic assumptions: based on crypto-graphic assumptions

PRELIMINARIES

Authenticated Data Structure

Three roles, server, data owner, client

ADS for query Q and U consists of efficient algorithms init, setup, prove, verify, UpdateO, UpdateS

D: data, D’: encrypted data, R: Result, pi: Proof, Q: Query, lambda: secure parameter

  1. (data owner) Init: lambda => <pk, sk>
  2. (data owner) Setup: ( D, sk ) => ( digest, D’ ), and send D’ to server, publish (digest, pk)
  3. (Server) Prove: ( digest, D’, Q) => (R, pi)
  4. (client) Verify: ( digest, Q, R, pi) => 0 or 1
  5. UpdateO/UpdateS: Interactive algorithm run by data owner and server.
    • UpdateO: ( sk, digest, update ) => ( digest, 0/1 indicting accept or reject )
    • UpdateS: D’ => D’

SQL queries supported by IntegriDB

Multidimensional range query

multiple columns and range filter for each column

SELECT * FROM A WHERE (age BETWEEN 22 AND 24) AND (student_ID > 10730)

Join, Sum, Max, Min, COunt, Like, Nested Query.

BUILDING BLOCKS

two authenticated data structures used as building blocks in INTEGRIDB:

one for set operations (that we call ASO),

one for interval trees (that we call AIT )

ADS for set operations with Summation

set operations

includes: Union, Intersection, Sum.

Authenticated data structure

Mainly use bilnear accumulator primitive as ADS.

Data owner firstly init (s, g:public key), and then use bilnear accumulator primitive to calculate the digest for each set.

image-20220226171006104

Data Owner can publish g and acc(s).

Verify set operation

  1. Client get above digest and g
  2. server compute:

image-20220226171610719

  1. the server then releases a1 and a0, compute w1, w2 with g and then realse them also.

    image-20220226172610947

  2. client check follows:(SBDH assumption)

    image-20220226172038309

Complexity

Proofs are constant size. eg,. query involving d set operations, proof size is O(d)

ADS for interval Trees

Interval tree

Random function f, binary tree T

  1. Leaf node stores element of dataset S = {(k1,v1)…(kn,vn)}
  2. Internal node u stores kv pair,
    1. k = maximum key stored at any node in the left subtree of u
    2. v = f(leftNode, rightNode)

Support queries: Search, RangeCover, Insert, Delete

Authenticated data structure

Mainly use Merkle tree, where each leaf node is hashed using kv pair. And the root of Merkle tree is a digest of the tree.

Complexity

The size of the minimal covering set output by RangeCover is O(log n)

RnageCover is O(logn), size of proof and complexity of verification for Search, RangeCover, Insert, Delete is O(n).

OUR CONSTRUCTION

ADS used in IntegriDB for different queries

Setup

For each table, each pair of columns in the table, data owner create an authenticated interval tree.

image-20220227182455432

Leaf Node: store kv of Sij

Internal node U

  • Key: minimum key stored at the leaves on the left.
  • Value:

image-20220227182629776

image-20220228131601138

image-20220227185110723

The accumulation value in each node will be used to handle JOIN and multidimensional range queries.

Complexity

mi = columns and ni = rows in table i.

size of the secret key is O(1)

Join Queries

Scenario

The Join return result: C* = Ci intersect with Cj.

select ... from  T join T' on T.Ci = T'.Cj 

Server

  1. For each value x in C*, the server returns the entire row.
  2. Return acc(ci) and acc(cj)

Client

  1. Hash each row received from server and check each row’s hash value == digest’s hash value.
  2. Check acc(ci) or acc(cj) == acc in digest

image-20220228112400906

proof size and verification time are O( R )

Multidimensional Range Queries

Scenario

2 dimensional range query

select ... from ... where w in [w, w+] and z in [z, z+];

Server

return value

image-20220228124541523

Client

Verify bu using acc(cn) , and check it acc == acc in digest.

Check row hashes to guarantee that the returned rows have not been altered.

Complexity

The proof size is O(d·log n)

Verification complexity O(d log n + R ).

SQL Functions

Summation

For any column j, client can verify using Acc(cj) in root, The proof size and verification time are both O(1).

Max and min

server compute max value and then convert it to range query.

select max(j) from T => select * from T where j >= j_max

Count

server convert count to sum operation.

Updates

the data owner and server jointly update the corresponding AIT and ASO using their respective interfaces.

There is a rotation (to maintain a balanced tree) after an update in the interval tree

image-20220228132847669

Nested Queries

ADDITIONAL DISCUSSION

Extensions and Optimizations

Improve setup time and reduce server storage space.

To do this, we construct a homomorphic Merkle tree over each row, and add a column c to each table that stores the root of the Merkle tree for the corresponding row.

Limitations

cannot support agregations, comparsion and join with deplicates in nested query such as

SELECT c1 +2c2 FROM . . . 

IMPLEMENTATION

image-20220228135341654

EVALUATION

Env

Awazon EC2, TPC Benchmark.

Evaluation on TPC Benchmark

image-20220228135610701

Performance on Synthetic Data

image-20220228135739435

image-20220228135835880

image-20220228135923836





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.