Spitz: A Verifiable Database System

Posted on February 22, 2022 · 9 minute read

Introduction

Problems

The trend of decentralization in databases introduces a new requirement: a database's integrity must be verifiable. For example, third-party service providers can be malicious, so their databases must maintain a trusted data history and allow users to verify the integrity of both current and historical data (tampering with data or query execution can then be securely detected).

The main requirements are data immutability and tamper evidence.

  • Immutability: data is written once and never deleted. This allows more efficient concurrency, since no synchronization of writes is needed; the same idea is used in Spark's RDDs.
  • Query verifiability (tamper evidence): a query result contains integrity proofs for both the data and the query execution, which users can check to detect whether the data or the query has been tampered with.

Challenges in realizing the above requirements

  • Storage management: data immutability requires managing an ever-increasing volume of data, and that management needs to be efficient and reliable.
  • Efficient access methods for querying immutable data: since the data volume is huge, an efficient storage layout and indexes are needed to support queries.
  • Minimal performance overhead: a VDB must generate integrity proofs, and their cost must be kept small.
  • Support for both OLTP and OLAP workloads:
    • OLAP: existing approaches use fully-homomorphic encryption.
    • OLTP: requires the strict ordering provided by serializability.
  • Deployability: easy migration of data into the new database, and an easy programming model.

Contributions

  1. Discuss two approaches to realizing an efficient verifiable database: extending existing systems, and developing a new system, Spitz.
  2. Experimental study.

Verifiable Databases

Verifiable Database

Using verifiable computation techniques

  • SNARKs support arbitrary computation tasks but require an expensive setup phase.
  • Ben-Sasson et al. bound the complexity of the setup phase and the query complexity.
  • vSQL uses an interactive protocol to support verifiable SQL queries.

Using authenticated data structures (Merkle trees)

  • Some works evaluate authenticated index structures that combine Merkle trees and B+-trees.
  • ServeDB proposes a Merkle-tree index based on hierarchical cube encoding.
  • SUNDR protects all file system contents and proposes a fork-consistency protocol to detect data tampering.
  • VeritasDB and Concerto use trusted hardware to speed up verification; both keep Merkle-tree state inside SGX enclaves. VeritasDB stores the tree root inside the enclave, while Concerto uses memory verification to avoid contention inside the enclaves.

Out-of-Blockchain Database

Blockchains and distributed databases have four dimensions in common: replication, concurrency, storage, and sharding.

  • TrustDBle proposes an OLTP engine that provides verifiable, ACID-compliant transactions on shared data using trusted hardware.

  • BlockchainDB, Veritas, FalconDB, and LineageChain use a blockchain as verifiable storage and add database features on top of it.

    • BlockchainDB consists of a database layer and a storage layer: the database layer controls the consistency level of requests, and the storage layer is a unified interface to the underlying blockchains.

      It adopts a KV data model and supports get/put/verify operations. It translates requests from the database layer into blockchain transactions and monitors the transaction status.

      When a client invokes verify, a blockchain node contacts other peers to check whether the transaction is committed in the ledger. (A node in BlockchainDB does not hold the complete state; the state is partitioned across multiple blockchain nodes.)

    • Veritas targets more complex data models (the relational model). It uses Intel SGX as a verifier that reads database logs for transaction validation; the validation results (in the form of verifiers' votes) are also stored in the blockchain. It does not support partitioning.

    • FalconDB organizes database records into an authenticated data structure, a Merkle tree.

    • LineageChain is built on top of ForkBase and FabricSharp.

      It captures provenance during contract execution, stores it in a Merkle tree, and provides a skip-list index to support efficient provenance queries.

Ledger Database

QLDB is a cloud service providing data immutability and verifiability. It consists of blocks organized in a hash chain called the journal.

Insert, update, and delete operations are collected into blocks and appended to the journal, and a Merkle tree is built over the entire journal.
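As a rough illustration, such a hash-chained journal can be sketched in a few lines of Python (the `Journal` class and its fields are illustrative, not QLDB's actual format):

```python
import hashlib
import json

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Journal:
    """Append-only hash chain: each block commits to its predecessor."""
    def __init__(self):
        self.blocks = []

    def append(self, operations: list) -> None:
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = json.dumps({"prev": prev_hash, "ops": operations}, sort_keys=True)
        self.blocks.append({"prev": prev_hash, "ops": operations,
                            "hash": sha256(body.encode())})

    def verify(self) -> bool:
        """Recompute every hash; any tampering breaks the chain."""
        prev_hash = "0" * 64
        for block in self.blocks:
            body = json.dumps({"prev": prev_hash, "ops": block["ops"]},
                              sort_keys=True)
            if block["prev"] != prev_hash or block["hash"] != sha256(body.encode()):
                return False
            prev_hash = block["hash"]
        return True

journal = Journal()
journal.append([("insert", "k1", "v1"), ("update", "k2", "v2")])
journal.append([("delete", "k1")])
assert journal.verify()
```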

Oracle Blockchain Tables offer append-only verifiable tables by implementing a centralized ledger model.

MongoDB supports verifiable change history by storing document collections in a hash chain.

Datomic is a distributed immutable database system designed to be ACID compliant.

Challenges and opportunities

Storage and Indexing

The Merkle Patricia Trie (MPT), Merkle Bucket Tree (MBT), and Pattern-Oriented-Split Tree (POS-Tree) support efficient queries on immutable data.

Query Verification

Query verifiability in VDB means that the user who sends the query can verify the integrity of the result.

Client-side / server-side verification

  • Verification at the client side is expensive.
  • SGX can help shift this cost to the server, e.g., SGX outputs only succinct proofs that can easily be verified by the user.

Online verification / deferred verification

  • Online verification: data is committed only after verification succeeds. If recovery from malicious tampering is costly, online verification can prevent malicious commits in the first place.
  • Deferred verification: verification is done over a batch of transactions, amortizing its cost.

Verification via encryption:

  • Encrypt data using a private key and store the ciphertexts on untrusted storage.
  • However, any computation on the cloud must then be conducted directly on the ciphertexts. See the sketch below.
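As an illustration of computing on ciphertexts, a partially homomorphic scheme such as Paillier already supports sums over encrypted values; this sketch assumes the `phe` (python-paillier) package. Fully homomorphic encryption, mentioned above, extends this to arbitrary computation at much higher cost.

```python
from phe import paillier  # pip install phe

public_key, private_key = paillier.generate_paillier_keypair()

# The client encrypts values before uploading them to untrusted storage.
ciphertexts = [public_key.encrypt(x) for x in [10, 20, 12]]

# The server sums the ciphertexts without ever seeing the plaintexts.
encrypted_sum = ciphertexts[0] + ciphertexts[1] + ciphertexts[2]

# Only the client, holding the private key, can decrypt the result.
assert private_key.decrypt(encrypted_sum) == 42
```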

Verification via authenticated data structures

  • Use a Merkle tree to store the data; the tree root serves as the digest.
  • To verify, the digest is recalculated recursively and checked for equality against the previously saved digest, as sketched below.
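A minimal sketch of that recursive digest computation (the pairing and padding rules here are one common convention, not necessarily Spitz's):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """Recursively hash pairs of nodes up to a single root digest."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:            # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

records = [b"row1", b"row2", b"row3", b"row4"]
saved_digest = merkle_root(records)          # digest published earlier

records[2] = b"tampered"                     # any modification...
assert merkle_root(records) != saved_digest  # ...changes the root
```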

Concurrency control

A database ships with fixed transaction isolation levels, but different applications may require different isolation levels.

One solution is to fix the database's isolation level at a weak level (read committed) and implement customized logic (locks, checking, reversions) in the application to achieve stricter levels, as sketched below.
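A sketch of the idea, with a hypothetical `db` handle running at read committed and an application-level lock table providing the stricter ordering:

```python
import threading
from collections import defaultdict

# Application-level lock table layered over a read-committed database.
_locks = defaultdict(threading.Lock)

def transfer(db, src: str, dst: str, amount: int) -> None:
    """Serialize conflicting transfers in the application: lock keys in
    sorted order to avoid deadlock, then run the weak-isolation txn."""
    assert src != dst
    first, second = sorted([src, dst])
    with _locks[first], _locks[second]:
        with db.begin():                  # hypothetical read-committed txn
            balance = db.get(src)
            if balance >= amount:
                db.put(src, balance - amount)
                db.put(dst, db.get(dst) + amount)
```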

Extending OLTP/OLAP to VDB

A VDB can be implemented by adding a verifiable ledger to an existing database system; the ledger provides immutable data and verifiable queries.

Non-intrusive design

(Figure: non-intrusive design)

Ledger is added without modifying the original database system.

  1. The design minimizes disruption to existing systems.
  2. The design incurs considerable performance overhead due to the interaction with the external ledger.

Intrusive design

(Figure: intrusive design)

Embed the ledger into the existing database system.

  1. This design eliminates the communication overhead with an outside ledger.
  2. It incurs significant data-migration cost: data must be moved into this system before verification can be used.

System (Spitz) Architecture

(Figure: Spitz architecture)

The system supports both OLTP and OLAP workloads with verifiable ledgers.

Control layer

Multiple processor nodes accept and process requests from a global message queue.

Each node has three main components:

  • Request handler: interacts with users, accepting requests and returning results with proofs.
  • Auditor: communicates with the ledger in the storage layer to keep track of data changes.
  • Transaction manager: controls the execution of queries.

Storage layer

There are multiple index structures built into the storage layer to support verifiable query processing.

Ledger

The ledger consists of a sequence of hashed blocks; each block tracks record modifications, query statements, metadata, and the root nodes of the indexes over the entire dataset.

On top of this linked list of blocks sits a Merkle tree, which can be used to verify the data. A sketch of a block follows.
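A sketch of what one such block might contain (all field names are illustrative):

```python
import hashlib
import json

def make_block(prev_hash: str, modifications, statements, index_root, meta):
    """One ledger block: commits to the record changes, the query
    statements that produced them, metadata, and the index root."""
    body = json.dumps({"prev": prev_hash, "mods": modifications,
                       "stmts": statements, "index_root": index_root,
                       "meta": meta}, sort_keys=True)
    return {"body": body, "hash": hashlib.sha256(body.encode()).hexdigest()}

genesis = make_block("0" * 64, [("k1", "v1")], ["INSERT ..."], "root-a", {"ts": 1})
nxt = make_block(genesis["hash"], [("k2", "v2")], ["UPDATE ..."], "root-b", {"ts": 2})
```

A Merkle tree built over the block hashes, as in the verification sketch earlier, then yields the digest that clients check.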

Index

A B+-tree is used for query processing: the input is the requested keys, and the output is the matched data cells, as in the stand-in sketch below.
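A full B+-tree is too long to sketch here; this sorted-array stand-in has the same contract (requested keys in, matched cells out):

```python
import bisect

class CellIndex:
    """Stand-in for the B+-tree: maps requested keys to data cells."""
    def __init__(self, items):
        self.keys, self.cells = zip(*sorted(items))

    def lookup(self, requested_keys):
        out = []
        for k in requested_keys:
            i = bisect.bisect_left(self.keys, k)
            if i < len(self.keys) and self.keys[i] == k:
                out.append(self.cells[i])
        return out

index = CellIndex([(3, "cell-3"), (1, "cell-1"), (2, "cell-2")])
assert index.lookup([1, 3]) == ["cell-1", "cell-3"]
```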

Query processing

The system supports read, write, and mixed workloads, and can parse both SQL and a self-defined JSON schema.

Write workload

  1. The request handler collects a transaction from the queue.
  2. The auditor checks the write operations and updates the ledger; the ledger records the changes and returns a proof to the auditor.
  3. The transaction manager (TM) traverses the B+-tree and performs the writes to the cell store.
  4. The TM returns <Result, Proof> to the client.

Read workload

  1. The request handler collects a transaction from the queue.
  2. The TM traverses the B+-tree and performs the reads.
  3. The TM asks the auditor to fetch proofs from the ledger.
  4. The TM returns <Result, Proof> to the client.
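A minimal sketch of both paths, using hypothetical `queue`, `auditor`, and `tm` components that mirror the steps above:

```python
def process_write(queue, auditor, tm):
    txn = queue.pop()                    # 1. collect a transaction
    proof = auditor.record(txn.writes)   # 2. ledger records changes, returns proof
    result = tm.write(txn.writes)        # 3. TM traverses B+-tree, writes cells
    return result, proof                 # 4. reply with <Result, Proof>

def process_read(queue, auditor, tm):
    txn = queue.pop()                    # 1. collect a transaction
    result = tm.read(txn.keys)           # 2. TM traverses B+-tree, reads cells
    proof = auditor.get_proof(txn.keys)  # 3. fetch proofs from the ledger
    return result, proof                 # 4. reply with <Result, Proof>
```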

Concurrency Control (Serializability)

Atomicity

Data is stored in both the indexes and the virtual storage; two-phase commit (2PC) is used to keep them consistent.

In the prepare phase of 2PC, a transaction whose read/write or write/write conflicts violate this order is aborted, as sketched below.
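A sketch of the protocol, where the participants (e.g., the index store and the virtual storage) vote in the prepare phase and any conflict detected there aborts the transaction (the `prepare`/`commit`/`abort` interface is assumed, not Spitz's actual API):

```python
def two_phase_commit(txn, participants):
    """Phase 1: every participant checks for read/write and write/write
    conflicts and votes. Phase 2: commit only if all votes are yes."""
    if all(p.prepare(txn) for p in participants):  # phase 1: vote
        for p in participants:
            p.commit(txn)                          # phase 2: commit everywhere
        return True
    for p in participants:
        p.abort(txn)                               # any "no" vote aborts
    return False
```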

Isolation

Since cells are themselves multi-versioned, concurrency control mechanisms based on MVCC are more suitable, e.g., MVCC with 2PL, MVCC with timestamp ordering, or MVCC with OCC.

A global timestamp service allocates timestamps (versions) when a transaction starts and commits; transactions are then ordered by start time, as sketched below.
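A sketch of such a service (illustrative; in practice this is a network service rather than an in-process counter):

```python
import itertools
import threading

class GlobalTimestampService:
    """Single logical counter; every transaction start and commit gets
    a unique, monotonically increasing timestamp."""
    def __init__(self):
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_ts(self) -> int:
        with self._lock:
            return next(self._counter)

ts = GlobalTimestampService()
t1_start, t2_start = ts.next_ts(), ts.next_ts()
# Transactions are serialized by start timestamp; a read at version v
# sees the newest cell version with commit timestamp <= v.
```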

Limitations

  1. The global timestamp service is a central bottleneck: every transaction must contact it.

  2. The abort rate can be high under write-intensive workloads.

Solutions

  1. Adopt a hybrid logical timestamp scheme in which each individual node allocates timestamps while still guaranteeing serializability (see the sketch after this list).
  2. Combine OCC and MVCC, dynamically adjusting the transaction order to reduce abort rates.
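A compact sketch of a hybrid logical clock in the style of Kulkarni et al.: each node advances its own (physical, logical) pair and merges timestamps received from peers, so no central service is needed:

```python
import time

class HybridLogicalClock:
    """Per-node clock combining physical time with a logical counter;
    timestamps stay comparable across nodes without a central service."""
    def __init__(self):
        self.pt, self.lc = 0, 0

    def tick(self):
        """Local event or send: advance past both wall clock and self."""
        wall = time.time_ns()
        if wall > self.pt:
            self.pt, self.lc = wall, 0
        else:
            self.lc += 1
        return (self.pt, self.lc)

    def merge(self, rpt, rlc):
        """Receive: fold in a timestamp from a remote node's message."""
        wall = time.time_ns()
        new_pt = max(self.pt, rpt, wall)
        if new_pt == self.pt == rpt:
            self.lc = max(self.lc, rlc) + 1
        elif new_pt == self.pt:
            self.lc += 1
        elif new_pt == rpt:
            self.lc = rlc + 1
        else:
            self.lc = 0
        self.pt = new_pt
        return (self.pt, self.lc)
```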

Proof and Verification

Clients can use the digest of the ledger to perform verification locally.

The system uses deferred verification to improve throughput, as sketched below.
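A sketch of deferred verification, with hypothetical `execute`, `server_digest`, and `local_digest` hooks: the batch runs optimistically, and a single digest comparison afterwards amortizes the verification cost:

```python
def run_batch_deferred(transactions, execute, server_digest, local_digest):
    """Run the whole batch first, then verify once: compare the digest
    reported by the server against the locally recomputed digest."""
    results = [execute(txn) for txn in transactions]  # optimistic execution
    if server_digest() != local_digest():             # one check per batch
        raise RuntimeError("tampering detected: discard the batch results")
    return results
```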

Experimental Study

Implementation

  1. Implement a baseline system that emulates a commercial product.
  2. Furthermore, the appended blocks are materialized into indexed views for fast query processing.
  3. Implement the ledger by adopting an index from the SIRI family.
  4. Implement an immutable KV store using ForkBase, without verification.

Evaluation

OLTP workloads

Read-only and write-only workloads in a single-thread setup.

X-axis: initial database size, from 10,000 to 1,280,000 records.

Y-axis: throughput.

As the initial database size increases, reads and writes spend more time traversing the index, so throughput decreases.

(Figure: OLTP throughput vs. initial database size)

With the ledger's unified index structure, the system can store proofs and values together, which facilitates both reads and writes.

OLAP workloads

Range queries.

Range queries perform worse than point queries, due to the additional nodes that must be traversed and scanned during query processing.

(Figure: OLAP range-query throughput)

AI for DB

  1. Learning which index to use
  2. Learning-based transaction management: use AI to predict workloads and schedule transactions.

DB for AI

Use a VDB to ensure the trustworthiness of analytical results.




