Hybrid Transactional Analytical Processing A Survey
3 minute read ∼ Filed in : A paper noteIntroduction
Goal
This tutorial is
- Review the historical progression of OLTP and OLAP systems
- Discuss the driving factors for HTAP
- Provide a technical analysis of existing HTAP solutions.
After that it give some reserach challenges / topics.
Research Topics & Challenges
-
Most HTAP systems provide OLTP and OLAP seperately, but not support efficient processing of transactional and analytical requests within the same transactions.
To fully support HTAP, systems should allow analytics on recent data not only after the transaction that is ingesting or updating that data has committed, but also as part of the same request.
-
Most HTAP solutions today use multiple components to provide all the desired capabilities. These different components are usually maintained by different groups of people. Therefore, keeping these components compatible and providing the end-users with the illusion of a single system is a challenging task.
-
Distributed OLAP systems uses shared file systems that mainly optimized for scans but not point access such as accessing individual records or columns. How to support fast point access to shared file system and object stores is challenging.
History
Traditional DBs:
-
OLAP systems: BLU, Vertica, ParAccel, GreePlumDB, Vectorvise
-
In-memory OLTP systems: VOltDB, Hekaton, MemSQL.
New DB architectures are generated due to new hardwares, multi-core, various levels of memory caches, and large memories.
Big-data analysis tools
-
Voldemort, Cassandra, RocksDB offers fast insert and lookups, but lacks query capabilities.
They mainly based on columnar storage formats like Parquet.
-
Hive, BIgSQL, Impala, SparkSQL support the OLAP queries. But doens’t support OLTP query.
HTAP SOLUTIONS
Single Engine for OLAP and OLTP
The mainly adding support of another to one system, but still on a single engine.
And they mainly differes based on the data organization they use for their transactional and analytical requests.
Separate Data Organization
The data is stored at different format for OLTP and OLAP queries.
SAP HANA, Oracles TImesTen:
- In-memory columnar processing for OLAP query + row store for OLTP
MemSQL:
- In-memory OLTP + Convert to columnar format when data is written to disk.
Pelaton:
- In-memory HTAP system. It provides adaptive data organizations which changes the data format at runtime based on the requrest type.
Same Data Organization
The data is stored in a single format, and the system support both queries.
Hive:
- Support Txs ( insert, update, delete )
Separate OLTP and OLAP Engines
This is the system with different engines, but differs on sharing/ non-sharing storage.
Decoupling the Storage for OLTP and OLAP
The data is stored in key-value store, and are groomed into Parquet or ORC files on HDFS for queries.
Using the Same Storage for OLTP and OLAP
Some system combine their product with Spark ecosystems.
Hbase and Cassandra store data in key-value stores, While use Spark-SQL connector to allow Spark SQL to directly query HBase and Cassandra data, respectively.