cockroachDB-opt

Posted on August 12, 2021 4 minute read ∼ Filed in : practise

1. Schema design (avoid hotpot)

Relations

Database -> schema -> (tables, views, sequences)

Schema best practices

If need multiple tables with the same name, do so in the different schema in the same DB

use schema_name.table_name to access the table

Make transactions smaller by operating on fewer data per transaction. This will offer fewer opportunities for transactions’ data access to overlap.

Split the table across multiple ranges to distribute its data across multiple nodes for better load balancing of some write-heavy workloads.

SHOW RANGES FROM TABLE users

Table best practices

Create tables:

CREATE TABLE database_name.schema_name.table_name

Create Columns:

Values exceeding 1MB can lead to write amplification（写放大） and cause performance degradation

Set index (one or more columns )properly:

Must be unique and not-null values
Set to all col used for sorting or filter (=,in )
Use storing to store hot data cols in index, and also store join key.
l Avoid defining pk over a single column of sequential data, eg. auto-incrementing INT or timestamp value
l If must define in sequential keys, use hash-sharded index.

Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range hotspots and improving write performance on sequentially-keyed indexes at a small cost to read performance.
When possible, define primary key constraints over multiple columns (i.e., use composite primary keys).

The first col is well-distributed across nodes, and second col can be monotonically increasing pk
l For single-column pk, use UUID, gen_random_uuid() function, which ensure the pk value will be unique and well-distributed across a cluster

这个uuid无法用于查询，但是可以用于多个表的join

Add proper column constraints

Secondary index:

Unique constrain will set se-index automatically

Advanced schema design

Computed cols

Useful when table is frequently sorted

Column families

reduce the number of keys stored in the key-value store, resulting in improved performance during INSERT, UPDATE, and DELETE operations.

Assign frequently updated col to a single family

https://www.cockroachlabs.com/docs/v21.1/column-families

Partial index

Add an index to part of rows and cols

Spatial index

Multiple regions

2. Write data/read data

Insert

Batch multiple rows in one multi-row insert statement, use batch (10,100, 1000)

Do not include multi-row insert statements with an explicit tx

Update

Must use where.

Use a batch-update loop to update many rows (https://www.cockroachlabs.com/docs/v21.1/bulk-update-data)

Wrap the update with a retry (https://www.cockroachlabs.com/docs/v21.1/error-handling-and-troubleshooting#transaction-retry-errors)

Follower read

https://www.cockroachlabs.com/docs/v21.1/follower-reads.html

Follower reads are a mechanism that CockroachDB uses to provide faster reads in situations where you can afford to read data that may be slightly less than current (using AS OF SYSTEM TIME)

Requirements

Read latency is low, but write latency is higher
Read can be historical (4.8 seconds or more in past)
Available during region failure

3. Manage Tx

Retry:

System automatic retry

Require the insert/update/delete without ”returning”, and have a small returned result size.

If returned result is greater than 16kb, cannot retry

Client side retry:

Nested tx

Can rollback to any position in sub tx

4. Optimize performance

https://www.cockroachlabs.com/docs/v21.1/make-queries-fast.html

https://www.cockroachlabs.com/docs/v21.1/performance-best-practices-overview#understanding-and-avoiding-transaction-contention

Avoid transaction contention

Contention

Tx operates on same range, but different index key value, limited by single node hardware
Tx operates on same index key values, will be more strictly serialized to obey Tx isolation semantics

Multiple strategies

Use proper index to distribute the range to multiple nodes
Make tx small
Avoid multiple client-server exchanges per transaction, group select, insert etc together in single SQL
Select for update for scenario of read and then update
User upsert when replacing values in row
Do normalization to the table
If the application strictly requires operating on very few different index key values, consider using ALTER … SPLIT AT so that each index key value can be served by a separate group of nodes in the cluster.

cockroachDB-opt

1. Schema design (avoid hotpot)

Relations

Schema best practices