cockroachDB-opt

Posted on August 12, 2021   4 minute read ∼ Filed in  : 

1. Schema design (avoid hotpot)

Relations

Database -> schema -> (tables, views, sequences)

Schema best practices

If need multiple tables with the same name, do so in the different schema in the same DB

use schema_name.table_name to access the table

Make transactions smaller by operating on fewer data per transaction. This will offer fewer opportunities for transactions’ data access to overlap.

Split the table across multiple ranges to distribute its data across multiple nodes for better load balancing of some write-heavy workloads.

SHOW RANGES FROM TABLE users

image-20211028154031616

Table best practices

Create tables:

CREATE TABLE database_name.schema_name.table_name

Create Columns:

image-20211028154306793

Values exceeding 1MB can lead to write amplification(写放大) and cause performance degradation

Set index (one or more columns )properly:

  1. Must be unique and not-null values

  2. Set to all col used for sorting or filter (=,in )

  3. Use storing to store hot data cols in index, and also store join key.

    image-20211028154437616

  4. l Avoid defining pk over a single column of sequential data, eg. auto-incrementing INT or timestamp value

  5. l If must define in sequential keys, use hash-sharded index.

    Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range hotspots and improving write performance on sequentially-keyed indexes at a small cost to read performance.

    image-20211028154505841

    image-20211028154513220

  6. When possible, define primary key constraints over multiple columns (i.e., use composite primary keys).

    The first col is well-distributed across nodes, and second col can be monotonically increasing pk

  7. l For single-column pk, use UUID, gen_random_uuid() function, which ensure the pk value will be unique and well-distributed across a cluster

    这个uuid无法用于查询,但是可以用于多个表的join

    image-20211028154559441

Add proper column constraints

image-20211028154636919

Secondary index:

​ Unique constrain will set se-index automatically

Advanced schema design

Computed cols

Useful when table is frequently sorted

image-20211028154732484

image-20211028154736635

Column families

reduce the number of keys stored in the key-value store, resulting in improved performance during INSERT, UPDATE, and DELETE operations.

Assign frequently updated col to a single family

https://www.cockroachlabs.com/docs/v21.1/column-families

Partial index

Add an index to part of rows and cols

Spatial index

Multiple regions

2. Write data/read data

Insert

Batch multiple rows in one multi-row insert statement, use batch (10,100, 1000)

Do not include multi-row insert statements with an explicit tx

Update

Must use where.

Use a batch-update loop to update many rows (https://www.cockroachlabs.com/docs/v21.1/bulk-update-data)

Wrap the update with a retry (https://www.cockroachlabs.com/docs/v21.1/error-handling-and-troubleshooting#transaction-retry-errors)

Follower read

https://www.cockroachlabs.com/docs/v21.1/follower-reads.html

Follower reads are a mechanism that CockroachDB uses to provide faster reads in situations where you can afford to read data that may be slightly less than current (using AS OF SYSTEM TIME)

Requirements

  1. Read latency is low, but write latency is higher
  2. Read can be historical (4.8 seconds or more in past)
  3. Available during region failure

3. Manage Tx

image-20211028155026069

image-20211028155032359

Retry:

System automatic retry

​ Require the insert/update/delete without ”returning”, and have a small returned result size.

​ If returned result is greater than 16kb, cannot retry

Client side retry:

image-20211028155113030

Nested tx

Can rollback to any position in sub tx

4. Optimize performance

https://www.cockroachlabs.com/docs/v21.1/make-queries-fast.html

https://www.cockroachlabs.com/docs/v21.1/performance-best-practices-overview#understanding-and-avoiding-transaction-contention

Avoid transaction contention

Contention

  1. Tx operates on same range, but different index key value, limited by single node hardware
  2. Tx operates on same index key values, will be more strictly serialized to obey Tx isolation semantics

Multiple strategies

  1. Use proper index to distribute the range to multiple nodes
  2. Make tx small
  3. Avoid multiple client-server exchanges per transaction, group select, insert etc together in single SQL
  4. Select for update for scenario of read and then update
  5. User upsert when replacing values in row
  6. Do normalization to the table
  7. If the application strictly requires operating on very few different index key values, consider using ALTER … SPLIT AT so that each index key value can be served by a separate group of nodes in the cluster.




END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.