cockroachDB-opt
4 minute read ∼ Filed in : practise1. Schema design (avoid hotpot)
Relations
Database -> schema -> (tables, views, sequences)
Schema best practices
If need multiple tables with the same name, do so in the different schema in the same DB
use schema_name.table_name to access the table
Make transactions smaller by operating on fewer data per transaction. This will offer fewer opportunities for transactions’ data access to overlap.
Split the table across multiple ranges to distribute its data across multiple nodes for better load balancing of some write-heavy workloads.
SHOW RANGES FROM TABLE users
Table best practices
Create tables:
CREATE TABLE database_name.schema_name.table_name
Create Columns:
Values exceeding 1MB can lead to write amplification(写放大) and cause performance degradation
Set index (one or more columns )properly:
-
Must be unique and not-null values
-
Set to all col used for sorting or filter (=,in )
-
Use storing to store hot data cols in index, and also store join key.
-
l Avoid defining pk over a single column of sequential data, eg. auto-incrementing INT or timestamp value
-
l If must define in sequential keys, use hash-sharded index.
Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range hotspots and improving write performance on sequentially-keyed indexes at a small cost to read performance.
-
When possible, define primary key constraints over multiple columns (i.e., use composite primary keys).
The first col is well-distributed across nodes, and second col can be monotonically increasing pk
-
l For single-column pk, use UUID, gen_random_uuid() function, which ensure the pk value will be unique and well-distributed across a cluster
这个uuid无法用于查询,但是可以用于多个表的join
Add proper column constraints
Secondary index:
Unique constrain will set se-index automatically
Advanced schema design
Computed cols
Useful when table is frequently sorted
Column families
reduce the number of keys stored in the key-value store, resulting in improved performance during INSERT, UPDATE, and DELETE operations.
Assign frequently updated col to a single family
https://www.cockroachlabs.com/docs/v21.1/column-families
Partial index
Add an index to part of rows and cols
Spatial index
Multiple regions
2. Write data/read data
Insert
Batch multiple rows in one multi-row insert statement, use batch (10,100, 1000)
Do not include multi-row insert statements with an explicit tx
Update
Must use where.
Use a batch-update loop to update many rows (https://www.cockroachlabs.com/docs/v21.1/bulk-update-data)
Wrap the update with a retry (https://www.cockroachlabs.com/docs/v21.1/error-handling-and-troubleshooting#transaction-retry-errors)
Follower read
https://www.cockroachlabs.com/docs/v21.1/follower-reads.html
Follower reads are a mechanism that CockroachDB uses to provide faster reads in situations where you can afford to read data that may be slightly less than current (using AS OF SYSTEM TIME
)
Requirements
- Read latency is low, but write latency is higher
- Read can be historical (4.8 seconds or more in past)
- Available during region failure
3. Manage Tx
Retry:
System automatic retry
Require the insert/update/delete without ”returning”, and have a small returned result size.
If returned result is greater than 16kb, cannot retry
Client side retry:
Nested tx
Can rollback to any position in sub tx
4. Optimize performance
https://www.cockroachlabs.com/docs/v21.1/make-queries-fast.html
https://www.cockroachlabs.com/docs/v21.1/performance-best-practices-overview#understanding-and-avoiding-transaction-contention
Avoid transaction contention
Contention
- Tx operates on same range, but different index key value, limited by single node hardware
- Tx operates on same index key values, will be more strictly serialized to obey Tx isolation semantics
Multiple strategies
- Use proper index to distribute the range to multiple nodes
- Make tx small
- Avoid multiple client-server exchanges per transaction, group select, insert etc together in single SQL
- Select for update for scenario of read and then update
- User upsert when replacing values in row
- Do normalization to the table
- If the application strictly requires operating on very few different index key values, consider using ALTER … SPLIT AT so that each index key value can be served by a separate group of nodes in the cluster.