Skip to content
Go back

80/20 Of The Week: Partitioning and Sharding

Edit page

80/20 Of The Week 🗣️ … Partitioning and Sharding

There’s a whole science behind these two! It’s a design decision that can make your system fly… or haunt you later.

There are a couple of key concepts for this so let’s go through them quickly

1) Horizontal vs Vertical data splitting

Essentially:

2) Partitioning vs Sharding

These describe where that split happens.

Partitioning improves manageability and query performance. Sharding gives you real horizontal scale, but also adds operational pain such as rebalancing, consistency tradeoffs, harder migrations and so on and so forth (more on this in future).

3) Shard key

Your shard key is what decides where each row goes to! Good shard key:

Bad shard key = “one shard is dying while others are chilling” Classic hotspot: sharding by country, date, or anything super skewed.

4) Sharding strategies

A) Range-based

(like userId 1–1M on one, 1M–2M on other…) Easy to understand, but can create hotspots because certain ranges could be queried more than others.

B) Hash-based

Distributes well, but is less intuitive to inspect/debug.

C) Modulo hashing

hash(key) % N Simple, but painful when the number of shards changes… N here is the number of shards that you have.

D) Consistent hashing (ring)

Adding/removing nodes moves only a small portion of keys. Better for elastic scaling (usually with virtual nodes to balance load, we gon talk about this later on since it’s a topic of its own).

Although these concepts sound simple, sharding is rarely hard because of hashing itself. It becomes hard because of rebalancing, query patterns, bad shard keys, and the operational mistakes that only start hurting once you scale in production!

Sharding and Partitioning


Edit page
Share this post on:

Next Post
80/20 Of The Week: Idempotency and Resilience