80/20 Of The Week: Partitioning and Sharding

80/20 Of The Week 🗣️ … Partitioning and Sharding

There’s a whole science behind these two! It’s a design decision that can make your system fly… or haunt you later.

There are a couple of key concepts for this so let’s go through them quickly

1) Horizontal vs Vertical data splitting

Horizontal partitioning = split by rows Example: users with IDs 0–999,999 in one partition, 1,000,000+ in another
Vertical partitioning = split by columns Example: keep id, name, email in one table, and move heavier or less frequently used columns like bio, avatar, settings elsewhere

Essentially:

Horizontal splitting helps with scaling data volume and traffic
Vertical splitting helps reduce row size and isolate hot vs rarely used fields

2) Partitioning vs Sharding

These describe where that split happens.

Partitioning = inside one database (same DB instance)
Sharding = across multiple databases/nodes

Partitioning improves manageability and query performance. Sharding gives you real horizontal scale, but also adds operational pain such as rebalancing, consistency tradeoffs, harder migrations and so on and so forth (more on this in future).

3) Shard key

Your shard key is what decides where each row goes to! Good shard key:

evenly distributes traffic (no hotspots)
avoids cross-shard queries
matches your access pattern

Bad shard key = “one shard is dying while others are chilling” Classic hotspot: sharding by country, date, or anything super skewed.

4) Sharding strategies

A) Range-based

(like userId 1–1M on one, 1M–2M on other…) Easy to understand, but can create hotspots because certain ranges could be queried more than others.

B) Hash-based

Distributes well, but is less intuitive to inspect/debug.

C) Modulo hashing

hash(key) % N Simple, but painful when the number of shards changes… N here is the number of shards that you have.

D) Consistent hashing (ring)

Adding/removing nodes moves only a small portion of keys. Better for elastic scaling (usually with virtual nodes to balance load, we gon talk about this later on since it’s a topic of its own).

Although these concepts sound simple, sharding is rarely hard because of hashing itself. It becomes hard because of rebalancing, query patterns, bad shard keys, and the operational mistakes that only start hurting once you scale in production!

Sharding and Partitioning