Finding Keys in Redis with 100 Million Keys: An Elegant Approach

CodeGram Team
January 14, 2025

Article Summary

When your Redis instance holds 100 million keys and you need to extract 100,000 with a specific prefix, using the wrong command can bring your entire system to a halt. Learn the elegant, production-safe approach to key scanning in Redis.

Redis is one of the most familiar middleware components for every backend engineer. It's lightweight, efficient, and flexible—making it the go-to caching solution in modern high-concurrency systems.

However, as your business scales, Redis's simple world becomes less simple. One day, you might receive what seems like a straightforward requirement:

"Our Redis instance has approximately 100 million keys. We need to extract the 100,000 keys that have the prefix user:profile:"

It seems trivial, but one wrong move could cause your entire Redis instance to completely freeze.

Starting with the "Most Intuitive" Approach

Many people's first instinct is:

KEYS user:profile:*

Right? This command runs blazingly fast in development environments—results appear in the blink of an eye.

But if you execute this in production, be prepared for your ops team to have a "conversation" with you.

❌ Why You Should NEVER Use KEYS in Production

Redis executes commands on a single main thread. The KEYS command walks the entire keyspace in one O(N) pass, where N is the total number of keys rather than the number of matches, and it blocks the main thread for the whole traversal.

This means:

  • All requests (read/write/expiration) will be blocked
  • CPU usage spikes dramatically
  • Response times skyrocket
  • In severe cases, monitoring systems may detect it as a crash and trigger a restart

Bottom line: KEYS is fine in development, but executing it in production is like detonating a nuclear bomb.

Redis's Official "Safe Solution": SCAN

Since version 2.8, Redis has provided a safe alternative command: SCAN. The approach is simple: incremental iteration.

SCAN 0 MATCH user:profile:* COUNT 1000

This command does two things:

  1. Returns a subset of matching keys
  2. Returns a cursor to continue scanning from this position next time

When the cursor returns 0, the scan is complete. This means you can use a loop to continuously call SCAN until all matching keys are found.

✅ Advantages

  • Non-blocking, doesn't affect the main thread
  • Supports pattern matching
  • Rate can be controlled in batches
  • Can process results while scanning

⚠️ Considerations

  • Scanning is "approximately random", no order guaranteed
  • May return duplicate keys (requires deduplication)
  • Full database scan can still be time-consuming

Code Implementation: Safely Finding Target Keys

Theory is great, but nothing beats hands-on code. Below is a ready-to-use example using redis-py:

import redis

# Connect to Redis
r = redis.Redis(
    host='127.0.0.1',
    port=6379,
    decode_responses=True
)

cursor = 0
matched_keys = set()  # Use set for automatic deduplication

while True:
    # Incremental scan; COUNT is a hint for how much work to do per call,
    # not a hard cap on the number of keys returned
    cursor, keys = r.scan(
        cursor=cursor,
        match='user:profile:*',
        count=10000
    )

    # Accumulate results
    matched_keys.update(keys)

    # Print progress after each batch
    print(f"Scanned batch, total found: {len(matched_keys)} keys")

    # Cursor returns 0 when scan is complete
    if cursor == 0:
        break

print(f"Scan complete! Total keys found: {len(matched_keys)}")

💡 Key Points

  • COUNT is a hint that trades per-call latency against total round trips, not a hard limit
  • set() handles deduplication automatically
  • You can add sleep() in the loop to control rate and prevent CPU saturation
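Pulling the loop into a helper makes the rate control explicit. Here is a minimal sketch; the function name and the `pause` parameter are illustrative choices, not part of redis-py, and `client` can be any object exposing a redis-py-style `scan()`:

```python
import time

def collect_prefixed_keys(client, pattern, count=1000, pause=0.0):
    """Incrementally SCAN for keys matching `pattern`.

    Sleeping `pause` seconds between batches keeps a scan over a huge
    keyspace from saturating the server or the network.
    """
    matched = set()  # set() deduplicates keys SCAN may return twice
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        matched.update(keys)
        if cursor == 0:  # cursor 0 signals the iteration has finished
            return matched
        if pause:
            time.sleep(pause)
```

With a real connection this becomes `collect_prefixed_keys(r, 'user:profile:*', count=10000, pause=0.01)`.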

Understanding SCAN Internals

Understanding how SCAN works helps you predict when it might slow down.

  • Redis's internal key space is a hash table
  • SCAN traverses hash slots segment by segment using cursors
  • The number of keys returned per iteration is not fixed
  • Matching happens during scanning, not through pre-filtering

Therefore:

  • The larger the total key count, the longer SCAN takes to traverse
  • Higher COUNT values increase single-iteration response time
  • If Redis experiences heavy writes, the cursor's view may slightly "drift"

This is why SCAN provides an approximately consistent traversal rather than a point-in-time snapshot: keys present for the entire scan are guaranteed to be returned at least once, but keys added or removed mid-scan may or may not appear.

Architectural Optimization Strategies

If you frequently need to find keys by prefix, the real issue isn't "how to find them"—it's "why do you need to find them in the first place?" This suggests your data modeling might need optimization. Here are some more elegant solutions:

1️⃣ Maintain an Index Set for Specific Prefixes

When writing data, simultaneously maintain an index set:

SADD user:profile:index 1001
SADD user:profile:index 1002

Later, you simply need:

SMEMBERS user:profile:index

Instantly retrieve all corresponding keys without traversing the entire database.

💡 Advantages:

  • Queries touch only the index set itself (SMEMBERS is O(N) in the set's size, never the full keyspace), so performance is excellent
  • Non-blocking
  • Supports pagination if you switch the index to a ZSET (or iterate it with SSCAN)

⚠️ Considerations:

  • Index must be maintained during writes
  • Index may contain stale data, requires periodic validation
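A sketch of what the write path looks like when the index set is maintained alongside the data. The helper names and the `user:profile:` layout follow this article's convention and are not a redis-py API; a production version would wrap each pair of commands in a pipeline or MULTI/EXEC so the index cannot drift out of sync:

```python
INDEX_KEY = "user:profile:index"

def save_profile(client, user_id, payload):
    # Write the value and record its id in the index set together
    client.set(f"user:profile:{user_id}", payload)
    client.sadd(INDEX_KEY, user_id)

def delete_profile(client, user_id):
    # Keep the index honest: drop the id whenever the key is deleted
    client.delete(f"user:profile:{user_id}")
    client.srem(INDEX_KEY, user_id)

def profile_keys(client):
    # Rebuild the full key names from the index, no SCAN needed
    return {f"user:profile:{uid}" for uid in client.smembers(INDEX_KEY)}
```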

2️⃣ Database Sharding / Partitioned Storage

If a particular category of keys is especially large, you can partition by prefix:

  • Place different business prefixes in separate Redis instances
  • In Cluster mode, use hash tags (e.g., user:{profile}:id) so that every key sharing the {profile} tag hashes to the same slot

This significantly reduces the scan range and improves efficiency.
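To see how hash tags pin keys to a slot, here is a self-contained sketch of Redis Cluster's slot calculation (CRC16-XMODEM of the key, or of the `{...}` tag when one is present, modulo 16384). It is written from the cluster specification, not taken from a client library:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection,
    # as mandated by the Redis Cluster specification
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash-tag rule: if the key contains a non-empty {tag},
    # only the tag is hashed, so keys sharing a tag share a slot
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

Every key written as user:{profile}:... hashes only the literal tag `profile`, so they all map to one slot on one node, which is what keeps a targeted scan cheap.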

Final Thoughts

Many people understand Redis as simply "it's fast." But in real production environments, Redis's speed is more like a double-edged sword.

It allows you to effortlessly read and write hundreds of millions of records, but it can also bring your entire system to a standstill with a single command.

So, the next time you're tempted to use KEYS *, ask yourself:

"Am I in development, or am I in production?"

A mature engineer doesn't write the shortest command—they design the most stable system.
