Finding Keys in Redis with 100 Million Keys: An Elegant Approach
Article Summary
When your Redis instance holds 100 million keys and you need to extract 100,000 with a specific prefix, using the wrong command can bring your entire system to a halt. Learn the elegant, production-safe approach to key scanning in Redis.
Redis is one of the most familiar middleware components for every backend engineer. It's lightweight, efficient, and flexible—making it the go-to caching solution in modern high-concurrency systems.
However, as your business scales, Redis's simple world becomes less simple. One day, you might receive what seems like a straightforward requirement:
"Our Redis instance has approximately 100 million keys. We need to extract the 100,000 keys that have the prefix
user:profile:"
It seems trivial, but one wrong move could cause your entire Redis instance to completely freeze.
Starting with the "Most Intuitive" Approach
Many people's first instinct is:
```
KEYS user:profile:*
```
Right? This command runs blazingly fast in development environments—results appear in the blink of an eye.
But if you execute this in production, be prepared for your ops team to have a "conversation" with you.
❌ Why You Should NEVER Use KEYS in Production
Redis executes commands on a single main thread. The KEYS command walks the entire keyspace in one O(N) pass, no matter how few keys actually match the pattern, and it blocks that thread for the whole traversal.
This means:
- All requests (read/write/expiration) will be blocked
- CPU usage spikes dramatically
- Response times skyrocket
- In severe cases, monitoring systems may detect it as a crash and trigger a restart
Bottom line: KEYS is fine in development, but executing it in production is like detonating a nuclear bomb.
Redis's Official "Safe Solution": SCAN
Since version 2.8, Redis has provided a safe alternative command: SCAN. The approach is simple: incremental iteration.
```
SCAN 0 MATCH user:profile:* COUNT 1000
```
This command does two things:
- Returns a subset of matching keys
- Returns a cursor to continue scanning from this position next time
When the cursor returns 0, the scan is complete. This means you can use a loop to continuously call SCAN until all matching keys are found.
✅ Advantages
- Non-blocking, doesn't affect the main thread
- Supports pattern matching
- Rate can be controlled in batches
- Can process results while scanning
⚠️ Considerations
- Scanning is "approximately random", no order guaranteed
- May return duplicate keys (requires deduplication)
- Full database scan can still be time-consuming
Code Implementation: Safely Finding Target Keys
Theory is great, but nothing beats hands-on code. Below is a ready-to-use example using redis-py:
```python
import redis

# Connect to Redis
r = redis.Redis(
    host='127.0.0.1',
    port=6379,
    decode_responses=True
)

cursor = 0
matched_keys = set()  # Use set for automatic deduplication

while True:
    # Incremental scan; COUNT is a hint for how much work
    # each iteration does, not a hard cap on returned keys
    cursor, keys = r.scan(
        cursor=cursor,
        match='user:profile:*',
        count=10000
    )
    # Accumulate results
    matched_keys.update(keys)
    # Print progress after each batch
    print(f"Scanned batch, total found: {len(matched_keys)} keys")
    # Cursor returns 0 when scan is complete
    if cursor == 0:
        break

print(f"Scan complete! Total keys found: {len(matched_keys)}")
```
💡 Key Points
- COUNT controls scan speed
- set() handles deduplication automatically
- You can add sleep() in the loop to control rate and prevent CPU saturation
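The throttling idea above can be wrapped in a small helper. This is a sketch; the `scan_keys` function and its parameters are illustrative names, not part of redis-py:

```python
import time

def scan_keys(client, pattern, count=1000, delay=0.01):
    """Collect all keys matching `pattern` via SCAN, pausing between
    batches so the scan never monopolizes Redis or the network."""
    cursor = 0
    found = set()  # set deduplicates keys SCAN may return twice
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        found.update(keys)
        if cursor == 0:  # cursor 0 means the iteration is complete
            return found
        time.sleep(delay)  # yield between batches to smooth the load
```

With redis-py this would be called as `scan_keys(r, 'user:profile:*')`; tune `count` and `delay` to trade total scan duration against server load.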
Understanding SCAN Internals
Understanding how SCAN works helps you predict when it might slow down.
- Redis's internal key space is a hash table
- SCAN traverses the hash table's slots segment by segment, using the cursor to record its position
- The number of keys returned per iteration is not fixed
- Matching happens during scanning, not through pre-filtering
Therefore:
- The larger the total key count, the longer SCAN takes to traverse
- Higher COUNT values increase single-iteration response time
- If Redis experiences heavy writes, the cursor's view may slightly "drift"
This is why SCAN provides approximate consistency traversal, not a complete snapshot.
Architectural Optimization Strategies
If you frequently need to find keys by prefix, the real issue isn't "how to find them"—it's "why do you need to find them in the first place?" This suggests your data modeling might need optimization. Here are some more elegant solutions:
1️⃣ Maintain an Index Set for Specific Prefixes
When writing data, simultaneously maintain an index set:
```
SADD user:profile:index 1001
SADD user:profile:index 1002
```
Later, you simply need:
```
SMEMBERS user:profile:index
```
Instantly retrieve all corresponding keys without traversing the entire database.
💡 Advantages:
- O(1) query complexity, exceptional performance
- Non-blocking
- Supports pagination with ZSET
⚠️ Considerations:
- Index must be maintained during writes
- Index may contain stale data, requires periodic validation
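The write-time index can be sketched with redis-py. The helper names (`save_profile`, `profile_keys`) are illustrative, and using a pipeline to group the two writes is an assumption about how you would keep the data and its index in step:

```python
def save_profile(client, user_id, data):
    """Write a profile and record its id in the index set in one round trip."""
    key = f"user:profile:{user_id}"
    pipe = client.pipeline()
    pipe.set(key, data)                         # the data itself
    pipe.sadd("user:profile:index", user_id)    # the index entry
    pipe.execute()

def profile_keys(client):
    """Rebuild the full key names from the index set; no SCAN needed."""
    return {f"user:profile:{uid}" for uid in client.smembers("user:profile:index")}
```

Because every write also updates `user:profile:index`, the lookup becomes a single SMEMBERS call, regardless of how many other keys the instance holds.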
2️⃣ Database Sharding / Partitioned Storage
If a particular category of keys is especially large, you can partition by prefix:
- Place different business prefixes in separate Redis instances
- In Cluster mode, use hash tags for location (e.g., user:{profile}:id)
This significantly reduces the scan range and improves efficiency.
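To see why hash tags co-locate keys, here is a sketch of the slot calculation Cluster performs: CRC16 of the key, or of the hash-tag substring when one is present, modulo 16384. The algorithm comes from the Redis Cluster specification and is reimplemented here purely for illustration:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of Redis Cluster's 16384 slots, honoring {hash tags}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:  # non-empty tag: hash only the tag
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

Keys like `user:{profile}:1001` and `user:{profile}:1002` hash only the `profile` tag, so they land in the same slot, which is what makes per-prefix scans and multi-key operations possible in Cluster mode.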
Final Thoughts
Many people understand Redis as simply "it's fast." But in real production environments, Redis's speed is more like a double-edged sword.
It allows you to effortlessly read and write hundreds of millions of records, but it can also bring your entire system to a standstill with a single command.
So, the next time you're tempted to use KEYS *, ask yourself:
"Am I in development, or am I in production?"
A mature engineer doesn't write the shortest command—they design the most stable system.