With over 600 million images to index for downstream services like search and personalization, we faced a fundamental bottleneck in how our system accessed and served images. This post walks through the thinking behind a high-throughput, sharded image caching layer built on top of NGINX, which gave us 5x the read throughput while keeping things simple and resilient.
🚧 Problem
We needed to solve several pressing issues:
- Throughput limitations in our existing image storage infrastructure
- Latency-sensitive workflows, like embedding extraction and indexing
- Massive scale: 600M+ images, frequently accessed by downstream services
Traditional storage approaches introduced unacceptable bottlenecks when embedding or retrieving images at scale.
📋 Technical Requirements
To meet the demands, our caching layer needed:
- Configurable and pluggable cache logic
- High throughput for concurrent image reads
- Scalability via sharding across nodes
- Fault tolerance and even data distribution
⚙️ Implementation: The Image Cache
We built a distributed caching layer on top of NGINX, known for its:
- High-speed I/O handling
- Built-in caching module
- Simplicity in defining key-value semantics (via URL routes)
Key Design Points:
- Local file system storage → Simplicity and performance, but I/O intensive
- Multi-node architecture with consistent hashing → Ensured even data distribution and fault tolerance
- NGINX as both reverse proxy and cache manager → No extra runtime components; lightweight
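To make the consistent hashing point concrete, here is a minimal Python sketch of a hash ring with virtual nodes. It is an illustration of the general technique, not our production routing code: the node names (`cache-a` to `cache-d`), the URL pattern, the choice of SHA-256, and the 200 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(value: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch only)."""

    def __init__(self, nodes, vnodes=200):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed on the ring many times to smooth out the load.
        for i in range(self._vnodes):
            point = ring_position(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        """Return the node owning the first ring point clockwise from the key."""
        idx = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[idx % len(self._points)]]


# Hypothetical cache nodes and image URLs, used only for the demonstration.
ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
keys = [f"/images/{i}.jpg" for i in range(20000)]
load = Counter(ring.node_for(k) for k in keys)
print(load)
```

The same URL always maps to the same node, and the virtual nodes are what keep the per-node share roughly even. With too few of them, one server can end up owning a much larger arc of the ring than the others.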
Note:
We’ve published a detailed step-by-step guide on the setup:
TECH.ZEALOT: Creating an NGINX Image Cache
✅ Outcome
- 5x improvement in read throughput
- Even data distribution via consistent hashing
- Automatic failure handling: nodes could be added or removed without rebalancing all data, because consistent hashing remaps only the keys owned by the affected node
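The "no full rebalance" property is easy to check with a small experiment. The sketch below builds a compact ring, removes one node, and counts how many keys change owner. Node names, key format, hash function, and the virtual node count are assumptions for illustration, not measurements from our cluster.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(nodes, vnodes=200):
    """Return (sorted positions, owners) for the given node names."""
    pairs = sorted((ring_position(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    return [p for p, _ in pairs], [n for _, n in pairs]


def owner(ring, key):
    """Return the node owning the first ring point clockwise from the key."""
    positions, owners = ring
    idx = bisect.bisect_right(positions, ring_position(key))
    return owners[idx % len(positions)]


keys = [f"/images/{i}.jpg" for i in range(20000)]
before = build_ring(["cache-a", "cache-b", "cache-c", "cache-d"])
after = build_ring(["cache-a", "cache-b", "cache-c"])  # cache-d has failed

# Keys whose owner changed after the node was removed.
moved = [k for k in keys if owner(before, k) != owner(after, k)]
moved_fraction = len(moved) / len(keys)
print(f"{moved_fraction:.1%} of keys changed owner")
```

Only keys that lived on the removed node move, so with four evenly loaded nodes roughly a quarter of the keys are affected. A naive `hash(key) % node_count` scheme would reshuffle most of them.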
🔍 Challenges & Improvements
While the system works well in production, there are still some limitations:
- Heavy disk I/O due to local file system access
- Replication is not built-in and needs orchestration
- Cache introspection is limited: it is hard to query which images are present
- Hot key pressure can overload nodes → Mitigated via more nodes and randomized hashing seeds
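On the introspection point, one partial workaround is to compute where NGINX would store an entry and check the file system directly. NGINX names each cache file after the MD5 of the cache key and nests it in directories taken from the end of that hash, as set by the `levels` parameter of `proxy_cache_path`. The sketch below assumes `levels=1:2` and a key that follows the default `$scheme$proxy_host$request_uri` pattern. The cache root and origin host are hypothetical and would need to match the real configuration.

```python
import hashlib
from pathlib import Path


def cache_file_path(cache_root: str, cache_key: str, levels=(1, 2)) -> Path:
    """Compute the on-disk path NGINX would use for a given cache key.

    Assumes the file name is the MD5 hex digest of the key and that each
    directory level is cut from the end of that digest (levels=1:2 style).
    """
    digest = hashlib.md5(cache_key.encode("utf-8")).hexdigest()
    path = Path(cache_root)
    end = len(digest)
    for width in levels:
        path /= digest[end - width:end]
        end -= width
    return path / digest


def is_cached(cache_root: str, cache_key: str, levels=(1, 2)) -> bool:
    """True if a cache file exists for the key (it may still be expired)."""
    return cache_file_path(cache_root, cache_key, levels).is_file()


# Hypothetical key: scheme + upstream host + request URI, concatenated.
key = "http" + "image-origin.internal" + "/images/12345.jpg"
print(cache_file_path("/var/cache/nginx/images", key))
print(is_cached("/var/cache/nginx/images", key))
```

This only answers "is this one image on this node?" and has to run on the node that owns the key. Listing everything that is cached still means walking the cache directory.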
🧠 Other Considerations
- Memory-layer caching (e.g., Redis, Memcached) for ultra-hot images
- Hybrid edge-caching using CDNs or object storage with smart invalidation
- Cache analytics dashboard for visibility and tuning
