Scaling Image Throughput with an NGINX-Based Cache

With over 600 million images to index for downstream services like search and personalization, we faced a fundamental bottleneck in how our system accessed and served images. This post walks through the design decisions behind a high-throughput, sharded image caching layer built on top of NGINX, which delivered 5x the read throughput while keeping the system simple and resilient.

🚧 Problem

We needed to solve several pressing issues:

Throughput limitations in our existing image storage infrastructure
Latency-sensitive workflows, like embedding extraction and indexing
Massive scale: 600M+ images, frequently accessed by downstream services

Traditional storage approaches introduced unacceptable bottlenecks when retrieving images at scale, whether for embedding extraction or for downstream reads.

📋 Technical Requirements

To meet these demands, our caching layer needed:

Configurable and pluggable cache logic
High throughput for concurrent image reads
Scalability via sharding across nodes
Fault tolerance and even data distribution

⚙️ Implementation: The Image Cache

We built a distributed caching layer on top of NGINX, which we chose for its:

High-speed I/O handling
Built-in caching module
Simplicity in defining key-value semantics (via URL routes)

Key Design Points:

Local file system storage → Simplicity and performance, but I/O intensive
Multi-node architecture with consistent hashing → Ensured even data distribution and fault tolerance
NGINX as both reverse proxy and cache manager → No extra runtime components; lightweight
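The consistent-hashing idea behind the multi-node design can be sketched as a hash ring with virtual nodes: each physical cache node owns many points on the ring, and an image's URL route (the cache key) is served by the first node clockwise from the key's hash. This is a minimal sketch, not the post's actual implementation; the node names, the `/images/<id>` route, `ConsistentHashRing`, `image_url`, and the choice of 100 virtual nodes are all hypothetical. The post does not say whether routing happens in clients or in NGINX itself; NGINX can do it natively with `hash $request_uri consistent;` inside an `upstream` block (ketama-style consistent hashing from ngx_http_upstream_module).

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring (first 8 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical consistent-hash ring with virtual nodes per physical node."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owners = {}  # ring position -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node claims `vnodes` points, which smooths the distribution.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        # Removing a node releases only its own points; other keys stay put.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            del self._owners[point]
            self._points.remove(point)

    def node_for(self, key):
        """Return the owner of the first ring point strictly after the key's hash."""
        if not self._points:
            raise ValueError("ring is empty")
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[idx]]


def image_url(ring, image_id):
    """Build the hypothetical URL route (the cache key) on the owning node."""
    path = f"/images/{image_id}"
    return f"http://{ring.node_for(path)}{path}"


if __name__ == "__main__":
    ring = ConsistentHashRing(["cache-1:8080", "cache-2:8080", "cache-3:8080"])
    print(image_url(ring, "12345"))

    # Adding a node should move only a fraction of keys, all onto the new node.
    keys = [f"/images/{i}" for i in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("cache-4:8080")
    moved = sum(before[k] != ring.node_for(k) for k in keys) / len(keys)
    print(f"moved {moved:.1%} of keys")
```

Virtual nodes are the usual way to get the "even data distribution" property: with only one point per node, arc sizes on the ring vary widely, while many points per node average that variance out.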

Note:

We’ve published a detailed step-by-step guide on the setup:

TECH.ZEALOT: Creating an NGINX Image Cache

✅ Outcome

5x improvement in read throughput
Even data distribution via consistent hashing
Automatic failure handling: nodes could be added or removed without rebalancing all of the data
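The "no full rebalance" property can be quantified with an idealized back-of-the-envelope estimate. It assumes keys hash uniformly across N equally weighted nodes; the node count of 10 below is hypothetical, not a figure from our deployment. Growing from N to N+1 nodes, consistent hashing moves only the new node's share of keys, whereas naive modulo placement (key mod N) remaps almost everything:

```latex
\text{moved}_{\text{consistent}} \approx \frac{1}{N+1}, \qquad
\text{moved}_{\text{mod}} \approx \frac{N}{N+1}

% Worked example: N = 10 nodes, 600M images, adding an 11th node
\text{consistent: } \frac{600\text{M}}{11} \approx 54.5\text{M images}, \qquad
\text{mod: } 600\text{M} \cdot \frac{10}{11} \approx 545.5\text{M images}
```

Removing a node is symmetric: only the departed node's share, roughly 1/N of the keys, has to be re-fetched elsewhere.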

🔍 Challenges & Improvements

While the system works well in production, there are still some limitations:

Heavy disk I/O due to local file system access
Replication is not built-in and needs orchestration
Cache introspection is limited: it’s hard to query which images are present
Hot-key pressure can overload individual nodes → Mitigated via more nodes and randomized hashing seeds
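One partial workaround for the introspection gap relies on how NGINX lays out its disk cache: each cache file is named after the MD5 hex digest of the cache key, and the `levels` parameter of `proxy_cache_path` takes subdirectory names from the end of that digest (with `levels=1:2`, the last character and then the two characters before it). The sketch below computes that path and checks whether a file exists. It assumes you know the exact string your `proxy_cache_key` expands to (the default is `$scheme$proxy_host$request_uri`); the cache root, key, and helper names are hypothetical. A file being present does not guarantee the entry is still valid, since it may be expired or awaiting eviction, so treat the result as a hint.

```python
import hashlib
import os


def nginx_cache_path(cache_root, cache_key, levels=(1, 2)):
    """Return the on-disk path NGINX would use for `cache_key`.

    The file name is the MD5 hex digest of the key; directory levels are
    sliced from the END of the digest (levels=1:2 -> last char, then the
    two chars before it).
    """
    digest = hashlib.md5(cache_key.encode("utf-8")).hexdigest()
    parts = []
    end = len(digest)
    for width in levels:
        parts.append(digest[end - width:end])
        end -= width
    return os.path.join(cache_root, *parts, digest)


def is_cached(cache_root, cache_key, levels=(1, 2)):
    """Best-effort check: True if a cache file exists for the key on this node."""
    return os.path.isfile(nginx_cache_path(cache_root, cache_key, levels))


if __name__ == "__main__":
    # Hypothetical key produced by the default $scheme$proxy_host$request_uri
    key = "httpimage_backend/images/12345"
    print(nginx_cache_path("/var/cache/nginx/images", key))
```

Combined with the consistent-hash routing, a "which images are cached?" query only needs to run this check on the one node the ring selects for each key, rather than scanning every node's cache directory.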

🧠 Other Considerations

Memory-layer caching (e.g., Redis, Memcached) for ultra-hot images
Hybrid edge-caching using CDNs or object storage with smart invalidation
Cache analytics dashboard for visibility and tuning

Tech.Zealot