
Serve Map Clusters 50x Faster Using Smarter Caching

Serving map clusters to a mobile app can cause a significant performance bottleneck. Fortunately, it’s a problem that can be solved with this caching strategy.

Map components that feature location markers are common in mobile apps today. For example, the Airbnb app prominently features such markers, fetched from a web service, to represent available properties on a map.

To ensure that the volume of fetched data does not become unmanageable as the number of markers grows, the server groups those markers together before sending them to the client. A map cluster is a specialized marker whose position equals the average position of the markers it subsumes. It is labeled with the number of markers it represents.

Still, serving clusters can create performance bottlenecks because a web service must retrieve every single marker within a given geographical region from the database. Fortunately, this issue can be resolved with a caching strategy. To better understand how to design and maintain a cache, let’s look at an example map API endpoint I built for the Playsports app.

A Scaling Problem and a Naive Solution

In the Playsports map, each marker represents a sports facility. The map’s API endpoint needs to return a list of markers and marker clusters, given a zoom level and a bounding box.

A 2D world map with several thumbtacks, several circles with numbers in them, and a square, dotted boundary encompassing Europe and half of Africa.
A bounding box, markers, and clusters on a map

If the number of markers is small enough, we can load all markers in the bounding box from the database, cluster as necessary, and return the resultant markers and clusters to the client.

At the outset, the maximum number of markers in any reachable bounding box in Playsports was ~400, resulting in an endpoint response time of ~0.5 seconds. The naive solution was sufficient for this use case.

As the number of markers grew, however, the mechanism’s inefficiency became obvious. After we added ~10,000 new sports facility markers, the endpoint response time slowed to ~4.3 seconds at some zoom levels. This is well above the one-second threshold that is generally considered to be the maximum acceptable delay for a user action in a mobile app.

To better understand the naive solution’s inefficiencies, let’s analyze it in the context of the marker query:

  1. From the database, retrieve all markers in the bounding box. For most apps, this step needs to include fetching marker details beyond latitude/longitude positioning. For example, Airbnb’s markers include the price, whether the user has already viewed the property, and whether they have marked it as a favorite.
  2. On the retrieved markers, run a map-clustering algorithm that incorporates the zoom level.
  3. Return clusters and (detailed) markers to the client.
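The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Playsports implementation: the marker data is a toy in-memory list standing in for the database, and `grid_cluster` is a deliberately simple stand-in for a real map-clustering algorithm.

```python
import math

# Toy marker set of (id, lat, lng) tuples. In the real endpoint these rows
# would come from the database, joined with detail columns.
MARKERS = [
    (1, 48.10, 11.50), (2, 48.20, 11.60), (3, 48.15, 11.55),
    (4, 52.50, 13.40),
]

def in_bbox(lat, lng, bbox):
    """bbox = (min_lat, min_lng, max_lat, max_lng)."""
    min_lat, min_lng, max_lat, max_lng = bbox
    return min_lat <= lat <= max_lat and min_lng <= lng <= max_lng

def grid_cluster(markers, zoom):
    """Stand-in for a real map-clustering algorithm: bucket markers into
    grid cells whose size shrinks as the zoom level grows, then emit a
    cluster (at the average position) for any cell holding 2+ markers."""
    cell = 360 / (2 ** zoom)  # cell size in degrees
    buckets = {}
    for mid, lat, lng in markers:
        key = (math.floor(lat / cell), math.floor(lng / cell))
        buckets.setdefault(key, []).append((mid, lat, lng))
    result = []
    for members in buckets.values():
        if len(members) == 1:
            mid, lat, lng = members[0]
            result.append({"type": "marker", "id": mid, "lat": lat, "lng": lng})
        else:
            lat = sum(m[1] for m in members) / len(members)
            lng = sum(m[2] for m in members) / len(members)
            result.append({"type": "cluster", "count": len(members),
                           "lat": lat, "lng": lng})
    return result

def naive_endpoint(bbox, zoom):
    # Step 1: load *every* marker in the bounding box (the bottleneck).
    visible = [m for m in MARKERS if in_bbox(m[1], m[2], bbox)]
    # Step 2: cluster on demand. Step 3: return the result.
    return grid_cluster(visible, zoom)
```

Note that the expensive work (Steps 1 and 2) happens on every request, which is exactly what the optimized solution will avoid.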

As the number of markers increases, performance deteriorates in Steps 1 and 2:

  • When the bounding box is large enough, and when more than 10,000 markers require a detail lookup via a JOIN, the database query slows down dramatically (by 1 to 2 seconds).
  • Feeding 10,000 items to the map-clustering algorithm takes another ~250 milliseconds.

Assuming a constant window size, when a bounding box is relatively large, the zoom level is usually low (i.e., zoomed out far). Under low zoom levels, results tend to favor clusters to avoid visual crowding. Thus, the greatest potential for optimization lies in the first step of the solution, where thousands of individual markers are loaded. We don’t need most of these markers in the result. We only need each of them to count toward a cluster.

Architecting the Optimized Solution

For a worst-case query—a maximum bounding box in a marker-dense region—the naive solution takes ~4.3 seconds. The optimized solution takes just 73 ms for the same query, a ~58x speedup. From a high level, it looks like this:

  1. The Caching Strategy. Retrieve markers and clusters in a bounding box from a zoom level-specific cache.
  2. Additional Marker Detail (Optional). Enhance markers with a payload fetched from the database.
  3. Returning the Result. Return markers and clusters to the client.

The main complexity lies in the architecture of the cache (i.e., the first step).

Step 1: The Caching Strategy

This main step consists of six components:

Technology

Redis is a fast, in-memory database that is commonly used as a cache. Its built-in geospatial indexing uses the Geohash algorithm for the efficient storage and retrieval of latitude-longitude points, which is precisely what we need for our markers.

A Cache for Each Zoom Level

The degree of map clustering (whether single markers or clusters are returned) is determined by the zoom level passed to the endpoint.

  • For high zoom levels (i.e., zoomed in close), the result will favor individual markers, except in marker-dense regions.
  • For low zoom levels (i.e., zoomed out far), the result will favor clusters, except in marker-sparse regions.

Google Maps supports zoom levels from zero to a maximum of 20, depending on the area. (Ranges supported by other map services are similar. For instance, Mapbox uses zoom levels from zero to a maximum of 23.) Since these zoom levels are also integer values, we can simply create a separate cache for each zoom level.

To support all zoom levels in Google Maps—except level zero, which is too far zoomed out—we will create 20 distinct caches. Each cache will store all markers and clusters for the entire map, as clustered for the zoom level it represents.

A 2D world map with one numbered circle in North America, one in Asia, and one in Africa. At right is an indicator that this cache is for Zoom Level 1. A second 2D world map is peppered with dozens of thumbtacks. At right is an indicator that this cache is for Zoom Level 20.
Comparison of two zoom-level views
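One cache per zoom level can be modeled as one Redis geospatial set per level, distinguished only by its key name. A minimal sketch (the `map:clusters:zoom:` naming scheme is our own convention, not a Redis requirement):

```python
MIN_ZOOM, MAX_ZOOM = 1, 20  # Google Maps levels 1-20; level 0 is skipped

def cache_key(zoom: int) -> str:
    """Name of the Redis geospatial set holding the markers and clusters
    pre-clustered for one zoom level."""
    if not MIN_ZOOM <= zoom <= MAX_ZOOM:
        raise ValueError(f"unsupported zoom level: {zoom}")
    return f"map:clusters:zoom:{zoom}"

# The 20 distinct caches, one per supported zoom level.
ALL_CACHE_KEYS = [cache_key(z) for z in range(MIN_ZOOM, MAX_ZOOM + 1)]
```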

Cache Data Types

Each cache would store clusters and individual markers. For either type of entry, we’ll need to populate several fields:

  • Type — cluster or marker.
  • Latitude and longitude — we duplicate Redis’ internal geospatial storage for convenience.
  • ID (for markers only) — in Step 2, we can use this value to fetch details beyond location, such as user interaction history.
  • Quantity of subsumed markers (for clusters only) — in Step 2, we can fetch aggregate data (e.g., the number of unviewed markers) for these too.

However, Redis allows users to store only the location, plus a single string, as the value in a geospatial set. To get around this restriction, we serialize the above fields in a JSON string. We then use this string as the value in Redis. The maximum size of both keys and values in Redis is 512 MB, which is more than enough for this use case.
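A minimal sketch of that serialization, with the exact field names as an assumption:

```python
import json

def encode_marker(marker_id, lat, lng):
    """Serialize an individual marker as the member string for GEOADD.
    Lat/lng are duplicated in the JSON for convenient client responses."""
    return json.dumps({"type": "marker", "id": marker_id,
                       "lat": lat, "lng": lng}, separators=(",", ":"))

def encode_cluster(count, lat, lng):
    """Serialize a cluster, carrying the number of subsumed markers."""
    return json.dumps({"type": "cluster", "count": count,
                       "lat": lat, "lng": lng}, separators=(",", ":"))

def decode(member: str) -> dict:
    """Parse a member string back into a marker or cluster dict."""
    return json.loads(member)
```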

Filling the Caches

To fill the caches, we retrieve all of the markers from the database. For each zoom level, we execute the map-clustering algorithm. We use Redis’ GEOADD to insert the resulting markers and clusters into the cache of the corresponding zoom level, and pass along the latitude and longitude, plus the previously described JSON string.

Running the map-clustering algorithm on the whole map at this stage (rather than on a bounding box from the user, on demand) might theoretically produce some edge differences in cluster placement, but the overall user experience will remain the same.
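The fill step can be sketched as follows. To keep the example runnable without a Redis server, it only builds the per-level GEOADD arguments rather than executing them; `cluster_fn` stands in for the real map-clustering algorithm, and the key naming is our assumption.

```python
import json

def build_geoadd_commands(markers, cluster_fn, min_zoom=1, max_zoom=20):
    """Cluster the full marker set once per zoom level and produce the
    arguments for one GEOADD call per level. `cluster_fn(markers, zoom)`
    returns dicts carrying type, lat, lng, and an id or a count."""
    commands = []
    for zoom in range(min_zoom, max_zoom + 1):
        key = f"map:clusters:zoom:{zoom}"   # one geospatial set per level
        triples = []                        # GEOADD takes (lng, lat, member)
        for item in cluster_fn(markers, zoom):
            member = json.dumps(item, separators=(",", ":"))
            triples.append((item["lng"], item["lat"], member))
        commands.append((key, triples))
    return commands

# With redis-py, each command could then be applied roughly as:
#   r.geoadd(key, [part for triple in triples for part in triple])
```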

Querying the Cache

For an incoming request, we pass the given bounding box to the Redis GEOSEARCH command, which queries the cache of the given zoom level to retrieve marker and cluster candidates in the area.
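Since GEOSEARCH returns our member strings, the endpoint's job is mostly to parse them back into markers and clusters. A small sketch (the actual Redis call, shown in the comment, is approximate redis-py usage and requires Redis 6.2+):

```python
import json

def split_results(members):
    """Partition the member strings returned by GEOSEARCH (our JSON
    values) into individual markers and clusters."""
    markers, clusters = [], []
    for raw in members:
        item = json.loads(raw)
        (markers if item["type"] == "marker" else clusters).append(item)
    return markers, clusters

# With redis-py, the members for a bounding box could be fetched roughly as:
#   members = r.geosearch("map:clusters:zoom:12", longitude=center_lng,
#                         latitude=center_lat, width=box_w_km,
#                         height=box_h_km, unit="km")
```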

Designing a Cache Refresh Plan

A 20-level cache refresh is expensive, so it’s best to refresh as infrequently as your project requirements allow. For example, the addition or removal of a sports facility in Playsports only marks the cache as stale; an hourly cron job then refreshes the cache, if needed. More on this in the Cache Staleness section.
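The stale-flag pattern can be sketched as a small class; the rebuild callable and the scheduling mechanism (an hourly cron in Playsports' case) are left abstract here.

```python
class CacheRefresher:
    """Writes only mark the cache dirty; a periodic job rebuilds it
    if anything changed since the last refresh."""

    def __init__(self, rebuild):
        self._rebuild = rebuild  # callable that refills all zoom-level caches
        self._stale = False

    def mark_stale(self):
        """Called on every marker insert, update, or delete."""
        self._stale = True

    def run_scheduled(self):
        """Called by the cron job; refreshes only if needed."""
        if not self._stale:
            return False  # nothing changed; skip the expensive rebuild
        self._rebuild()
        self._stale = False
        return True
```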

Step 2: Additional Marker Detail (Optional)

At this point, most apps will need to fetch details based on individual marker IDs. We could make this detail part of the stringified JSON values in the cache, but in many apps, marker details are user-specific. Since there’s a single, shared cache for all users, it’s not possible to store these additional fields there.

Our optimized solution takes the IDs of the individual markers from the cache result and fetches their details from the database. Now we’re only looking up the individual markers that are left after clustering. There won’t be too many of these when the map is zoomed out (because we’ll have mostly clusters) nor when zoomed in (because the area will be small).

In contrast, the naive solution looks up all markers in the bounding box (and their details) before clustering. Thus, this step—optional, but for many apps, crucial—now runs much faster.
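A sketch of the detail-fetch step, with an illustrative table and column names (the real schema would differ per app):

```python
def marker_ids(cache_markers):
    """IDs of the individual (unclustered) markers from the cache result."""
    return [m["id"] for m in cache_markers]

def detail_query(ids):
    """Build a parameterized detail lookup for just those markers."""
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, name, price FROM facilities WHERE id IN ({placeholders})"
    return sql, list(ids)
```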

Step 3: Returning the Result

With the clusters built and the individual markers enhanced, we can now deliver these to the client. Aside from some inconsequential edge cases, the resulting data is nearly identical to the naive solution—but we’re able to deliver it much faster.

Further Considerations

Now let’s look at two additional factors:

Resource Needs

Let’s assume that an app’s map contains N markers total. As there are up to 20 zoom levels, we would need to store, at most, 20N cache items. In practice, however, the actual number of cache items is often much lower due to clustering, especially in the lower zoom levels. In fact, the total number of cache items distributed among all of Playsports’ caches is only ~2N.

Further, if we assume that the length of a cache value (the stringified JSON) is ~250 characters (a reasonable assumption, at least for Playsports) and the string size is 1 byte per character, then the amount of cache storage needed for the JSON values would be approximately 2 * N * 250 bytes.

To this figure we add Redis’ internal data structures for the sorted sets used by GEOADD. Redis uses 85 MB of memory for 1 million small key-value pairs, so we can assume Redis internals account for less than 100 bytes per cache item. That means we could use a 1 GB-RAM Redis instance to support as many as 1.4 million markers. Even in the unlikely scenario of the markers being evenly spread out across the entire map, resulting in the number of cache items nearing 20N, N could still go up to ~140,000. Since a Redis instance can handle more than 4 billion keys (2³², to be exact), this is not a limiting factor.
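The back-of-the-envelope arithmetic above can be captured in a few lines, using the figures from the text (~2N cache items in practice, ~250 bytes of JSON per item, <100 bytes of Redis overhead per item):

```python
def max_supported_markers(ram_bytes, items_per_marker=2,
                          value_bytes=250, overhead_bytes=100):
    """Upper bound on the total marker count N for a given Redis RAM
    budget, under the article's per-item size assumptions."""
    bytes_per_marker = items_per_marker * (value_bytes + overhead_bytes)
    return ram_bytes // bytes_per_marker

ONE_GB = 1_000_000_000
```

With the practical ~2N item count, a 1 GB instance supports roughly 1.4 million markers; in the even-spread worst case (20N items), roughly 140,000.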

Cache Staleness

Depending on the use case, it may not be sufficient to refresh the cache only periodically. In such cases, we could use a rate-limited task queue. It would reduce cache staleness, while still limiting the number of cache refreshes, and thus the load on the system.

After each database update, we would queue a cache refresh job. This queue would limit the number of tasks to M per hour. This compromise allows for faster-than-hourly updates while maintaining a relatively low load on the system (depending on M).
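One way to sketch such a rate limit is a sliding-window counter where excess requests collapse into a single pending refresh; the injectable clock is there only to make the example testable.

```python
class RateLimitedRefreshQueue:
    """Allow at most m_per_hour cache refreshes; requests beyond the
    budget collapse into one pending refresh instead of piling up."""

    def __init__(self, m_per_hour, clock):
        self.m_per_hour = m_per_hour
        self.clock = clock    # injectable time source, in seconds
        self.history = []     # timestamps of refreshes in the last hour
        self.pending = False

    def request_refresh(self):
        now = self.clock()
        # Drop refresh timestamps older than one hour.
        self.history = [t for t in self.history if now - t < 3600]
        if len(self.history) < self.m_per_hour:
            self.history.append(now)
            self.pending = False
            return True       # budget available: refresh runs now
        self.pending = True   # deferred until the budget frees up
        return False
```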

Caching Strategies Outweigh Map-clustering Algorithms

The optimized solution for Playsports—more than 50 times faster than the naive solution—has worked well. It should continue to work well, until we reach 1.4 million markers or more than 100 times the existing data.

For most map-based web service requests, some form of precalculation is needed to allow for scalability. The type of technology to be used, as well as the specific design, would depend on business requirements. Cache staleness needs, marker augmentation, and the number of markers are important characteristics to take into account when designing a solution.


Further Reading on the Toptal Engineering Blog:

Understanding the basics

Map clustering, or marker clustering, simplifies maps that would otherwise be overcrowded with markers.

Marker clustering is done server-side. Visually, each area where markers are crowded is replaced with a map cluster, usually a circle containing the number of individual markers it represents.

A cached map is a map that has had its clusters precalculated at each zoom level for faster access.