Design

Overview

First, let’s look at how basic caching works.

import hermes.backend.inprocess


cache = hermes.Hermes(hermes.backend.inprocess.Backend)

@cache
def foo(a, b):
  return a * b

foo(2, 2)
foo(2, 4)

print(cache.backend.dump())
#  {
#    'cache:entry:foo:bNapzUm_P8fjh8lhIWYgkg': 8,
#    'cache:entry:foo:JjwE0zQhMRt_5lfwNNPk1Q': 4
#                            ↑
#                      argument hash
#  }

Basically, we have a key-value storage with O(1) complexity for set, get and delete, meaning the performance of these operations is constant and does not depend on the number of items stored. When a callable (e.g. a function or method) is cached, the key is calculated per invocation from the callable itself and the passed arguments. The callable’s return value is saved under that key. On the next invocation with the same arguments, the saved value can be returned without calling the callable again.
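The key derivation described above can be sketched as follows. This is a simplified illustration, not Hermes’ exact implementation: the helper name `make_key` and the use of `pickle` plus MD5 are assumptions for demonstration.

```python
import hashlib
import pickle

def make_key(fn, *args, **kwargs):
    # Hash the pickled arguments; keyword arguments are sorted so that
    # the same call always produces the same key.
    payload = pickle.dumps((args, sorted(kwargs.items())))
    arg_hash = hashlib.md5(payload).hexdigest()
    # Combine a prefix, the callable's name and the argument hash,
    # mirroring the 'cache:entry:foo:<hash>' keys in the dump above.
    return f'cache:entry:{fn.__name__}:{arg_hash}'

def foo(a, b):
    return a * b

print(make_key(foo, 2, 2))  # cache:entry:foo:<argument hash>
```

The same invocation always maps to the same key, while different arguments map to different keys, which is what makes the per-invocation lookup possible.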

Tagging cache entries

“There are only two hard problems in Computer Science: cache invalidation and naming things.” — Phil Karlton

Invalidation becomes harder in a complex application. Often a certain group of methods operates on the same data, and it is impractical to invalidate individual entries. In particular, this happens when a method returns complex values spanning multiple entities. Cache tagging makes it possible to mark such a group of method results with a tag and invalidate them all at once.

To keep invalidation fast, Hermes implements Eric Florenzano’s idea, which he explained in Tagging cache keys for O(1) batch invalidation [1]. Let’s look at the following code.

import hermes.backend.inprocess


cache = hermes.Hermes(hermes.backend.inprocess.Backend)

@cache(tags = ('tag1', 'tag2'))
def foo(a, b):
  return a * b

foo(2, 2)

print(cache.backend.dump())
#  {
#    'cache:tag:tag1': 'SeL9JQMnT-_4jRN1nxeoeg',
#    'cache:tag:tag2': 'n_OXAKlSjiz5zK5iAEe7zA',
#    'cache:entry:foo:JjwE0zQhMRt_5lfwNNPk1Q:h_C-2mB3kWNNrA4r7byxNA': 4
#                                                      ↑
#                                                   tag hash
#  }

When we want to tag a cache entry, we first need to create entries for its tags. Each tag is represented by its own entry, whose value is set to a random value each time the tag is created. Once all tag values exist, they are joined and hashed. The resulting tag hash is appended to the cache entry key.

When we want to invalidate tagged entries by a tag, we just need to remove that tag’s entry. Without all of the tag values the tag hash was created from, it is impossible to reconstruct the entry key, so the tagged cache entries become inaccessible and are hence invalidated.
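The tag-hash construction and invalidation-by-deletion described above can be sketched as follows. This is an illustrative model, not Hermes’ actual code: the in-memory `store` dict and the helper names `tag_hash` and `entry_key` are assumptions.

```python
import hashlib
import os

store = {}  # stands in for the key-value backend

def tag_hash(tags):
    # Each tag gets its own entry with a fresh random value on creation.
    values = []
    for tag in tags:
        key = f'cache:tag:{tag}'
        if key not in store:
            store[key] = os.urandom(8).hex()
        values.append(store[key])
    # Join all tag values and hash them into a single tag hash.
    return hashlib.md5(':'.join(values).encode()).hexdigest()

def entry_key(name, arg_hash, tags):
    # The tag hash becomes the last component of the entry key.
    return f'cache:entry:{name}:{arg_hash}:{tag_hash(tags)}'

k1 = entry_key('foo', 'abc', ['tag1', 'tag2'])
del store['cache:tag:tag1']  # invalidate everything tagged 'tag1'
k2 = entry_key('foo', 'abc', ['tag1', 'tag2'])
print(k1 != k2)  # True: the old key can no longer be constructed
```

After the tag entry is deleted, the tag is recreated with a new random value, so the same invocation now produces a different key and the old entry is unreachable.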

What happens to performance? Do all operations become O(n), where n is the number of entry tags? Actually, no. Since we rarely need more than a few dozen tags, in practice it is still O(1). Tag entry operations are batched, so the impact on the number of network operations is as follows:

  • set – 3x backend calls (get + 2 * set) in the worst case. The average is expected to be 2x once all used tag entries exist.

  • get – 2x backend calls.

  • delete – 2x backend calls.

Memory overhead consists of tag entries and stale cache entries, as demonstrated below.

import hermes.backend.inprocess


cache = hermes.Hermes(hermes.backend.inprocess.Backend)

@cache(tags = ('tag1', 'tag2'))
def foo(a, b):
  return a * b

foo(2, 2)

print(cache.backend.dump())
#  {
#    'cache:tag:tag1': 'Z_t4-2QfjwvYUc_d75WJgw',
#    'cache:tag:tag2': 'M_2pg1-dfS_8OM1Dnd4mXw',
#    'cache:entry:foo:JjwE0zQhMRt_5lfwNNPk1Q:ZuT3KKj6gttXtU01-JSbwQ': 4
#  }

cache.clean(['tag1'])
foo(2, 2)

print(cache.backend.dump())
#  {
#      recreated tag entry
#             ↓
#    'cache:tag:tag1': 'TAuY_hhl0a22Nm7P_dWQRA',
#    'cache:tag:tag2': 'M_2pg1-dfS_8OM1Dnd4mXw',
#    'cache:entry:foo:JjwE0zQhMRt_5lfwNNPk1Q:St3ZHOzerXZxN0qkzXcLVw': 4,
#    'cache:entry:foo:JjwE0zQhMRt_5lfwNNPk1Q:ZuT3KKj6gttXtU01-JSbwQ': 4
#             ↑
#        garbage entry
#  }

Hence TTLs should be chosen carefully. With the Redis backend it is recommended to set maxmemory-policy [2] to volatile-lru.
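For example, a minimal redis.conf fragment could look like the following (the memory limit value is illustrative, not a recommendation):

```
maxmemory 512mb
maxmemory-policy volatile-lru
```

With volatile-lru, Redis evicts the least recently used keys among those that have a TTL set, so stale tagged entries are reclaimed automatically under memory pressure.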