Asyncio

Hermes includes provisional support for caching asyncio coroutine functions. The API is the same for the Redis and Memcached backends.

import asyncio

import aiohttp
import hermes.backend.redis


cache = hermes.Hermes(
  hermes.backend.redis.Backend,
  ttl = 600,
  host = 'localhost',
  db = 1,
)

@cache(ttl = 365 * 24 * 3600, tags = ['pypi'])
async def getHash(version):
  async with aiohttp.ClientSession() as session:
    async with session.get('https://pypi.org/pypi/hermescache/json') as resp:
      return (await resp.json())['releases'][version][0]['md5_digest']

async def main():
  print(await getHash('0.9.0'))
  print(await getHash('0.8.0'))
  print(await getHash('0.7.2'))

  await getHash.invalidate('0.8.0')  # invalidate a single entry
  cache.clean(['pypi'])              # invalidate entries tagged 'pypi'

asyncio.run(main())

Note that Hermes.clean is still synchronous. For any practical number of tags it should be near-instant, since it amounts to a single multi-key DEL in Redis (though the network, as always, is unreliable). A full cache.clean(), on the other hand, can be slow depending on the backend: Redis FLUSHDB is O(n), where n is the number of records in the database, although on Redis >= 4 you can run flushdb(asynchronous = True) manually to make the flush non-blocking. In any case, the method can be run in the default asyncio thread pool [1] like this.

import asyncio

loop = asyncio.get_running_loop()  # must be called from within a coroutine
await loop.run_in_executor(None, cache.clean, ['pypi'])
# a full wipe can be offloaded the same way
await loop.run_in_executor(None, cache.clean)

For the in-process backend there is a specialised implementation, backend.inprocess.AsyncBackend, which must be used instead of backend.inprocess.Backend in asyncio-based applications.
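
For instance, a minimal sketch of wiring it up, assuming AsyncBackend accepts the same constructor arguments as the other backends shown above:

import hermes.backend.inprocess


# a sketch: AsyncBackend is assumed to be configured like the other backends;
# the coroutine-aware locking is handled inside the backend itself
cache = hermes.Hermes(hermes.backend.inprocess.AsyncBackend, ttl = 600)

@cache
async def compute(a):
  return a + 1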

Unlocking asynchronous operation

This section explores the problem and a (best-effort) solution that enables caching of asyncio coroutine functions via the same Hermes decorator API.

Problem

For synchronous Python functions Hermes’ assumptions and operation are straightforward.

import hermes.backend.redis


cache = hermes.Hermes(hermes.backend.redis.Backend)

@cache
def fn(a):
  return a + 1

def main():
  fn(1)

Here:

  1. Python process runs 1 or more threads (and there can be multiple Python processes on different machines, but that’s irrelevant here).

  2. Cached function, fn, and all its callees are synchronous.

  3. Cache backend lock is synchronous (distributed, or threading.RLock for the in-process backend; see the sketch after this list).

  4. Cache backend load/save is synchronous.
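
To illustrate assumptions 1 and 3, here is a minimal sketch, separate from the earlier Redis example, that runs several threads against the in-process backend:

import threading

import hermes.backend.inprocess


cache = hermes.Hermes(hermes.backend.inprocess.Backend)

@cache
def fn(a):
  return a + 1

# several threads call the same cached function concurrently (assumption 1);
# the in-process backend relies on threading.RLock (assumption 3)
threads = [threading.Thread(target = fn, args = (1,)) for _ in range(4)]
for t in threads:
  t.start()
for t in threads:
  t.join()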

For asynchronous Python functions, which usually have synchronous callees alongside asynchronous “awaitees”, either of which may make sense to cache, the assumptions and expected operation are less obvious.

import asyncio

import hermes.backend.redis


cache = hermes.Hermes(hermes.backend.redis.Backend)

@cache
def fn(a):
  return a + 1

@cache
async def coro(a):
  await asyncio.sleep(fn(1))
  return a + 2

async def main():
  await coro(1)

Here:

  1. Python process runs 1 thread with 1 asyncio IO loop (the multi-loop case is out of scope).

  2. Cached coroutine may await other cached coroutines and call other cached functions.

  3. Distributed locks should be acquired asynchronously (otherwise the IO loop will be blocked for too long). In-process locks must be coroutine-aware, because thread locks are useless in a single-threaded program (see the sketch after this list).

  4. Remote cache backend load/save should be asynchronous. In-process ones may remain synchronous.
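
To illustrate assumption 3 outside of Hermes internals, here is a minimal sketch of why the lock must be coroutine-aware: an asyncio.Lock suspends a waiting task and yields to the event loop, whereas a threading lock would block the only thread.

import asyncio


async def guarded(lock, a):
  async with lock:
    # only one task at a time gets past this point; a threading.Lock here
    # would block the whole single-threaded event loop instead of yielding
    await asyncio.sleep(0.1)
    return a + 2

async def main():
  lock = asyncio.Lock()  # coroutine-aware: waiting tasks yield to the loop
  print(await asyncio.gather(guarded(lock, 1), guarded(lock, 2)))

asyncio.run(main())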

Hence, the problem revolves around adapting Hermes’ API to fit this mixed (a)synchronous execution model. The changes should be kept to a minimum: asynchronous operation is not the main goal of the library, so it must not complicate the existing API for synchronous use cases, nor become a maintenance burden.

Solution

For remote backends the chosen solution is relatively straightforward, roughly the following:

  1. Introduce a new cache point, CachedCoro, in which coroutine functions are wrapped. Its __call__ and invalidate are coroutines.

  2. The new cache point uses the default asyncio thread pool [1] to run the existing synchronous remote backend save/load and locking API (see the sketch after this list).

  3. Cached wraps synchronous callables as is, i.e. with blocking IO to the backend (e.g. Redis). This may look like bad design because the IO can block the IO loop, but a callable marked for caching is expected to run significantly slower than a normal roundtrip to Redis anyway. Hence, for slow synchronous functions that significantly benefit from caching, this is still feasible (although such functions would benefit even more from being turned into coroutines that await a thread/process pool). Nevertheless, caching a synchronous function in an asynchronous application should be done with caution (e.g. examine Redis socket_connect_timeout and socket_timeout, the locking effect on a multi-process application, etc.).
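
The thread pool delegation from point 2 could look roughly like the sketch below; load_async and backend.load are illustrative stand-ins, not actual Hermes internals.

import asyncio


async def load_async(backend, key):
  # run the synchronous backend call in the default thread pool so that it
  # doesn’t block the event loop; backend.load is a hypothetical stand-in
  # for the synchronous remote backend load API
  loop = asyncio.get_running_loop()
  return await loop.run_in_executor(None, backend.load, key)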

This solution doesn’t require any change to the public API. The in-process backend, however, needs a specialised asynchronous version, backend.inprocess.AsyncBackend, which provides a coroutine-aware locking mechanism.