Asyncio
Hermes includes provisional support for caching asyncio coroutine functions. The API is basically the same for the Redis and Memcached backends.
import aiohttp
import hermes.backend.redis

cache = hermes.Hermes(
    hermes.backend.redis.Backend,
    ttl = 600,
    host = 'localhost',
    db = 1,
)
@cache(ttl = 365 * 24 * 3600, tags = ['pypi'])
async def getHash(version):
    async with aiohttp.ClientSession() as session:
        async with session.get('https://pypi.org/pypi/hermescache/json') as resp:
            return (await resp.json())['releases'][version][0]['md5_digest']

print(await getHash('0.9.0'))
print(await getHash('0.8.0'))
print(await getHash('0.7.2'))

await getHash.invalidate('0.8.0')
cache.clean(['pypi'])  # invalidate entries tagged 'pypi'
Note that Hermes.clean is still synchronous. For any practical number of tags it should be near-instant (i.e. it's a single multi-key DEL in Redis, though the network, as always, is unreliable). cache.clean() can be slow depending on the backend (e.g. Redis FLUSHDB is O(n) where n is the number of records in the database; though note that on Redis >= 4 you can run flushdb(asynchronous = True) manually to flush in the background). Anyhow, the same method can be run in the default asyncio thread pool [1] like this.
import asyncio
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, cache.clean, ['pypi'])
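Alternatively, the non-blocking flush mentioned above can be issued directly with redis-py on Redis >= 4. This is a minimal sketch, assuming the cache lives in database 1 on localhost:

import redis

# FLUSHDB ASYNC: Redis reclaims the keys in a background thread
client = redis.Redis(host = 'localhost', db = 1)
client.flushdb(asynchronous = True)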
For the in-process backend there's a specialised implementation, backend.inprocess.AsyncBackend, which must be used instead of backend.inprocess.Backend in asyncio-based applications.
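For example, an asyncio application using the in-process backend could be configured like this (a minimal sketch; the ttl value is arbitrary and the cached coroutine is hypothetical):

import hermes.backend.inprocess

# In-process, coroutine-aware cache for an asyncio application
cache = hermes.Hermes(hermes.backend.inprocess.AsyncBackend, ttl = 600)

@cache
async def getAnswer():
    return 42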
Unlocking asynchronous operation
This section explores the problem and a (best-effort) solution that enables caching of asyncio coroutine functions via the same Hermes decorator API.
Problem
For synchronous Python functions Hermes’ assumptions and operation are straightforward.
import hermes.backend.redis

cache = hermes.Hermes(hermes.backend.redis.Backend)

@cache
def fn(a):
    return a + 1

def main():
    fn(1)
- Python process runs 1 or more threads (and there can be multiple Python processes on different machines, but that's irrelevant here).
- Cached function, fn, and all its callees are synchronous.
- Cache backend lock is synchronous (distributed, or threading.RLock for the in-process backend).
- Cache backend load/save is synchronous.
For asynchronous Python functions, which usually have some synchronous callees besides asynchronous "awaitees" (and it may make sense to cache both), the assumptions and expected operation aren't obvious.
import asyncio
import hermes.backend.redis

cache = hermes.Hermes(hermes.backend.redis.Backend)

@cache
def fn(a):
    return a + 1

@cache
async def coro(a):
    await asyncio.sleep(fn(1))
    return a + 2

async def main():
    await coro(1)
- Python process runs 1 thread with 1 asyncio IO loop (the multi-loop case is out of scope).
- Cached coroutine may await other cached coroutines and call other cached functions.
- Distributed locks should be acquired asynchronously (otherwise the IO loop will be blocked for too long). In-process locks must be coroutine-aware, because thread locks are obviously useless for a single-threaded program.
- Remote cache backend load/save should be asynchronous. In-process load/save may remain synchronous.
Hence, the problem revolves around adapting Hermes' API to fit the mixed (a)synchronous execution model. The changes should be kept to a minimum: asynchronous operation is not the main goal of the library, so it must not complicate the existing API for synchronous use-cases, nor become a maintenance burden.
Solution
For remote backends the chosen solution is relatively straightforward and is roughly the following:
- Introduce a new cache point, CachedCoro, in which coroutine functions are wrapped. Its __call__ and invalidate are coroutines.
- The new cache point uses the default asyncio thread pool [1] to run the existing synchronous remote backend save/load and locking API (see the sketch after this list).
- Cached wraps synchronous callables as is, i.e. there's blocking IO to the backend (e.g. Redis). It may look like bad design because that IO may block the IO loop, but if a callable is marked for caching, it's expected to run significantly slower than a normal roundtrip to Redis. Hence for slow synchronous functions that significantly benefit from caching it's still feasible (although they would benefit even more from being turned into coroutines awaiting a thread/process pool). Still, caching a synchronous function in an asynchronous application should be done with caution (e.g. examining Redis socket_connect_timeout and socket_timeout, the locking effect on a multi-process application, etc).
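The thread-pool delegation from the second point boils down to the following pattern. This is an illustrative sketch, not Hermes' actual code; func stands for any synchronous backend call (save, load, lock acquisition).

import asyncio

async def runSync(func, *args):
    # Run the synchronous backend call in the default asyncio thread pool
    # so the IO loop isn't blocked by the network roundtrip
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)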
This solution doesn't require any change to the public API. The in-process backend, however, needs a specialised asynchronous version, backend.inprocess.AsyncBackend, that provides a coroutine-aware locking mechanism.
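To illustrate what coroutine-aware locking means, here is a minimal sketch of a per-key lock built on asyncio.Lock. It is not the actual AsyncBackend code; the class and function names are made up for the example.

import asyncio
import collections

class KeyLock:
    '''Illustrative per-key coroutine lock.'''

    def __init__(self):
        self._locks = collections.defaultdict(asyncio.Lock)

    def __call__(self, key):
        # asyncio.Lock suspends the awaiting coroutine instead of blocking
        # the only thread, unlike threading.RLock
        return self._locks[key]

async def recalculate(locks: KeyLock, key):
    async with locks(key):
        ...  # compute and save the cache entry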