What difficulties might arise from using mutable arguments to an `lru_cache` decorated function?
In the comments to: Is there a decorator to simply cache function return values?, @gerrit pointed out a problem with passing mutable but hashable objects to a function decorated with `functools.lru_cache`:

If I pass a hashable, mutable argument, and change the value of the object after the first call of the function, the second call will return the changed, not the original, object. That is almost certainly not what the user wants.
As I understand it, assuming the mutable object's `__hash__()` is manually defined to hash the member variables (rather than just using the object's `id()`, which is the default for custom objects), changing the argument object changes its hash, so a second call to the `lru_cache`-decorated function should not use the cache.

If `__hash__()` is correctly defined for the mutable arguments, is there any unattended behaviour resulting from using mutable arguments to an `lru_cache`-decorated function?
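To make the question concrete, here is a minimal sketch (the `Point` class and `double` function are hypothetical, not from the discussion) showing that when `__hash__()` is derived from member variables, mutating the argument changes its hash and the next call misses the cache:

```python
import functools

class Point:
    """Mutable, but hashable: the hash is derived from the member
    variable, not from id() (the default for a custom object)."""
    def __init__(self, x):
        self.x = x
    def __hash__(self):
        return hash(self.x)
    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x

calls = []  # record which inputs were actually computed

@functools.lru_cache(maxsize=None)
def double(p):
    calls.append(p.x)
    return p.x * 2

p = Point(3)
double(p)   # computed and cached under hash(3)
p.x = 4     # mutating the object changes its hash ...
double(p)   # ... so this lookup misses the cache and recomputes
# calls == [3, 4]: both calls ran the function body
```

So with a member-based `__hash__()` there is no stale-cache hit on a mutated argument; the cost is only a wasted cache entry keyed under the old hash.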
My comment was wrong/misleading and does not relate to `lru_cache` specifically, but to any attempt at creating a more generic caching function.

I needed a caching function that works for functions with NumPy arrays as input and output, and NumPy arrays are mutable and not hashable. Because NumPy arrays are not hashable, I could not use `functools.lru_cache`. I ended up writing something like this:

```python
import copy
import functools

def mutable_cache(maxsize=10):
    """In-memory cache like functools.lru_cache but for any object

    This is a re-implementation of functools.lru_cache.  Unlike
    functools.lru_cache, it works for any objects, mutable or not.
    Therefore, it returns a copy and it is wrong if the mutable
    object has changed!  Use with caution!

    If you call the *resulting* function with a keyword argument
    'CLEAR_CACHE', the cache will be cleared.  Otherwise, cache is rotated
    when more than `maxsize` elements exist in the cache.  Additionally,
    if you call the resulting function with NO_CACHE=True, it doesn't
    cache at all.  Be careful with functions returning large objects.
    Everything is kept in RAM!

    Args:
        maxsize (int): Maximum number of return values to be remembered.

    Returns:
        New function that has caching implemented.
    """
    sentinel = object()

    def decorating_function(user_function):
        cache = {}
        cache_get = cache.get
        keylist = []  # don't make it too long

        def wrapper(*args, **kwds):
            if kwds.get("CLEAR_CACHE"):
                del kwds["CLEAR_CACHE"]
                cache.clear()
                keylist.clear()
            if kwds.get("NO_CACHE"):
                del kwds["NO_CACHE"]
                return user_function(*args, **kwds)
            elif "NO_CACHE" in kwds:
                del kwds["NO_CACHE"]
            key = str(args) + str(kwds)
            result = cache_get(key, sentinel)
            if result is not sentinel:
                # make sure we return a copy of the result; when a = f();
                # b = f(), users should reasonably expect that a is not b.
                return copy.copy(result)
            result = user_function(*args, **kwds)
            cache[key] = result
            keylist.append(key)
            if len(keylist) > maxsize:
                try:
                    del cache[keylist[0]]
                    del keylist[0]
                except KeyError:
                    pass
            return result
        return functools.update_wrapper(wrapper, user_function)
    return decorating_function
```
In my first version, I omitted the copy.copy() call (which really should be copy.deepcopy()), which caused bugs when I changed the returned value and then called the cached function again. After adding copy.copy(), I realised that in some cases I was hogging memory, mainly because my function counts cached objects rather than total memory usage, which is not in general trivially possible in Python (although it should be easy if limited to NumPy arrays). Therefore I added the NO_CACHE and CLEAR_CACHE keywords to the resulting function, which do what their names suggest.
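The hazard that the copy.copy() guards against can be reproduced with plain `functools.lru_cache` and a function whose return value is mutable (a sketch with a hypothetical `make_list`):

```python
import functools

@functools.lru_cache(maxsize=None)
def make_list(n):
    # the return value is mutable and is stored in the cache as-is
    return list(range(n))

a = make_list(3)   # [0, 1, 2]
a.append(99)       # mutate the cached return value in place
b = make_list(3)   # cache hit: the very same (mutated) object comes back
```

Here `b is a`, so `b == [0, 1, 2, 99]` rather than a fresh `[0, 1, 2]`. Returning `copy.copy(result)` (or `copy.deepcopy()` for nested structures) avoids handing every caller the same shared mutable object.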
Having written and used this function, I understand there is more than one good reason why functools.lru_cache only works for functions with hashable input arguments. Anyone who needs a caching function that works with mutable arguments needs to be very careful.