How to ensure isolation with a non-ancestor query

Question

I want to create users with ndb, like this:

def create_user(self, google_id, ....):
  # Non-ancestor query: look for an existing user with this google_id.
  user_keys = UserInformation.query(UserInformation.google_id == google_id).fetch(keys_only=True)

  if user_keys:  # check whether the user exists
    # already created
    ...(SNIP)...
  else:
    # Create a new user entity.
    UserInformation(
      # no key is given, so the entity starts with an incomplete key
      google_id = google_id,
      facebook_id = None,
      twitter_id = None,
      name = 
      ...(SNIP)...
    ).put()

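If this function is called twice at the same time, two users are created. (Isolation between get() and put() is not guaranteed.)

So I added @ndb.transactional to the function above, but then I got the following error: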

BadRequestError: Only ancestor queries are allowed inside transactions.

How can I ensure isolation with a non-ancestor query?

Answer 1

The ndb library does not allow non-ancestor queries inside transactions. So if you make create_user() transactional, you get the error above, because it calls UserInformation.query() without an ancestor.

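If you really want to do it that way, you have to put all UserInformation entities in the same entity group by giving them a common ancestor, and turn your query into an ancestor query. This has performance implications, though; see Ancestor relation in datastore.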
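A minimal sketch of that ancestor-based variant follows. The root key's kind and ID (`UserRoot`, `all_users`) are made-up names, only two of the model's properties are shown, and the import guard exists solely so the file also loads outside App Engine. Treat it as an illustration under those assumptions, not a recommended design, since every user write then goes through a single entity group.

```python
try:
    # Legacy App Engine standard runtime; the import fails elsewhere.
    from google.appengine.ext import ndb
except ImportError:
    ndb = None  # lets this file load outside App Engine

# Hypothetical kind and ID of the shared ancestor key. No entity has to be
# stored under this key; it only defines the entity group.
USER_ROOT_KIND = "UserRoot"
USER_ROOT_ID = "all_users"

if ndb is not None:

    class UserInformation(ndb.Model):
        # Only two of the question's properties are shown here.
        google_id = ndb.StringProperty()
        name = ndb.StringProperty()

    @ndb.transactional
    def create_user(google_id, name):
        """Return the key of the existing user, or of a newly created one."""
        root = ndb.Key(USER_ROOT_KIND, USER_ROOT_ID)
        # An ancestor query is allowed inside a transaction.
        existing = UserInformation.query(
            UserInformation.google_id == google_id, ancestor=root
        ).fetch(1, keys_only=True)
        if existing:
            return existing[0]
        # parent=root places the new entity in the shared entity group.
        return UserInformation(parent=root, google_id=google_id, name=name).put()
```

Note that existing entities created without this parent would not be found by the ancestor query; they would have to be re-created under the root key.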

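Otherwise, even if you split the function in two, a non-transactional one that runs the query followed by a transactional one that only creates the user (which would avoid the error), you would still run into Datastore's eventual consistency, which is the actual root cause of your problem: a query may not immediately return a recently added entity, because the index the query relies on takes some time to update. That leaves room for duplicate entities being created for the same user. See Balancing Strong and Eventual Consistency with Google Cloud Datastore.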

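One possible approach is to check for duplicates later or periodically and remove them (merging the information they hold into a single entity). And/or mark the user creation as "in progress", record the key of the newly created entity, and keep querying until that key shows up in the query results, at which point you mark the creation as "done" (you might not have time to do this within the same request).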
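The duplicate check could look roughly like the sketch below. It is deliberately independent of ndb: it assumes you have already fetched `(google_id, created, key)` tuples for all users, where `created` is a hypothetical creation timestamp that the model in the question may not have. Keeping the oldest entity is just one possible policy.

```python
from collections import defaultdict


def find_duplicates(records):
    """Split duplicate user records into one survivor and the rest.

    `records` is an iterable of (google_id, created, key) tuples.
    Returns a dict mapping google_id -> (key_to_keep, [keys_to_remove])
    for every google_id that occurs more than once. The oldest record
    (smallest `created`) is the one that is kept.
    """
    by_id = defaultdict(list)
    for google_id, created, key in records:
        by_id[google_id].append((created, key))

    duplicates = {}
    for google_id, entries in by_id.items():
        if len(entries) > 1:
            entries.sort(key=lambda entry: entry[0])  # oldest first
            keep = entries[0][1]
            duplicates[google_id] = (keep, [key for _, key in entries[1:]])
    return duplicates
```

A periodic job would then merge whatever data the removed entities carry into the kept one before deleting them.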

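Another approach is (if possible) to define an algorithm that derives a unique key from the user information, and then just check whether an entity with that key exists instead of running a query. Key lookups are strongly consistent and can be done inside transactions, so this would solve the duplicates problem. For example, you could use the google_id as the key ID. This is only an example, because it is not ideal either: you may have users without a google_id, users may want to change their google_id without losing their other information, and so on. You could perhaps also track the in-progress user creation in the session information to prevent repeated attempts to create the same user within one session (but that does not help with attempts from different sessions).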
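As a rough sketch of the key-based check, assuming the google_id is always present and never changes: `user_key_name` and its `google:` prefix are hypothetical, only two of the model's properties are shown, and the import guard exists solely so the helper can be loaded outside App Engine.

```python
try:
    # Legacy App Engine standard runtime; the import fails elsewhere.
    from google.appengine.ext import ndb
except ImportError:
    ndb = None  # lets the pure helper below load outside App Engine


def user_key_name(google_id):
    """Hypothetical helper: derive a deterministic key name from google_id."""
    if not google_id:
        raise ValueError("a google_id is required to derive the key name")
    return "google:%s" % google_id


if ndb is not None:

    class UserInformation(ndb.Model):
        # Only two of the question's properties are shown here.
        google_id = ndb.StringProperty()
        name = ndb.StringProperty()

    @ndb.transactional
    def create_user(google_id, name):
        """Return (user, created); `created` is False if the user existed."""
        key = ndb.Key(UserInformation, user_key_name(google_id))
        # A lookup by key is strongly consistent and allowed in a transaction.
        user = key.get()
        if user is not None:
            return user, False
        user = UserInformation(key=key, google_id=google_id, name=name)
        user.put()
        return user, True
```

Two concurrent calls with the same google_id then contend on the same key, so at most one of them creates the entity.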

Answer 2

For your use case, perhaps you could use the ndb model's get_or_insert method, which according to the API docs:

Transactionally retrieves an existing entity or creates a new one.

So you can do this:

user = UserInformation.get_or_insert(*args, **kwargs)

without running the risk of creating a duplicate user.

Full documentation:

classmethod get_or_insert(*args, **kwds)

Transactionally retrieves an existing entity or creates a new one.

Positional Args:
  name: Key name to retrieve or create.

Keyword Arguments:
  namespace: Optional namespace.
  app: Optional app ID.
  parent: Parent entity key, if any.
  context_options: ContextOptions object (not keyword args!) or None.
  **kwds: Keyword arguments to pass to the constructor of the model class if an instance for the specified key name does not already exist. If an instance with the supplied key_name and parent already exists, these arguments will be discarded.

Returns:
  Existing instance of Model class with the specified key name and parent or a new one that has just been created.

google-app-engine

app-engine-ndb

google-cloud-datastore