Bulk updating data in DocumentDB

Question

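I'm looking to add a property with a default value to a set of documents that I retrieve via a SELECT query, if they contain no value for it.

I was thinking of this in two parts:

SELECT * FROM c article WHERE article.details.locale = 'en-us'

First, I'd like to find all articles where article.details.x does not exist.

Then add the property, article.details.x = true

I was hoping this EXEC command could be supported via the Azure Portal so I don't have to create a migration tool to run this command once, but I couldn't find this option in the portal. Is this possible?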

Answer 1

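DocumentDB has no way to update a bunch of documents in a single query. However, the portal does have a Script Explorer that allows you to write and execute a stored procedure against a single collection. Here is an example sproc that combines a query with a replaceDocument command to update some documents, which you could use as a starting point for writing your own. The one gotcha to keep in mind is that DocumentDB will not allow sprocs to run longer than 5 seconds (with some buffer). So you may have to run your sproc multiple times, and keep track of what you've already done if it can't complete in one 5-second run. Using IS_DEFINED(collection.field.subfield) != true (thanks @cnaegle) in your query, followed by a document replacement that defines that field (or removes that document), should allow you to run the sproc as many times as necessary, because each run only picks up the documents that still lack the field.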
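
The "run it as many times as necessary" advice can be sketched as a small client-side loop. This is only an illustration under stated assumptions: executeSproc is a hypothetical async function you would write yourself (for example, a thin wrapper around whichever SDK call executes the sproc), and it is assumed to resolve to the number of documents updated in that run. makeFakeSproc is a made-up stand-in so the loop can be tried without a database.

```javascript
// Hypothetical driver: re-run a stored procedure until nothing is left to update.
// `executeSproc` is an assumed, caller-supplied async function that resolves to
// the number of documents updated in that run (not part of any SDK).
async function runUntilDone(executeSproc, maxRuns = 100) {
    let total = 0;
    for (let run = 1; run <= maxRuns; run++) {
        const updated = await executeSproc();
        if (updated === 0) {
            // Nothing matched the query any more, so the work is finished.
            return { runs: run, total: total };
        }
        total += updated;
    }
    throw new Error("Gave up after " + maxRuns + " runs; documents may remain");
}

// Stand-in for the real sproc call: pretends `pending` documents need the
// field and that each run manages to update at most `perRun` of them.
function makeFakeSproc(pending, perRun) {
    return async function () {
        const updated = Math.min(pending, perRun);
        pending -= updated;
        return updated;
    };
}

runUntilDone(makeFakeSproc(250, 100)).then(function (result) {
    console.log(result.runs + " runs, " + result.total + " documents updated");
});
```

The loop relies on the query being idempotent: because updated documents no longer match the IS_DEFINED filter, no separate bookkeeping of finished documents is needed. The maxRuns cap is an arbitrary safety limit.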

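If you don't want to write a sproc, the easiest thing to do would be to export the database using the DocumentDB Data Migration Tool. Import that into Excel to manipulate it, or write a script to do the manipulation. Then upload it again using the Data Migration Tool.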
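
If you go the export-and-script route, the manipulation step might look roughly like the following. It is a minimal sketch that assumes the export is a JSON array of documents shaped like the ones in the question. The function name addDefaultDetailsX and the sample documents are made up for illustration, and reading and writing the exported file is left out.

```javascript
// Add details.x = defaultValue to documents in the given locale that lack it.
// Returns a new array; the input documents are not mutated.
function addDefaultDetailsX(docs, locale, defaultValue) {
    return docs.map(function (doc) {
        const details = doc.details;
        // Skip documents outside the locale or that already have a value for x.
        if (!details || details.locale !== locale || details.x !== undefined) {
            return doc;
        }
        return Object.assign({}, doc, {
            details: Object.assign({}, details, { x: defaultValue })
        });
    });
}

// Made-up sample documents, shaped like the articles in the question.
const exported = [
    { id: "1", details: { locale: "en-us" } },
    { id: "2", details: { locale: "en-us", x: false } },
    { id: "3", details: { locale: "fr-fr" } }
];

const patched = addDefaultDetailsX(exported, "en-us", true);
console.log(JSON.stringify(patched));
```

Only the first sample document changes: the second already has a value for x, and the third is in a different locale.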

Answer 2

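You can use Azure Document DB Studio as a front end to create and execute a stored procedure. It can be found here. It's pretty easy to set up and use.

I've mocked up a stored procedure based on your example: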

function updateArticlesDetailsX() {

    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var response = getContext().getResponse();
    var docCount = 0;
    var counter = 0;

    tryQueryAndUpdate();

    // Query for documents that lack details.x and update each one found.
    function tryQueryAndUpdate(continuation) {
        var query = {
            query: "select * from root r where IS_DEFINED(r.details.x) != true"
        };

        var requestOptions = {
            continuation: continuation
        };

        var isAccepted = collection.queryDocuments(
            collectionLink,
            query,
            requestOptions,
            function queryCallback(err, documents, responseOptions) {
                if (err) throw err;
                if (documents.length > 0) {
                    // If at least one document is found, update each of them.
                    docCount = documents.length;
                    for (var i = 0; i < docCount; i++) {
                        tryUpdate(documents[i]);
                    }
                    response.setBody("Updated " + docCount + " documents");
                } else if (responseOptions.continuation) {
                    // Else if the query came back empty, but with a continuation token;
                    // repeat the query w/ the token.
                    tryQueryAndUpdate(responseOptions.continuation);
                } else {
                    throw new Error("Document not found.");
                }
            });

        // If we hit execution bounds - throw an exception.
        if (!isAccepted) {
            throw new Error("The stored procedure timed out");
        }
    }

    function tryUpdate(document) {
        // Optimistic concurrency control via HTTP ETag.
        var requestOptions = { etag: document._etag };

        // The query also matches documents that have no details object at all,
        // so create one before assigning to it.
        if (!document.details) {
            document.details = {};
        }

        // Update statement goes here:
        document.details.x = "some new value";

        var isAccepted = collection.replaceDocument(
            document._self,
            document,
            requestOptions,
            function replaceCallback(err, updatedDocument, responseOptions) {
                if (err) throw err;
                counter++;
            });

        // If we hit execution bounds - throw an exception.
        if (!isAccepted) {
            throw new Error("The stored procedure timed out");
        }
    }
}

I got the rough outline for this code from Andrew Liu on GitHub.

This outline should be close to what you need to do.
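
The query in the sproc above matches every document without details.x, whereas the question only asks about the 'en-us' locale. One possible way to narrow it is a parameterized query spec, sketched below. The helper name buildMissingXQuery is made up, and the sketch assumes the query call accepts an object with query and parameters fields in place of a plain query string, so check that against the API version you are using.

```javascript
// Build a parameterized query spec: documents in one locale that lack details.x.
// Hypothetical helper; the @locale parameter name is an arbitrary choice.
function buildMissingXQuery(locale) {
    return {
        query: "SELECT * FROM root r WHERE r.details.locale = @locale " +
               "AND NOT IS_DEFINED(r.details.x)",
        parameters: [{ name: "@locale", value: locale }]
    };
}

const querySpec = buildMissingXQuery("en-us");
console.log(querySpec.query);
```

Passing the locale as a parameter rather than concatenating it into the string keeps the query text fixed and avoids quoting problems.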

