How to delete the documents returned by group in mongodb?

I am a MongoDB beginner working on a homework problem. The dataset looks like this:

{ "_id" : { "$oid" : "50906d7fa3c412bb040eb577" }, "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb578" }, "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb579" }, "student_id" : 0, "type" : "homework", "score" : 14.8504576811645 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb57a" }, "student_id" : 0, "type" : "homework", "score" : 63.98402553675503 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb57b" }, "student_id" : 1, "type" : "exam", "score" : 74.20010837299897 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb57c" }, "student_id" : 1, "type" : "quiz", "score" : 96.76851542258362 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb57d" }, "student_id" : 1, "type" : "homework", "score" : 21.33260810416115 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb57e" }, "student_id" : 1, "type" : "homework", "score" : 44.31667452616328 }

As part of the problem, I have to delete the 'homework' document with the lowest score for each student. Here is my strategy:

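Aggregation pipeline:
1. First, filter all documents with type: homework
2. Sort by student_id, then score
3. Group on student_id and take the first element

This gives me the lowest-scoring homework document for each student.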

But how do I delete these documents from the original dataset? Any guidance or hints?
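The three-step strategy above (filter homework, sort by student_id then score, take the first document per student) can be sanity-checked outside MongoDB. This is a minimal sketch in plain Python on in-memory dictionaries, not MongoDB code; `lowest_homework` is a hypothetical helper name, and the ObjectIds from the sample data are written as plain hex strings.

```python
from itertools import groupby
from operator import itemgetter

# Sample grades from the question as (_id, student_id, type, score);
# ObjectIds are shown as plain hex strings.
ROWS = [
    ("50906d7fa3c412bb040eb577", 0, "exam", 54.6535436362647),
    ("50906d7fa3c412bb040eb578", 0, "quiz", 31.95004496742112),
    ("50906d7fa3c412bb040eb579", 0, "homework", 14.8504576811645),
    ("50906d7fa3c412bb040eb57a", 0, "homework", 63.98402553675503),
    ("50906d7fa3c412bb040eb57b", 1, "exam", 74.20010837299897),
    ("50906d7fa3c412bb040eb57c", 1, "quiz", 96.76851542258362),
    ("50906d7fa3c412bb040eb57d", 1, "homework", 21.33260810416115),
    ("50906d7fa3c412bb040eb57e", 1, "homework", 44.31667452616328),
]
GRADES = [dict(zip(("_id", "student_id", "type", "score"), r)) for r in ROWS]


def lowest_homework(docs):
    """Return {student_id: lowest homework doc}, mirroring $match + $sort + $group/$first."""
    # 1. $match: keep only homework documents.
    homework = [d for d in docs if d["type"] == "homework"]
    # 2. $sort: order by student_id, then by score ascending.
    homework.sort(key=itemgetter("student_id", "score"))
    # 3. $group + $first: the first document of each student's run has the lowest score.
    return {sid: next(run) for sid, run in groupby(homework, key=itemgetter("student_id"))}


lowest = lowest_homework(GRADES)
print({sid: doc["_id"] for sid, doc in lowest.items()})
# prints {0: '50906d7fa3c412bb040eb579', 1: '50906d7fa3c412bb040eb57d'}
```

In the real pipeline, step 3 has to carry the original document's `_id` forward explicitly (for example `docId: { $first: "$_id" }`), because after `$group` the result's own `_id` is the group key, i.e. the student_id.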

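Iterate over the aggregation cursor with the cursor's forEach() method and remove each document from the collection, using its _id as the query in the remove() method. Note that after a $group stage the result's _id is the group key (here the student_id), so the pipeline has to carry the original document's _id along, for example as docId: { $first: "$_id" }. Like this: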

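// assumes the pipeline ends with
// { $group: { _id: "$student_id", docId: { $first: "$_id" } } }
var cursor = db.grades.aggregate(pipeline);
cursor.forEach(function (doc) {
    db.grades.remove({ "_id": doc.docId });
});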

Another approach is to use the map() method to build an array of the documents' _id values and then delete them all at once, like this:

var cursor = db.grades.aggregate(pipeline),
    // toArray() first, so map() is Array.prototype.map and returns a plain array
    ids = cursor.toArray().map(function (doc) { return doc.docId; });
db.grades.remove({ "_id": { "$in": ids } });
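Either way, the deletion itself is just "remove every document whose `_id` is in the list". The sketch below mirrors the `$in` variant on in-memory dictionaries in plain Python; `delete_in` is a hypothetical helper, a set lookup stands in for `$in`, and the two ids are the lowest homework documents of students 0 and 1 in the question's sample data.

```python
# The question's sample grades as (_id, student_id, type, score).
ROWS = [
    ("50906d7fa3c412bb040eb577", 0, "exam", 54.6535436362647),
    ("50906d7fa3c412bb040eb578", 0, "quiz", 31.95004496742112),
    ("50906d7fa3c412bb040eb579", 0, "homework", 14.8504576811645),
    ("50906d7fa3c412bb040eb57a", 0, "homework", 63.98402553675503),
    ("50906d7fa3c412bb040eb57b", 1, "exam", 74.20010837299897),
    ("50906d7fa3c412bb040eb57c", 1, "quiz", 96.76851542258362),
    ("50906d7fa3c412bb040eb57d", 1, "homework", 21.33260810416115),
    ("50906d7fa3c412bb040eb57e", 1, "homework", 44.31667452616328),
]
GRADES = [dict(zip(("_id", "student_id", "type", "score"), r)) for r in ROWS]

# _id of each student's lowest homework, i.e. what the pipeline should return.
IDS = ["50906d7fa3c412bb040eb579", "50906d7fa3c412bb040eb57d"]


def delete_in(docs, ids):
    """Return docs minus those whose _id is in ids, like remove({"_id": {"$in": ids}})."""
    doomed = set(ids)  # set membership plays the role of $in
    return [d for d in docs if d["_id"] not in doomed]


remaining = delete_in(GRADES, IDS)
print(len(GRADES), "->", len(remaining))  # prints 8 -> 6
```

A single `$in` query sends one delete command to the server instead of one `remove()` call per document, which is the main reason to prefer it when there are many ids.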

-- Update --

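For large delete operations, it may be more efficient to copy the documents you want to keep into a new collection and then use drop() on the original collection. To copy the essential documents, your aggregation pipeline needs to return the documents without the lowest homework doc and copy them to another collection using the $out operator as the last pipeline stage. Consider the following aggregation pipeline: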

db.grades.aggregate([    
    {
        '$group':{
            '_id': {
                "student_id": "$student_id",
                "type": "$type"
            },
            'lowest_score': { "$min": '$score'},
            'data': {
                '$push': '$$ROOT'
            }
         }
    },    
    {
        "$unwind": "$data"
    },
    {
        "$project": {
            "_id": "$data._id",
            "student_id" : "$data.student_id",
            "type" : "$data.type",
            "score" : "$data.score",
            'lowest_score': 1,            
            "isHomeworkLowest": {
                "$cond": [
                    { 
                        "$and": [
                            { "$eq": [ "$_id.type", "homework" ] },
                            { "$eq": [ "$data.score", "$lowest_score" ] }
                        ] 
                    },
                    true,
                    false
                ]
            }
        }
    },
    {
        "$match": {"isHomeworkLowest" : false}
    },
    {
        "$project": {           
            "student_id": 1,
            "type": 1,
            "score": 1
        }
    },
    {
        "$out": "new_grades"
    }
])

Then you can drop the old collection with db.grades.drop() and query the new one with db.new_grades.find().
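The core of the `$out` pipeline is "group by (student_id, type), find the minimum, drop homework documents equal to it". Here is a minimal plain-Python sketch of that logic on in-memory dictionaries, assuming the question's sample data; `keep_all_but_lowest_homework` is a hypothetical name and nothing here talks to MongoDB.

```python
from collections import defaultdict

# The question's sample grades as (_id, student_id, type, score).
ROWS = [
    ("50906d7fa3c412bb040eb577", 0, "exam", 54.6535436362647),
    ("50906d7fa3c412bb040eb578", 0, "quiz", 31.95004496742112),
    ("50906d7fa3c412bb040eb579", 0, "homework", 14.8504576811645),
    ("50906d7fa3c412bb040eb57a", 0, "homework", 63.98402553675503),
    ("50906d7fa3c412bb040eb57b", 1, "exam", 74.20010837299897),
    ("50906d7fa3c412bb040eb57c", 1, "quiz", 96.76851542258362),
    ("50906d7fa3c412bb040eb57d", 1, "homework", 21.33260810416115),
    ("50906d7fa3c412bb040eb57e", 1, "homework", 44.31667452616328),
]
GRADES = [dict(zip(("_id", "student_id", "type", "score"), r)) for r in ROWS]


def keep_all_but_lowest_homework(docs):
    """Mirror $group(student_id, type) + $min, then $match isHomeworkLowest: false."""
    # $group: lowest score per (student_id, type) key.
    lowest = defaultdict(lambda: float("inf"))
    for d in docs:
        key = (d["student_id"], d["type"])
        lowest[key] = min(lowest[key], d["score"])
    # $project + $match: drop homework documents whose score equals their group's minimum.
    return [
        d for d in docs
        if not (d["type"] == "homework" and d["score"] == lowest[(d["student_id"], d["type"])])
    ]


new_grades = keep_all_but_lowest_homework(GRADES)
for d in new_grades:
    print(d["student_id"], d["type"], d["score"])
```

Because the comparison is equality with the group minimum, two homework documents tied for the lowest score would both be dropped; the same is true of the `isHomeworkLowest` test in the pipeline above.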

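I think this is the database part of the homework in MongoDB University's MongoDB for Java Developers course. The requirement is to remove the lowest homework score for each student. Anyway, this is how I solved it; I hope it helps. You can also clone my code from my GitHub link (provided below).
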
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;

public class Homework2Week2 {

    public static void main(String[] args) {
        // Written against mongo-java-driver-3.2.2.jar.
        /* If you want to use a different version of mongo-java-driver,
           check that version's documentation. */
        MongoClient mongoClient = new MongoClient();
        // get a handle to the "students" database
        MongoDatabase database = mongoClient.getDatabase("students");
        // get a handle to the "grades" collection
        MongoCollection<Document> collection = database.getCollection("grades");
        /*
         * Write a program in the language of your choice that will remove the grade of type "homework" with the lowest score for each student from the dataset in the handout.
         * Since each document is one grade, it should remove one document per student.
         * This will use the same data set as the last problem, but if you don't have it, you can download and re-import.
         * The dataset contains 4 scores each for 200 students.
         * First, let's confirm your data is intact; the number of documents should be 800.
         *
         * Hint/spoiler: If you select homework grade-documents, sort by student
         * and then by score, you can iterate through and find the lowest score
         * for each student by noticing a change in student id. As you notice
         * that change of student_id, remove the document.
         */
        // homework grades sorted by student_id, then by score (ascending)
        MongoCursor<Document> cursor = collection.find(eq("type", "homework"))
                .sort(new Document("student_id", 1).append("score", 1)).iterator();
        int curStudentId = -1;
        try {
            while (cursor.hasNext()) {
                Document doc = cursor.next();
                int studentId=(int) doc.get("student_id");
                // first document for a new student_id = that student's lowest homework
                if (studentId != curStudentId) {
                    collection.deleteOne(eq("_id", doc.get("_id")));
                    curStudentId = studentId;
                }
            }
        } finally {
            // close the cursor
            cursor.close();
        }
        // close the MongoClient
        mongoClient.close();
    }

}
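The hint's "notice a change in student_id" technique does not need `$group` at all: sort once, then delete the first document of every new student. A minimal plain-Python sketch of that loop on in-memory dictionaries, assuming the question's sample data; `delete_on_student_change` is a hypothetical name, and collecting ids into a set stands in for `collection.deleteOne(...)`.

```python
# The question's sample grades as (_id, student_id, type, score).
ROWS = [
    ("50906d7fa3c412bb040eb577", 0, "exam", 54.6535436362647),
    ("50906d7fa3c412bb040eb578", 0, "quiz", 31.95004496742112),
    ("50906d7fa3c412bb040eb579", 0, "homework", 14.8504576811645),
    ("50906d7fa3c412bb040eb57a", 0, "homework", 63.98402553675503),
    ("50906d7fa3c412bb040eb57b", 1, "exam", 74.20010837299897),
    ("50906d7fa3c412bb040eb57c", 1, "quiz", 96.76851542258362),
    ("50906d7fa3c412bb040eb57d", 1, "homework", 21.33260810416115),
    ("50906d7fa3c412bb040eb57e", 1, "homework", 44.31667452616328),
]
GRADES = [dict(zip(("_id", "student_id", "type", "score"), r)) for r in ROWS]


def delete_on_student_change(docs):
    """Sort homework by (student_id, score) and drop the first doc seen for each student."""
    homework = sorted(
        (d for d in docs if d["type"] == "homework"),
        key=lambda d: (d["student_id"], d["score"]),
    )
    doomed = set()
    cur_student_id = None  # the Java code uses -1 as its "no student yet" marker
    for d in homework:
        if d["student_id"] != cur_student_id:
            # student_id changed: this is the new student's lowest homework
            doomed.add(d["_id"])  # stands in for collection.deleteOne(...)
            cur_student_id = d["student_id"]
    return [d for d in docs if d["_id"] not in doomed]


remaining = delete_on_student_change(GRADES)
print(sorted({d["_id"] for d in GRADES} - {d["_id"] for d in remaining}))
```

Using `None` as the initial marker avoids relying on -1 never being a real student_id.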

On my GitHub account I have the complete project code. If anyone wants it, you can try it from this link.

int studentId=(int) doc.get("student_id");

gives a type cast error. Can you check it again?

As far as I know, we can do the following (it also works if student_id was stored as a double rather than an int):

int studentId = ((Number) doc.get("student_id")).intValue();

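db.grades.aggregate([
    { $match: { type: 'homework' } },
    { $group: {
        _id: { student_id: "$student_id", type: '$type' },
        score: { $min: "$score" }   // lowest homework score per student
    } }
]).forEach(function (doc) {
    // delete exactly one homework document with that lowest score
    db.grades.deleteOne({ 'student_id': doc._id.student_id, 'type': 'homework', 'score': doc.score });
});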

This question is part of M101P, MongoDB University's MongoDB for Developers course. The requirement here is:

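Remove the grade of type "homework" with the lowest score for each student from the dataset. Since each document is one grade, it should remove one document per student.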

So each student_id has 4 documents of different 'type', two of which are 'type': 'homework'. We have to remove the lower-scoring of those two 'type': 'homework' documents.

Working code in pymongo:

import pymongo

# establish a connection to the database
connection = pymongo.MongoClient('localhost', 27017)
# get a handle to the students database and its grades collection
db = connection.students
mycol = db.grades


def remove_documents():
    pipe = [
        {'$match': {'type': 'homework'}},
        {'$group': {'_id': '$student_id', 'minscore': {'$min': '$score'}}},
        {'$sort': {'_id': 1}},
    ]
    result_cursor = mycol.aggregate(pipeline=pipe)
    counter = 0

    for i in result_cursor:
        # match on type too, so an exam/quiz with the same score is never deleted
        query = {'student_id': i['_id'], 'type': 'homework', 'score': i['minscore']}
        del_record = mycol.delete_one(query)
        counter += del_record.deleted_count
    print(counter)


remove_documents()

Terminal output:

$ python remove_grade.py
200

Starting with Mongo 4.4, the $group stage has a new aggregation operator, $accumulator, which allows custom accumulation of documents as they get grouped.

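Combined with the $out stage, used in this case to replace the original collection with the result of the aggregation pipeline (from which each student's lowest score has been removed):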

// > db.collection.find()
//     { "student_id" : 0, "type" : "exam",     "score" : 54.6535436362647  }
//     { "student_id" : 0, "type" : "quiz",     "score" : 31.95004496742112 }
//     { "student_id" : 0, "type" : "homework", "score" : 14.8504576811645  }
//     { "student_id" : 0, "type" : "homework", "score" : 63.98402553675503 }
//     { "student_id" : 1, "type" : "homework", "score" : 21.33260810416115 }
//     { "student_id" : 1, "type" : "homework", "score" : 44.31667452616328 }
db.collection.aggregate([
  { $group: {
      _id: "$student_id",
      docs: { $accumulator: {
        accumulateArgs: ["$$ROOT"],
        init: function() { return []; },
        accumulate: function(docs, doc) { return docs.concat(doc); },
        merge: function(docs1, docs2) { return docs1.concat(docs2); },
        finalize: function(docs) {
          // only homework documents are candidates for removal
          var hw = docs.filter(x => x.type == "homework");
          var min = Math.min(...hw.map(x => x.score));
          var i = docs.findIndex((doc) => doc.type == "homework" && doc.score == min);
          // guard: splice(-1, 1) would otherwise drop the last document
          if (i >= 0) docs.splice(i, 1);
          return docs;
        },
        lang: "js"
      }}
  }},
  { $unwind: "$docs" },
  { $replaceWith: "$docs" },
  { $out: "collection" }
])
// > db.collection.find()
//     { "student_id" : 0, "type" : "exam",     "score" : 54.6535436362647  }
//     { "student_id" : 0, "type" : "quiz",     "score" : 31.95004496742112 }
//     { "student_id" : 0, "type" : "homework", "score" : 63.98402553675503 }
//     { "student_id" : 1, "type" : "homework", "score" : 44.31667452616328 }

This:

  • $groups documents by student_id and accumulates them into an array from which the lowest-scoring document has been stripped:

    • accumulateArgs is the combination of fields used by the accumulate function (or, in our case, the whole document, $$ROOT).

    • init initializes each accumulated array as an empty array.

    • Documents are simply concatenated (accumulate and merge).

    • Finally, once all documents are grouped, the finalize step finds the grouped document with the lowest score so that it can be removed.

    • At the end of this stage, the pipeline documents look like this:

      {
        "_id" : 0,
        "docs" : [
          { "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 },
          { "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 },
          { "student_id" : 0, "type" : "homework", "score" : 63.98402553675503 }
        ]
      }
      ...
      
  • $unwinds the accumulated field of the grouped documents to flatten the arrays, returning something like:

    { "_id" : 0, "docs" : { "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 } }
    { "_id" : 0, "docs" : { "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 } }
    { "_id" : 0, "docs" : { "student_id" : 0, "type" : "homework", "score" : 63.98402553675503 } }
    ...
    
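  • $replaceWith replaces all existing fields in each document with the contents of the accumulated field, so that we get back the original format. At the end of this stage we have something like: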

    { "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
    { "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 }
    { "student_id" : 0, "type" : "homework", "score" : 63.98402553675503 }
    ...
    
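  • $out inserts the result of the aggregation pipeline into the same collection. Note that $out conveniently replaces the contents of the specified collection, which is what makes this solution possible.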
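The `finalize` step (find the lowest-scoring document, splice it out) is the part most worth checking outside MongoDB. The sketch below mirrors the whole pipeline in plain Python, under the assumption that only homework documents should be candidates for removal, as the assignment requires; `finalize` and `run_pipeline` are hypothetical names, and the rows come from the question's sample data (with `_id` omitted, as in the example above).

```python
from collections import defaultdict

# Rows from the question's sample data as (student_id, type, score).
ROWS = [
    (0, "exam", 54.6535436362647),
    (0, "quiz", 31.95004496742112),
    (0, "homework", 14.8504576811645),
    (0, "homework", 63.98402553675503),
    (1, "homework", 21.33260810416115),
    (1, "homework", 44.31667452616328),
]
DOCS = [dict(zip(("student_id", "type", "score"), r)) for r in ROWS]


def finalize(docs):
    """Mirror of the finalize step: drop the lowest-scoring homework doc, if there is one."""
    scores = [d["score"] for d in docs if d["type"] == "homework"]
    if not scores:
        return docs  # no homework: nothing to remove (the JS guard avoids splice(-1, 1))
    low = min(scores)
    i = next(i for i, d in enumerate(docs) if d["type"] == "homework" and d["score"] == low)
    return docs[:i] + docs[i + 1:]


def run_pipeline(docs):
    # $group by student_id; init=[] and accumulate/merge=concat build each array.
    groups = defaultdict(list)
    for d in docs:
        groups[d["student_id"]].append(d)
    # finalize each group, then $unwind + $replaceWith flatten back to plain documents.
    return [d for sid in sorted(groups) for d in finalize(groups[sid])]


result = run_pipeline(DOCS)
for d in result:
    print(d)
```

Removing only one index per group means a tie for the lowest homework score removes a single document, unlike an equality-based `$match`.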