PyMongo group by multiple keys
With PyMongo, grouping by a single key seems to work fine:
results = collection.group(key={"scan_status":0}, condition={'date': {'$gte': startdate}}, initial={"count": 0}, reduce=reducer)
Result:
{u'count': 215339.0, u'scan_status': u'PENDING'} {u'count': 617263.0, u'scan_status': u'DONE'}
But when I try to group by multiple keys, I get an exception:
results = collection.group(key={"scan_status":0,"date":0}, condition={'date': {'$gte': startdate}}, initial={"count": 0}, reduce=reducer)
How do I correctly group by multiple fields?
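If you are trying to count over two keys, then while it is possible with .group(), the better option is .aggregate().
This uses "native code operators" rather than the interpreted JavaScript that .group() requires, and performs the same basic "grouping" action you are trying to achieve.
In particular, look at the $group pipeline operator: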
result = collection.aggregate([
    # Match only the documents on or after the start date
    { "$match": { "date": { "$gte": startdate } } },
    # Group the documents and "count" via $sum on the values
    { "$group": {
        "_id": {
            "scan_status": "$scan_status",
            "date": "$date"
        },
        "count": { "$sum": 1 }
    }}
])
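Each document returned by a pipeline like this carries the grouping keys nested under "_id". As a minimal sketch of flattening such results into rows, assuming the output shape produced by the $group stage above (the helper name flatten_group_result and the sample documents are hypothetical, for illustration only):

```python
from datetime import datetime

def flatten_group_result(doc):
    """Merge the compound "_id" keys and the remaining fields into one flat dict."""
    row = dict(doc["_id"])  # copy the grouping keys, e.g. scan_status and date
    row.update({k: v for k, v in doc.items() if k != "_id"})
    return row

# Hypothetical sample of what the cursor might yield (not real data)
sample = [
    {"_id": {"scan_status": "DONE", "date": datetime(2014, 6, 1, 12, 0)}, "count": 3},
    {"_id": {"scan_status": "PENDING", "date": datetime(2014, 6, 1, 12, 0)}, "count": 1},
]

rows = [flatten_group_result(doc) for doc in sample]
for row in rows:
    print(row["scan_status"], row["date"].isoformat(), row["count"])
```

With a real collection, the same helper could be applied to each document while iterating over the cursor returned by collection.aggregate().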
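In fact, you probably want something that reduces the "date" to a distinct period, since grouping on the full timestamp yields one group per distinct date value. For example: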
result = collection.aggregate([
    # Match only the documents on or after the start date
    { "$match": { "date": { "$gte": startdate } } },
    # Group the documents and "count" via $sum on the values
    { "$group": {
        "_id": {
            "scan_status": "$scan_status",
            "date": {
                "year": { "$year": "$date" },
                "month": { "$month": "$date" },
                "day": { "$dayOfMonth": "$date" }
            }
        },
        "count": { "$sum": 1 }
    }}
])
This uses the date aggregation operators ($year, $month and $dayOfMonth).
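The grouped "_id" then holds the day as separate year, month and day numbers. A small sketch of turning that sub-document back into a Python date on the client side, assuming the "_id" layout of the pipeline above (id_to_date and the sample value are hypothetical):

```python
from datetime import date

def id_to_date(group_id):
    """Rebuild a datetime.date from the year/month/day parts in a grouped _id."""
    parts = group_id["date"]
    return date(parts["year"], parts["month"], parts["day"])

# Hypothetical _id value in the shape the $group stage would emit
sample_id = {"scan_status": "DONE", "date": {"year": 2014, "month": 6, "day": 1}}

day = id_to_date(sample_id)
print(day.isoformat())
```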
Or perhaps with basic "date math":
from datetime import datetime

# BSON cannot encode datetime.date, so the epoch must be a datetime
epoch = datetime(1970, 1, 1)

result = collection.aggregate([
    # Match only the documents on or after the start date
    { "$match": { "date": { "$gte": startdate } } },
    # Group the documents and "count" via $sum on the values
    # use "epoch" "1970-01-01" as a base to convert to integer milliseconds,
    # then subtract the remainder of one day to round down to midnight
    { "$group": {
        "_id": {
            "scan_status": "$scan_status",
            "date": {
                "$subtract": [
                    { "$subtract": [ "$date", epoch ] },
                    { "$mod": [
                        { "$subtract": [ "$date", epoch ] },
                        1000 * 60 * 60 * 24
                    ]}
                ]
            }
        },
        "count": { "$sum": 1 }
    }}
])
This returns integer values (milliseconds since the "epoch", rounded down to the start of each day) instead of a compound value object.
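The same arithmetic can be reproduced in plain Python, which is handy for checking the bucket values or converting them back to dates. This is only a sketch of the calculation, not something the server runs; the names day_bucket_ms and bucket_to_datetime are hypothetical, and timestamps are assumed to be naive UTC datetimes:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
MS_PER_DAY = 1000 * 60 * 60 * 24

def day_bucket_ms(dt):
    """Milliseconds since the epoch, rounded down to the start of the day."""
    ms = (dt - EPOCH) // timedelta(milliseconds=1)  # whole milliseconds as int
    return ms - ms % MS_PER_DAY                     # drop the time-of-day part

def bucket_to_datetime(bucket_ms):
    """Convert a bucket value back into a datetime at midnight."""
    return EPOCH + timedelta(milliseconds=bucket_ms)

stamp = datetime(2014, 6, 1, 15, 30, 45)
bucket = day_bucket_ms(stamp)
print(bucket, bucket_to_datetime(bucket))
```

Two timestamps on the same calendar day map to the same bucket, which is why the value works as a grouping key.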
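But all of these options are better than .group(), because they use natively coded routines and run much faster than the JavaScript code you would otherwise need to supply.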
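To group by an arbitrary list of fields, the compound "_id" can be built programmatically. A minimal sketch, assuming each field is grouped on its raw value as in the first pipeline (build_group_count_pipeline is a hypothetical helper, and the start date is a placeholder):

```python
from datetime import datetime

def build_group_count_pipeline(fields, match=None):
    """Return a pipeline that counts documents per combination of the given fields."""
    pipeline = []
    if match:
        pipeline.append({"$match": match})
    pipeline.append({
        "$group": {
            # one "_id" sub-key per field, each referencing the document field
            "_id": {name: "$" + name for name in fields},
            "count": {"$sum": 1},
        }
    })
    return pipeline

startdate = datetime(2014, 6, 1)  # placeholder value for illustration
pipeline = build_group_count_pipeline(
    ["scan_status", "date"],
    match={"date": {"$gte": startdate}},
)
print(pipeline)
```

The resulting list could then be passed to collection.aggregate(pipeline).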