Count multiple distinct fields by group with Mongo
I have a dataset that looks like this:
{"BrandId":"a","SessionId":100,"UserName":"tom"}
{"BrandId":"a","SessionId":200,"UserName":"tom"}
{"BrandId":"b","SessionId":300,"UserName":"mike"}
I want to count the distinct sessions and the distinct user names per BrandId. The equivalent SQL is:
select brandid, count(distinct sessionid), count(distinct username)
from data
group by brandid
I tried writing this as a MongoDB aggregation. My current code is below, but it doesn't work. Is there a way to make it work?
db.logs.aggregate([
{$group:{
_id:{brand:"$BrandId",user:"$UserName",session:"$SessionId"},
count:{$sum:1}}},
{$group:{
_id:"$_id.brand",
users:{$sum:"$_id.user"},
sessions:{$sum:"$_id.session"}
}}
])
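For the sample data above, the expected counts are:
{"BrandId":"a","countSession":2,"countUser":1}
{"BrandId":"b","countSession":1,"countUser":1}
If you know SQL, the expected result is the same as the result of the SQL query above.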
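One way to see why the question's pipeline does not produce these counts is to emulate it in plain JavaScript. The sketch below is not MongoDB code: emulateQuestionPipeline and mongoSum are hypothetical helpers written for illustration, and mongoSum models the documented $sum behavior of ignoring non-numeric values. Under that model, summing the UserName strings yields 0, and summing SessionId adds the IDs together instead of counting them.

```javascript
// Plain-JavaScript emulation of the question's two $group stages (no MongoDB involved).
const docs = [
  { BrandId: "a", SessionId: 100, UserName: "tom" },
  { BrandId: "a", SessionId: 200, UserName: "tom" },
  { BrandId: "b", SessionId: 300, UserName: "mike" },
];

// Hypothetical helper modeling $sum: non-numeric values are skipped,
// so a list containing only strings sums to 0.
function mongoSum(values) {
  return values.reduce((acc, v) => (typeof v === "number" ? acc + v : acc), 0);
}

// Hypothetical helper reproducing both $group stages of the question.
function emulateQuestionPipeline(input) {
  // Stage 1: one group per distinct {brand, user, session} triple.
  const triples = new Map();
  for (const d of input) {
    const key = JSON.stringify([d.BrandId, d.UserName, d.SessionId]);
    if (!triples.has(key)) {
      triples.set(key, { brand: d.BrandId, user: d.UserName, session: d.SessionId });
    }
  }
  // Stage 2: regroup the triples by brand and $sum the user/session values.
  const byBrand = new Map();
  for (const t of triples.values()) {
    if (!byBrand.has(t.brand)) byBrand.set(t.brand, []);
    byBrand.get(t.brand).push(t);
  }
  return [...byBrand].map(([brand, rows]) => ({
    _id: brand,
    users: mongoSum(rows.map((r) => r.user)),       // strings only -> 0
    sessions: mongoSum(rows.map((r) => r.session)), // sums the IDs
  }));
}

console.log(JSON.stringify(emulateQuestionPipeline(docs)));
// [{"_id":"a","users":0,"sessions":300},{"_id":"b","users":0,"sessions":300}]
```

The count field from the first stage is never used downstream, and $sum never counts anything here: it only adds numbers, which is why a set-based accumulator is needed instead.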
You can use $addToSet to accumulate the distinct set of SessionId and UserName values during the $group, and then add a $project stage to your pipeline that uses the $size operator to get the size of each set:
db.logs.aggregate([
{$group: {
_id: '$BrandId',
sessionIds: {$addToSet: '$SessionId'},
userNames: {$addToSet: '$UserName'}
}},
{$project: {
_id: 0,
BrandId: '$_id',
countSession: {$size: '$sessionIds'},
countUser: {$size: '$userNames'}
}}
])
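To make the $addToSet/$size idea concrete, here is an in-memory sketch of the same pipeline in plain JavaScript, assuming the three sample documents above. countDistinctByBrand is a hypothetical name used only for illustration; a JavaScript Set stands in for $addToSet and Set.size for $size. This is only faithful for scalar values such as the strings and numbers here, because a Set compares objects by reference rather than by content.

```javascript
// Plain-JavaScript emulation of the answer's pipeline; not MongoDB code.
const docs = [
  { BrandId: "a", SessionId: 100, UserName: "tom" },
  { BrandId: "a", SessionId: 200, UserName: "tom" },
  { BrandId: "b", SessionId: 300, UserName: "mike" },
];

// Hypothetical helper: count distinct SessionId and UserName per BrandId.
function countDistinctByBrand(input) {
  const groups = new Map(); // BrandId -> { sessionIds: Set, userNames: Set }
  for (const d of input) {
    // $group on BrandId
    if (!groups.has(d.BrandId)) {
      groups.set(d.BrandId, { sessionIds: new Set(), userNames: new Set() });
    }
    const g = groups.get(d.BrandId);
    g.sessionIds.add(d.SessionId); // like $addToSet: '$SessionId'
    g.userNames.add(d.UserName);   // like $addToSet: '$UserName'
  }
  // $project: rename _id to BrandId and take the size of each set
  return [...groups].map(([brandId, g]) => ({
    BrandId: brandId,
    countSession: g.sessionIds.size,
    countUser: g.userNames.size,
  }));
}

console.log(JSON.stringify(countDistinctByBrand(docs)));
// [{"BrandId":"a","countSession":2,"countUser":1},{"BrandId":"b","countSession":1,"countUser":1}]
```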
Result:
{
"BrandId" : "b",
"countSession" : 1,
"countUser" : 1
},
{
"BrandId" : "a",
"countSession" : 2,
"countUser" : 1
}
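$group does not guarantee any particular order for its output documents, which is why brand "b" appears before brand "a" in the result above. If a stable order matters, a $sort stage can be appended. The sketch below keeps the answer's stages unchanged and only adds {$sort: {BrandId: 1}}; holding the stages in a variable named pipeline is an illustrative choice, not something from the original answer.

```javascript
// The answer's stages plus a trailing $sort so the output order is deterministic.
const pipeline = [
  {
    $group: {
      _id: "$BrandId",
      sessionIds: { $addToSet: "$SessionId" },
      userNames: { $addToSet: "$UserName" },
    },
  },
  {
    $project: {
      _id: 0,
      BrandId: "$_id",
      countSession: { $size: "$sessionIds" },
      countUser: { $size: "$userNames" },
    },
  },
  { $sort: { BrandId: 1 } }, // ascending by BrandId: "a" before "b"
];

// In the mongo shell: db.logs.aggregate(pipeline)
```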