Django - Annotate Count() of distinct values grouped by Date
I have the following model:

    class Visualization(models.Model):
        ....
        user: FK user
        start_time: DATETIME
        product: FK product
        ....

Sample data:

| User ID | Start Time | Product ID |
| :----: | :----------: | :-------: |
|1|2021-09-07 14:03:07|3|
|2|2021-09-07 13:06:00|1|
|1|2021-09-07 17:03:06|1|
|4|2021-09-07 04:03:05|5|
|1|2021-09-07 15:03:17|4|
|1|2021-09-07 19:03:27|1|
|2|2021-09-06 21:03:31|3|
|1|2021-09-06 11:03:56|9|
|1|2021-09-06 07:03:19|9|

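I need to get the number of active users per day. An active user is one who made at least one reproduction; a user who made many reproductions on the same day still counts as 1.

The correct answer is:

| Total | Date |
| :----: | :----------: |
|3|2021-09-07|
|2|2021-09-06|

First, I added an annotation that truncates start_time so that only the date remains, and then grouped by that annotation; up to that point everything works. The problem appears when I try to count the users, because they are duplicated. I have tried counting user_id with distinct=True, but the numbers are still wrong, and by a wide margin.

I also tried grouping by user_id and period (the annotation that truncates start_time), but that did not work either.
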
Example of real data for one day:

| User ID | Start Time | Product ID |
| :----: | :----------: | :-------: |
|5852|2021-09-07 11:33:48.000000 +00:00|0|
|5852|2021-09-07 11:33:38.000000 +00:00|2|
|6697|2021-09-07 11:31:55.000000 +00:00|3|
|6697|2021-09-07 11:31:31.000000 +00:00|1|
|6643|2021-09-07 11:28:29.000000 +00:00|1598|
|2703|2021-09-07 11:19:05.000000 +00:00|1620|
|6697|2021-09-07 11:18:40.000000 +00:00|3|
|6697|2021-09-07 11:17:32.000000 +00:00|1|
|28295|2021-09-07 11:11:34.000000 +00:00|1618|
|6697|2021-09-07 11:11:33.000000 +00:00|3|
|23968|2021-09-07 10:54:25.000000 +00:00|0|
|6697|2021-09-07 10:53:05.000000 +00:00|1|
|6697|2021-09-07 10:52:53.000000 +00:00|3|
|6697|2021-09-07 10:50:44.000000 +00:00|1|
|11|2021-09-07 10:48:06.000000 +00:00|1478|
|23968|2021-09-07 10:47:53.000000 +00:00|0|
|23968|2021-09-07 10:45:22.000000 +00:00|0|
|28283|2021-09-07 10:20:18.000000 +00:00|1191|
|23968|2021-09-07 10:19:58.000000 +00:00|2|
|23968|2021-09-07 10:19:37.000000 +00:00|0|
|23968|2021-09-07 10:19:20.000000 +00:00|2|
|11|2021-09-07 09:09:22.000000 +00:00|1436|
|359|2021-09-07 09:08:59.000000 +00:00|88|
|359|2021-09-07 09:07:32.000000 +00:00|100|
|28275|2021-09-07 08:59:39.000000 +00:00|2|
|28275|2021-09-07 08:50:31.000000 +00:00|2|
|23968|2021-09-07 08:46:10.000000 +00:00|1572|
|23968|2021-09-07 08:45:42.000000 +00:00|2|
|359|2021-09-07 08:41:48.000000 +00:00|1550|
|23968|2021-09-07 08:26:42.000000 +00:00|0|
|23968|2021-09-07 08:19:21.000000 +00:00|2|
|23968|2021-09-07 08:18:14.000000 +00:00|0|
|23968|2021-09-07 08:16:33.000000 +00:00|0|
|2703|2021-09-07 07:01:28.000000 +00:00|1620|
|2703|2021-09-07 06:59:43.000000 +00:00|1620|
|6697|2021-09-07 02:51:50.000000 +00:00|0|
|6697|2021-09-07 02:46:18.000000 +00:00|2|
|10452|2021-09-07 00:15:03.000000 +00:00|421|
|27953|2021-09-07 00:12:35.000000 +00:00|1|

For this day, my query returns 20 instead of 12.

You can query it like this:

    from django.db.models import Count
    from django.db.models.functions import TruncDate

    # Group rows by the calendar date of start_time,
    # then count each user only once per date.
    Visualization.objects.values(
        date=TruncDate('start_time')
    ).annotate(
        total=Count('user', distinct=True)
    ).order_by('date')

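
As a sanity check on what the grouped query is meant to compute, here is a minimal plain-Python sketch that applies the same "distinct users per date" logic to the sample data from the question. The helper name `active_users_per_day` and the tuple layout are assumptions made for illustration; nothing here touches the database or the Django ORM.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (user_id, start_time, product_id)
ROWS = [
    (1, "2021-09-07 14:03:07", 3),
    (2, "2021-09-07 13:06:00", 1),
    (1, "2021-09-07 17:03:06", 1),
    (4, "2021-09-07 04:03:05", 5),
    (1, "2021-09-07 15:03:17", 4),
    (1, "2021-09-07 19:03:27", 1),
    (2, "2021-09-06 21:03:31", 3),
    (1, "2021-09-06 11:03:56", 9),
    (1, "2021-09-06 07:03:19", 9),
]


def active_users_per_day(rows):
    """Count distinct user ids per calendar date (hypothetical helper)."""
    users_by_date = defaultdict(set)
    for user_id, start_time, _product_id in rows:
        day = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S").date()
        # A set keeps each user once per day, like Count(..., distinct=True).
        users_by_date[day].add(user_id)
    return {day: len(users) for day, users in sorted(users_by_date.items())}


result = active_users_per_day(ROWS)
for day, total in result.items():
    print(day, total)
```

If the ORM query gives larger totals than this kind of check, one thing worth examining is whether an earlier `order_by()` on another field (or, on older Django versions, a default `Meta.ordering`) is adding extra columns to the `GROUP BY`. The `order_by('date')` call in the query above replaces any such ordering.
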

Days without any reproductions will not appear in the QuerySet, so you will need to post-process the result to fill in those dates.
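
One possible way to do that post-processing is sketched below. It assumes the rows are dictionaries with `date` and `total` keys, as produced by the `values().annotate()` query above; the helper name `fill_missing_days` and the chosen date range are hypothetical.

```python
from datetime import date, timedelta


def fill_missing_days(rows, start, end):
    """Return one {'date', 'total'} dict per day in [start, end].

    Days that are absent from `rows` get a total of 0.
    """
    totals = {row["date"]: row["total"] for row in rows}
    filled = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        filled.append({"date": day, "total": totals.get(day, 0)})
    return filled


# Rows shaped like the query result for the sample data in the question.
rows = [
    {"date": date(2021, 9, 6), "total": 2},
    {"date": date(2021, 9, 7), "total": 3},
]

# Ask for a wider range so the two surrounding days show up with 0.
filled = fill_missing_days(rows, date(2021, 9, 5), date(2021, 9, 8))
for row in filled:
    print(row["date"], row["total"])
```

In a real view, `rows` would be `list(queryset)`, and `start` and `end` would come from whatever reporting period is being displayed.
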

You can also group by date using the extra() queryset modifier:

    from django.db.models import Count

    # Select the date part of start_time with raw SQL, then group by it.
    Visualization.objects.extra(
        select={'start_date': 'date(start_time)'}
    ).values(
        'start_date'
    ).annotate(
        total=Count('user', distinct=True)
    )