Django Annotated Query to Count all entities used in a Reverse Relationship
This question is a follow-up to an earlier SO question.
Given these models:
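from django.db import models

# BaseModel is the project's own abstract base model (not shown here)
class Candidate(BaseModel):
    name = models.CharField(max_length=128)

class Status(BaseModel):
    name = models.CharField(max_length=128)

class StatusChange(BaseModel):
    # on_delete is required since Django 2.0; CASCADE was the previous default
    candidate = models.ForeignKey("Candidate", related_name="status_changes", on_delete=models.CASCADE)
    status = models.ForeignKey("Status", related_name="status_changes", on_delete=models.CASCADE)
    created_at = models.DateTimeField(auto_now_add=True, blank=True)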
Represented by these tables:
candidates
+----+--------------+
| id | name         |
+----+--------------+
| 1  | Beth         |
| 2  | Mark         |
| 3  | Mike         |
| 4  | Ryan         |
+----+--------------+
status
+----+--------------+
| id | name         |
+----+--------------+
| 1  | Review       |
| 2  | Accepted     |
| 3  | Rejected     |
+----+--------------+
status_change
+----+--------------+-----------+------------+
| id | candidate_id | status_id | created_at |
+----+--------------+-----------+------------+
| 1  | 1            | 1         | 03-01-2019 |
| 2  | 1            | 2         | 05-01-2019 |
| 4  | 2            | 1         | 01-01-2019 |
| 5  | 3            | 1         | 01-01-2019 |
| 6  | 4            | 3         | 01-01-2019 |
+----+--------------+-----------+------------+
I want to count each status type, but only include the last status of each candidate:
last_status_count
+-----------+-------------+--------+
| status_id | status_name | count  |
+-----------+-------------+--------+
| 1         | Review      | 2      |
| 2         | Accepted    | 1      |
| 3         | Rejected    | 1      |
+-----------+-------------+--------+
I achieved this with:
from django.db.models import Count, F, Max

# Keep only the status changes that are the latest one for their candidate,
# then count those changes per status
qs = Status.objects.filter(
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    nlast=Count('status_changes')
)

>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1)]
However, if there is a status that is not referenced by any status change, it is omitted from the result. Instead, I would like it to be counted as zero.
For example, if the statuses were
+----+--------------+
| id | name         |
+----+--------------+
| 1  | Review       |
| 2  | Accepted     |
| 3  | Rejected     |
| 4  | Banned       |
+----+--------------+
I would like to get:
+-----------+-------------+--------+
| status_id | status_name | count  |
+-----------+-------------+--------+
| 1         | Review      | 2      |
| 2         | Accepted    | 1      |
| 3         | Rejected    | 1      |
| 4         | Banned      | 0      |
+-----------+-------------+--------+
>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
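To pin down the intended semantics independently of the ORM, here is a minimal plain-Python sketch of the same counting. The tuples mirror the sample tables above. The function name `last_status_counts` is made up for illustration, and the dates are rewritten in ISO format (an assumption) so that string comparison orders them correctly.

```python
from collections import Counter

# Sample data mirroring the tables above (dates rewritten as ISO strings)
statuses = {1: "Review", 2: "Accepted", 3: "Rejected", 4: "Banned"}
status_changes = [
    # (id, candidate_id, status_id, created_at)
    (1, 1, 1, "2019-01-03"),
    (2, 1, 2, "2019-01-05"),
    (4, 2, 1, "2019-01-01"),
    (5, 3, 1, "2019-01-01"),
    (6, 4, 3, "2019-01-01"),
]


def last_status_counts(statuses, status_changes):
    """Count candidates per status, looking only at each candidate's latest change."""
    latest = {}  # candidate_id -> (created_at, status_id)
    for _id, candidate_id, status_id, created_at in status_changes:
        # Strict comparison: on a tie, the first row seen is kept
        if candidate_id not in latest or created_at > latest[candidate_id][0]:
            latest[candidate_id] = (created_at, status_id)
    counts = Counter(status_id for _, status_id in latest.values())
    # Iterate over all statuses so that unreferenced ones appear with 0
    return [(name, counts.get(sid, 0)) for sid, name in sorted(statuses.items())]


result = last_status_counts(statuses, status_changes)
print(result)
```

The key point is the final loop: it walks over every status rather than over the status changes, which is what an outer join does in SQL.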
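What I have tried
I solved this in SQL with an outer join, but I am not sure how to do it in Django.
I tried building a queryset with every count annotated as zero and merging the two, but it does not work: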
from django.db.models import Count, F, IntegerField, Max, Value

last_status_changes = Status.objects.filter(
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    nlast=Count('status_changes')
)

# Every status not already present above, annotated with a constant 0
zero_query = (
    Status.objects.all()
    .annotate(nlast=Value(0, output_field=IntegerField()))
    .exclude(pk__in=last_status_changes.values("id"))
)

>>> qs = last_status_changes | zero_query
>>> [(q.name, q.nlast) for q in qs]
[('Review', 3), ('Accepted', 1), ('Rejected', 1)]
# "Review" is over-counted: it includes changes other than each candidate's last one
Any help is appreciated.
Thanks
Update 1
I was able to solve this with a raw query using a right join, but I would prefer to do it with the ORM.
# Untested as I am using different model names in reality
SQL = """
SELECT
    MIN(status.id) AS id,
    COUNT(latest_status_change.candidate_id) AS nlast
FROM
    (
        -- latest change date per candidate
        SELECT
            candidate_id,
            MAX(created_at) AS latest_date
        FROM api_status_change
        GROUP BY candidate_id
    ) AS latest_status_change
    INNER JOIN api_candidates
        ON (latest_status_change.candidate_id = api_candidates.id)
    INNER JOIN api_status_change
        ON (
            latest_status_change.candidate_id = api_status_change.candidate_id
            AND latest_status_change.latest_date = api_status_change.created_at
        )
    -- the RIGHT JOIN keeps statuses that have no matching change
    RIGHT JOIN api_status AS status
        ON (api_status_change.status_id = status.id)
GROUP BY status.name
;
"""
qs = Status.objects.raw(SQL)

>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
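Since the raw query above is marked untested, here is a small self-contained check of the same outer-join idea using Python's built-in `sqlite3` module and an in-memory database. This is only a sketch under assumptions: the table names are shortened to `status` and `status_change`, the dates are stored as ISO strings, and the query starts from `status` with `LEFT JOIN` (the mirror image of the `RIGHT JOIN` above) because older SQLite versions do not support `RIGHT JOIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE status_change (
        id INTEGER PRIMARY KEY,
        candidate_id INTEGER,
        status_id INTEGER,
        created_at TEXT
    );
    INSERT INTO status VALUES
        (1, 'Review'), (2, 'Accepted'), (3, 'Rejected'), (4, 'Banned');
    INSERT INTO status_change VALUES
        (1, 1, 1, '2019-01-03'),
        (2, 1, 2, '2019-01-05'),
        (4, 2, 1, '2019-01-01'),
        (5, 3, 1, '2019-01-01'),
        (6, 4, 3, '2019-01-01');
""")

rows = conn.execute("""
    SELECT s.name, COUNT(latest.candidate_id) AS nlast
    FROM status AS s
    -- keep every status, even those without any change
    LEFT JOIN status_change AS sc ON sc.status_id = s.id
    -- match a change only if it is the latest one for its candidate
    LEFT JOIN (
        SELECT candidate_id, MAX(created_at) AS latest_date
        FROM status_change
        GROUP BY candidate_id
    ) AS latest
        ON latest.candidate_id = sc.candidate_id
       AND latest.latest_date = sc.created_at
    GROUP BY s.id, s.name
    ORDER BY s.id
""").fetchall()

print(rows)
conn.close()
```

`COUNT(latest.candidate_id)` ignores NULLs, so a status whose changes are all superseded, or that has no changes at all, ends up with 0 instead of disappearing.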
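The only problem here is that you are filtering the Status queryset by existing status changes and expecting to get the exact opposite. In your case, the solution is to drop the unnecessary filtering: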
last_status_changes = Status.objects.annotate(
    nlast=Count('status_changes')
).order_by(
    '-nlast'
)
Another case is if you really want to filter the changes (by date, for example):
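from django.db import models
from django.db.models import Case, Count, F, When

# IDs of statuses that have at least one change on or after the given date
changed_status_ids = Status.objects.filter(
    status_changes__created_at__gte='2020-03-03'
).values_list(
    'id',
    flat=True
)

# Use the real count for those statuses and 0 for all others
Status.objects.annotate(
    c=Count('status_changes')
).annotate(
    cnt=Case(
        When(
            id__in=changed_status_ids,
            then=F('c')
        ),
        output_field=models.IntegerField(),
        default=0
    )
).values(
    'cnt',
    'name'
).order_by(
    '-cnt'
)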
I solved it with the following queryset:
from django.db import models

# Status changes that are the latest one for their candidate
qs_last_status_changes = StatusChange.objects.annotate(
    _last_change=models.Max("candidate__status_changes__created_at")
).filter(created_at=models.F("_last_change"))

# Sum 1 for each "latest" change and 0 otherwise, so unused statuses stay in the result
qs_status = Status.objects.annotate(
    count=models.Sum(
        models.Case(
            models.When(
                status_changes__in=qs_last_status_changes,
                then=models.Value(1)
            ),
            output_field=models.IntegerField(),
            default=0,
        )
    )
)

>>> [(k.name, k.count) for k in qs_status]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
Thanks to Andrey Nelubin for the suggestion.
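For readers who want to see what a `Sum(Case(When(...)))` annotation of this kind amounts to, here is a hand-written SQL equivalent run through Python's built-in `sqlite3` module. It is a sketch, not the SQL Django actually generates: the table names `status` and `status_change`, the ISO date strings, and the alias `nlast` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE status_change (
        id INTEGER PRIMARY KEY,
        candidate_id INTEGER,
        status_id INTEGER,
        created_at TEXT
    );
    INSERT INTO status VALUES
        (1, 'Review'), (2, 'Accepted'), (3, 'Rejected'), (4, 'Banned');
    INSERT INTO status_change VALUES
        (1, 1, 1, '2019-01-03'),
        (2, 1, 2, '2019-01-05'),
        (4, 2, 1, '2019-01-01'),
        (5, 3, 1, '2019-01-01'),
        (6, 4, 3, '2019-01-01');
""")

rows = conn.execute("""
    SELECT
        s.name,
        -- conditional aggregation: 1 for a candidate's latest change, else 0
        SUM(CASE WHEN sc.id IN (
                SELECT latest.id
                FROM status_change AS latest
                WHERE latest.created_at = (
                    SELECT MAX(x.created_at)
                    FROM status_change AS x
                    WHERE x.candidate_id = latest.candidate_id
                )
            ) THEN 1 ELSE 0 END) AS nlast
    FROM status AS s
    LEFT JOIN status_change AS sc ON sc.status_id = s.id
    GROUP BY s.id, s.name
    ORDER BY s.id
""").fetchall()

print(rows)
conn.close()
```

Because nothing filters the `status` rows, a status without any change still produces one joined row with `sc.id` NULL, which falls through to the `ELSE 0` branch and yields a count of zero.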