What is an effective way to bulk create objects with related objects in Django?

Question

I have the following models:

from django.db import models


class LocationPoint(models.Model):
    latitude = models.DecimalField(max_digits=16, decimal_places=12)
    longitude = models.DecimalField(max_digits=16, decimal_places=12)

    class Meta:
        unique_together = (
            ('latitude', 'longitude',),
        )


# Device is another model in the project (not shown in the question).
class GeoLogEntry(models.Model):
    device = models.ForeignKey(Device, on_delete=models.PROTECT)
    location_point = models.ForeignKey(LocationPoint, on_delete=models.PROTECT)
    recorded_at = models.DateTimeField(db_index=True)
    created_at = models.DateTimeField(auto_now_add=True, db_index=True)

I have a lot of incoming records to create (possibly thousands at a time).

Currently I create them like this:

# Simplified map function contents (removed mapping from dict as it's unrelated to the question topic)
points_models = map(
    lambda point: LocationPoint(latitude=point.latitude, longitude=point.longitude),
    points
)

LocationPoint.objects.bulk_create(
    points_models,
    ignore_conflicts=True
)

# Simplified map function contents (removed mapping from dict as it's unrelated to the question topic)
# Each LocationPoint.objects.get(...) call below runs a separate SELECT query.
geo_log_entries = map(
    lambda log_entry: GeoLogEntry(
        device=device,
        location_point=LocationPoint.objects.get(
            latitude=log_entry.latitude, longitude=log_entry.longitude
        ),
        recorded_at=log_entry.recorded_at
    ),
    log_entries
)

GeoLogEntry.objects.bulk_create(geo_log_entries, ignore_conflicts=True)

But I don't think this is very efficient, because it runs N SELECT queries for N records. Is there a better way?

I'm using Python 3.9, Django 3.1.2 and PostgreSQL 12.4.

Answer 1

bulk_create(...) returns the objects you created as a list. Instead of querying your database, you can filter these objects on the Python side, since they are already in memory.

location_points = LocationPoint.objects.bulk_create(
    points_models,
    ignore_conflicts=True
)

# get_location_point is a helper you write yourself: it picks the matching
# LocationPoint out of location_points for the given log entry.
geo_log_entries = map(
    lambda log_entry: GeoLogEntry(
        device=device,
        location_point=get_location_point(log_entry, location_points),
        recorded_at=log_entry.recorded_at
    ),
    log_entries
)

GeoLogEntry.objects.bulk_create(geo_log_entries, ignore_conflicts=True)

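All you need to do is implement get_location_point to suit your needs. One caveat: with ignore_conflicts=True, Django does not set the primary key on the instances that bulk_create returns, so these objects cannot be linked directly as foreign keys; the approach only works as-is when their primary keys are populated.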
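If get_location_point scans location_points for every log entry, the N queries turn into an O(N × M) loop in Python. A hypothetical sketch that indexes the points once in a dict is shown here; PointStub and LogEntryStub are plain dataclasses standing in for the Django model and an incoming entry (assumptions, not Django APIs), and this version of the helper takes the prebuilt index rather than the raw list.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable, Tuple


# Hypothetical stand-ins: only the attributes the lookup needs.
@dataclass(frozen=True)
class PointStub:
    latitude: Decimal
    longitude: Decimal


@dataclass(frozen=True)
class LogEntryStub:
    latitude: Decimal
    longitude: Decimal


Key = Tuple[Decimal, Decimal]  # (latitude, longitude)


def build_point_index(location_points: Iterable[PointStub]) -> Dict[Key, PointStub]:
    """Index the bulk_create() result once, keyed by coordinates."""
    return {(p.latitude, p.longitude): p for p in location_points}


def get_location_point(log_entry: LogEntryStub, index: Dict[Key, PointStub]) -> PointStub:
    """Hypothetical helper: one dict lookup per entry instead of a list scan.

    Raises KeyError if the entry's coordinates were not in the batch.
    """
    return index[(log_entry.latitude, log_entry.longitude)]


points = [
    PointStub(Decimal("52.1"), Decimal("4.3")),
    PointStub(Decimal("48.85"), Decimal("2.35")),
]
index = build_point_index(points)
# Decimal compares (and hashes) by numeric value, so trailing zeros still match.
entry = LogEntryStub(Decimal("48.850000000000"), Decimal("2.350000000000"))
print(get_location_point(entry, index) is points[1])  # True
```

Building the index is one pass over the batch, after which each log entry costs a single dict lookup.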

Answer 2

The main problem is fetching the objects to link in bulk. Once all of these objects have been stored, we can fetch them in bulk:

from django.db.models import Q

points_models = [
    LocationPoint(latitude=point.latitude, longitude=point.longitude)
    for point in points
]

LocationPoint.objects.bulk_create(
    points_models,
    ignore_conflicts=True
)

# One OR-ed condition per (latitude, longitude) pair that needs to be linked.
# Note: if log_entries is empty, this Q has no conditions and matches every row.
qfilter = Q(
    *[
        Q(('latitude', point.latitude), ('longitude', point.longitude))
        for point in log_entries
    ],
    _connector=Q.OR
)

# Map (longitude, latitude) -> primary key, using a single query.
data = {
    (lp.longitude, lp.latitude): lp.pk
    for lp in LocationPoint.objects.filter(qfilter)
}

geo_log_entries = [
    GeoLogEntry(
        device=entry.device,
        # Assign the raw foreign key value, so no LocationPoint instance is needed.
        location_point_id=data[entry.longitude, entry.latitude],
        recorded_at=entry.recorded_at
    )
    for entry in log_entries
]

GeoLogEntry.objects.bulk_create(geo_log_entries, ignore_conflicts=True)

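So we fetch all the objects we need to link in bulk (with a single query), build a dictionary that maps longitude and latitude to the primary key, and then set location_point_id from it.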
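The lookup step can be tried out without a database. In this hypothetical sketch, log entries are plain dicts and the rows the filter query would return are (pk, latitude, longitude) tuples; neither shape comes from Django. It also deduplicates the coordinate pairs first (an addition to the answer), so that entries sharing a point do not repeat the same condition in the OR filter.

```python
from decimal import Decimal
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[Decimal, Decimal]  # (longitude, latitude), same order as the answer's dict


def unique_coords(entries: Iterable[dict]) -> List[Coord]:
    """Distinct coordinate pairs in first-seen order (dict keys keep insertion order)."""
    return list(dict.fromkeys((e["longitude"], e["latitude"]) for e in entries))


def build_pk_lookup(rows: Iterable[Tuple[int, Decimal, Decimal]]) -> Dict[Coord, int]:
    """Map (longitude, latitude) -> primary key from the rows of the single query."""
    return {(lon, lat): pk for pk, lat, lon in rows}


def resolve_location_ids(entries: Iterable[dict], lookup: Dict[Coord, int]) -> List[int]:
    """The value each GeoLogEntry would receive as location_point_id."""
    return [lookup[e["longitude"], e["latitude"]] for e in entries]


entries = [
    {"latitude": Decimal("52.1"), "longitude": Decimal("4.3")},
    {"latitude": Decimal("48.85"), "longitude": Decimal("2.35")},
    {"latitude": Decimal("52.1"), "longitude": Decimal("4.3")},
]
coords = unique_coords(entries)  # 2 pairs, so 2 OR-ed conditions instead of 3
rows = [
    (7, Decimal("52.100000000000"), Decimal("4.300000000000")),
    (9, Decimal("48.850000000000"), Decimal("2.350000000000")),
]
print(resolve_location_ids(entries, build_pk_lookup(rows)))  # [7, 9, 7]
```

In the answer's code, iterating over the deduplicated pairs instead of log_entries inside the Q(...) list comprehension would apply the same idea.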

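It is important, however, to use decimals, or at least matching types. Floating-point numbers are tricky because they are prone to rounding errors (which is why longitude and latitude are often stored as "fixed-point" numbers, for example as integers scaled up by 1,000 or 1,000,000). Otherwise, you need an algorithm that matches your values against the data returned by the query.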
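A small standard-library illustration of why matching types matter: the float 52.1 is not exactly 52.1, so it does not equal the Decimal a 12-place DecimalField returns. The hypothetical helpers below normalise input either to 12 decimal places or to integer millionths of a degree (the fixed-point alternative); the ROUND_HALF_EVEN rounding mode is an assumption and should match whatever rounding your data goes through before it is stored.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Matches DecimalField(max_digits=16, decimal_places=12) in the question's model.
PLACES = Decimal("1e-12")


def to_field_decimal(value) -> Decimal:
    """Normalise a float, str or Decimal to 12 decimal places (rounding mode assumed)."""
    # str() first, so the float 52.1 becomes Decimal('52.1') rather than its binary expansion.
    return Decimal(str(value)).quantize(PLACES, rounding=ROUND_HALF_EVEN)


def to_micro_degrees(value) -> int:
    """Fixed-point alternative: the coordinate as an integer number of millionths of a degree."""
    return int(Decimal(str(value)).scaleb(6).to_integral_value(rounding=ROUND_HALF_EVEN))


db_value = Decimal("52.100000000000")      # what a 12-place DecimalField returns
print(Decimal(52.1) == db_value)           # False: exact binary value of the float
print(to_field_decimal(52.1) == db_value)  # True
print(to_micro_degrees(52.1))              # 52100000
```

Normalising every key through the same helper before building the lookup dictionary keeps both sides of the comparison in one representation.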

Tags: python, django, postgresql, django-models, django-queryset