Is there any way to optimize the query while working with a 2mil+ row database table?

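I'm working on a Laravel query that should count the latest month's data out of the last 3 months and group it by week. I've tried solving it in several ways, but it still exceeds my memory limit and loads very slowly. Below is the code I currently use to get the final result, but it has the same problem.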

Any idea how to optimize the counting and grouping of the data?

$data['pois']['total'] = PoiLocation::whereYear('created_at', Carbon::now()->year)
    ->whereMonth('created_at', Carbon::now()->month)
    ->count();

$pois = PoiLocation::where('created_at', '>', (new Carbon)->subMonths(3))
    ->get()
    ->sortBy('created_at')
    ->groupBy(function ($collection) {
        return Carbon::parse($collection->created_at)->isoWeek();
    });

if ($pois->count()) {
    foreach ($pois as $item => $value) {
        $data['pois']['weeks'][$item] = $value->count();
    }
} else {
    $data['pois']['weeks'] = [];
}

if ($data['pois']['weeks']) {
    $data['pois']['high'] = max($data['pois']['weeks']);
} else {
    $data['pois']['high'] = 1;
}

And the relevant part of the PoiLocation model:

protected $fillable = [
    'store_id', 'name', 'address1', 'address2', 'city', 'state', 'zip_code', 'dma_desc', 'country',
    'lat', 'lon', 'target', 'is_verified', 'polygons', 'external_id', 'brandID', 'companyID',
];

protected $dates = ['created_at', 'updated_at'];

public $timestamps = true;

I think your main problem is here:

$pois = PoiLocation::where('created_at', '>', (new Carbon)->subMonths(3))
    ->get()
    ->sortBy('created_at')
    ->groupBy(function ($collection) {
        return Carbon::parse($collection->created_at)->isoWeek();
    });

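You are fetching every record created in the last three months, loading them all into memory, and only then sorting and grouping them. You should do the sorting and grouping in the database before fetching the records: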

$pois = PoiLocation::where('created_at', '>', (new Carbon)->subMonths(3))->orderBy('created_at')->get();

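This sorts by created_at in the database. Your grouping needs a bit more thought... it turns out that grouping the entire result set in the database query is a bit tricky, because a SQL GROUP BY collapses each group into a single row instead of returning the grouped records the way the collection's groupBy() does.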

If you just want the count per week, you can use something like:

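PoiLocation::select(DB::raw("count(*) as count, WEEK(created_at, 3) as week, YEAR(created_at) as year"))->groupBy(['year', 'week'])->get();

WEEK() and YEAR() are MySQL functions. Passing mode 3 makes WEEK() return ISO-8601 week numbers, which is what Carbon's isoWeek() uses; the default mode counts weeks starting on Sunday.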

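I think, as @IGP suggested, your goal is probably not to fetch the whole dataset but only the metrics. In that case, pushing operations such as the per-week count above into the database gets you the data you need without iterating over every record in memory.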
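One subtlety of the WEEK()/YEAR() grouping above: ISO weeks can straddle January 1, so pairing the ISO week number with the calendar year can split one ISO week into two buckets. A minimal plain-PHP sketch of the key you would want, using the ISO-8601 year ('o') and week ('W') format characters of DateTimeImmutable::format(); the helper name isoWeekKey is hypothetical:

```php
<?php
// Hypothetical helper: builds a grouping key from the ISO-8601 year ('o') and
// the zero-padded ISO-8601 week number ('W'); '\W' is an escaped literal "W".
function isoWeekKey(string $timestamp): string
{
    return (new DateTimeImmutable($timestamp))->format('o-\WW');
}

// Around New Year 2021: Jan 1-3, 2021 still belong to ISO week 53 of 2020.
foreach (['2020-12-31 10:00:00', '2021-01-03 23:59:59', '2021-01-04 00:00:00'] as $ts) {
    echo $ts, ' => ', isoWeekKey($ts), PHP_EOL;
}
// 2020-12-31 10:00:00 => 2020-W53
// 2021-01-03 23:59:59 => 2020-W53
// 2021-01-04 00:00:00 => 2021-W01
```

Grouped by YEAR() and WEEK(created_at, 3) separately, 2021-01-03 would land in (2021, 53), a different bucket from 2020-12-31 even though both fall in the same ISO week. MySQL's YEARWEEK(created_at, 3) should return a combined ISO year-week value instead; it is worth checking against your MySQL version.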

So, to cover your original requirements:

Total number of items created in the current month (your original query is fine for this):

$totalThisMonth = PoiLocation::whereYear('created_at', Carbon::now()->year)
    ->whereMonth('created_at', Carbon::now()->month)
    ->count();

Number of items per week over the last three months:

$itemsPerWeek = PoiLocation::select(DB::raw("count(*) as count, WEEK(created_at, 3) as week")) // mode 3 = ISO-8601 weeks
    ->where('created_at', '>', Carbon::now()->subMonths(3))
    ->groupBy('week')
    ->get();

The week with the most items (this returns the whole row, including its week number):

$itemsPerWeek->sortBy('count')->last();

Or, if you only need the highest count:

$itemsPerWeek->max('count');

I think these should get you where you need to be.
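Because the grouped query returns one small row per week instead of one model per record, building the $data['pois'] structure from the question becomes cheap. A plain-PHP sketch, assuming the rows look like the count/week pairs selected above (the sample values are made up); in Laravel, $itemsPerWeek->pluck('count', 'week')->all() should produce the same week => count array:

```php
<?php
// Made-up sample of what the grouped query might return: one row per week.
$itemsPerWeek = [
    ['week' => 40, 'count' => 1200],
    ['week' => 41, 'count' => 3500],
    ['week' => 42, 'count' => 2100],
];

// Re-key as week => count, the same shape the question builds in its foreach loop.
$weeks = array_column($itemsPerWeek, 'count', 'week');

$data = [];
$data['pois']['weeks'] = $weeks;
// max() on an empty array throws a ValueError in PHP 8, so fall back to 1 explicitly.
$data['pois']['high'] = $weeks ? max($weeks) : 1;

print_r($data);
```

Only a handful of rows ever reach PHP here, so the per-week post-processing costs nothing compared with hydrating 2 million models.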

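You can use LazyCollections. cursor() hydrates one model at a time instead of loading them all at once, which can reduce memory usage, although groupBy() and remember() below still end up keeping every model in memory.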

$pois = PoiLocation::query()
    ->where('created_at', '>', Carbon::now()->subMonths(3))
    ->orderBy('created_at') // sort in DB instead of wasting more memory doing the sorting.
    ->cursor() // don't load every model in memory
    ->remember() // don't repeat the query (if this line wasn't here, the query would be made 3 times: 1- $pois->all(), 2- ($pois->max() !== null), 3- $pois->max())
    ->groupBy(function (PoiLocation $poiLocation) { // param here is not a collection, it's a PoiLocation. Type hint is completely optional
        return $poiLocation->created_at->isoWeek(); // $poiLocation->created_at should already be a Carbon instance because of Eloquent magic.
    })
    ->map->count();

$data['pois']['weeks'] = $pois->all();
$data['pois']['high'] = ($pois->max() !== null) ? $pois->max() : 1;

This also simplifies your logic:

  • $pois->all() returns a keyed array (week => count), whether or not it is empty.
  • $pois->max() returns the collection's maximum value, or null if the collection is empty, so a simple ternary operator covers that part of your logic.
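If only the counts are needed, streaming can go further: instead of grouping whole models, keep one counter per week while iterating. A plain-PHP sketch of that idea, where a generator stands in for cursor() (fakeCursor and its synthetic timestamps are hypothetical). It also swaps the answer's bare isoWeek() key for an ISO year-week key so weeks from different years cannot collide. In Laravel the analogous chain would be something like ->cursor()->countBy(fn ($poi) => $poi->created_at->format('o-\WW')), assuming your version's LazyCollection provides countBy():

```php
<?php
// Hypothetical stand-in for Eloquent's cursor(): yields one created_at value at a time,
// generated on the fly, so only the current value is held in memory.
function fakeCursor(string $start, int $rows, string $step): Generator
{
    $current = new DateTimeImmutable($start);
    $interval = new DateInterval($step);
    for ($i = 0; $i < $rows; $i++) {
        yield $current;
        $current = $current->add($interval);
    }
}

// Count rows per ISO year-week while streaming; memory grows with the number
// of distinct weeks, not with the number of rows.
function countPerIsoWeek(iterable $createdAts): array
{
    $counts = [];
    foreach ($createdAts as $createdAt) {
        $key = $createdAt->format('o-\WW'); // e.g. "2020-W53"
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    ksort($counts); // "YYYY-Www" keys sort chronologically as strings
    return $counts;
}

// Ten daily rows starting Monday 2020-12-28: seven in 2020-W53, three in 2021-W01.
$weeks = countPerIsoWeek(fakeCursor('2020-12-28 00:00:00', 10, 'P1D'));

$data = [];
$data['pois']['weeks'] = $weeks;
$data['pois']['high'] = $weeks ? max($weeks) : 1;

print_r($data);
```

The grouped-query approach above is still the cheaper option, since it moves no rows at all into PHP; this pattern is mainly useful when the per-row logic cannot be expressed in SQL.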