循环嵌套数组时冲突的基准测试

Question

信息

我正在尝试创建一种有效的方法来循环使用嵌套 objects/arrays 的数组，因为循环数据的函数将运行频繁并根据它们是否匹配另一个中的记录来过滤值包含对象的数组。

我正在处理的数据有以下类型。

type WatchedShows = {
    showId: number;
    season: number;
    episode: number;
}[];

type Shows = {
    id: number;
    seasons: {
        season: {
            season_number: number;
            episodes: {
                episode_number: number;
            }[];
        };
    }[];
}[];

WatchedShows中的数据在我自己的数据库中，因此我可以控制如何将其发送到前端，但是Shows中的数据来自外部API。

我想做的是过滤 Shows 中与 WatchedShows 中的数据匹配的所有剧集，如果所有内容都被标记为已观看，还过滤季节和整个节目。

问题

目前我有 3 个不同的解决方案，但第一个是 3 个嵌套循环，并且随着一些数据的使用很快变慢，所以实际上我有 2 个解决方案。我还需要return格式为Shows

的数据

我现在已经尝试运行它们通过基准测试，并使用计时器进行了一些测试，并检查它们每个有多少次迭代。从我自己做的测试结果来看，有一个明显的赢家，因为一个运行时间为 ~5ms and 42637 iterations 而另一个运行时间为 ~15ms and 884052 iterations。

然后我尝试通过 JSBench.me 运行它，当我这样做时，相反的解决方案作为获胜者出现，网站称其速度提高了 90%。当该解决方案具有 884052 次迭代和其他 42637 次迭代时，怎么会发生这种情况？是我的解决方案优化不佳还是我还遗漏了什么？将不胜感激有关改进解决方案的提示。

这两个测试都是使用相同的代码生成的数据集完成的，该数据集包含分布在 26 个节目中的大约 20 000 集记录，每季 20 季，每季 40 集。如果需要，可以在基准站点上看到生成数据集的代码。可以看到benchmarkhere

代码

具有 884052 次迭代和运行 ~15 毫秒时间的解决方案如下所示

const newShowObject = {};

for (const show of shows) {
    newShowObject[show.id] = { ...show };
}

for (const show of watchedShows) {
    for (const season of newShowObject[show.showId].seasons) {
        if (season.season_number !== show.season) {
            continue;
        }
        season.episodes = season.episodes.filter(
            (episode) => episode.episode !== show.episode
        );
    }

    newShowObject[show.showId].seasons = newShowObject[show.showId].seasons.filter(
        (season) => season.episodes.length > 0
    );
}

const unwatchedShows = [...Object.values(newShowObject)].filter(
    (show) => show.seasons.length > 0
);

具有 42637 次迭代和运行约 5 毫秒时间的解决方案如下所示

const newShowObject = {};

for (const show of shows) {
    newShowObject[show.id] = { ...show };

    const newSeasons = {};

    for (const seasons of show.seasons) {
        newSeasons[seasons.season_number] = { ...seasons };

        const newEpisodes = {};

        for (const episodes of seasons.episodes) {
            newEpisodes[episodes.episode] = { ...episodes };
        }

        newSeasons[seasons.season_number].episodes = { ...newEpisodes };
    }
    newShowObject[show.id].seasons = { ...newSeasons };
}

for (const show of watchedShows) {
    delete newShowObject[show.showId].seasons[show.season].episodes[show.episode];
}

let unwatchedShows = [...Object.values(newShowObject)];

for (const show of unwatchedShows) {
    show.seasons = [...Object.values(show.seasons)];
    for (const season of show.seasons) {
        season.episodes = [...Object.values(season.episodes)];
    }
    show.seasons = show.seasons.filter((season) => season.episodes.length > 0);
}

unwatchedShows = unwatchedShows.filter((show) => show.seasons.length > 0);

Answer 1

我不确定您如何使用第二种解决方案更快地获得结果，我会三次检查结果。 JSBench 结果是合法的，并且符合 Big-O 表示法。

第一个解决方案循环遍历 shows 以创建查找 table。然后，循环通过具有四倍复杂度 O(n^2).

的嵌套循环

第二个解决方案通过三次复杂度O(n^3)的嵌套循环循环，因为它嵌套了三次。所以，我希望这个算法能消耗更多时间。

其原因称为总和规则。当两个循环位于 side-by-side 时，它们不会相乘而是相加。这使得复杂度为 O(n + n^2)，可以进一步减少到 O(n^2)，因为随着 n 接近无穷大，第一个循环变得可以忽略不计。

Answer 2

问题是您的基准测试有缺陷。请注意，jsbench 运行每个测试用例只执行一次“设置代码”，而不是每个测试循环迭代。

你的第一个解决方案确实改变了输入（特别是 season.episodes - 虽然它确实克隆了每个节目），所以只有在第一个运行它实际上给出正确的输出。在测试循环的所有后续运行秒中，它基本上是运行空输入，这要快得多。

我修复了这个 here，现在第二个解决方案是预期的最快的。（不过请注意，时间现在还包括创建对象）。

循环嵌套数组时冲突的基准测试

Conflicting benchmark tests while looping over nested arrays

javascript