Why does applying isset to a multidimensional array increase execution time by 4000%?

Question

Edit: It turns out that $article->getID(); is the part responsible for the excessive execution time. This is what it looks like:

public function getId()
{
    return $this->id;
}

But I still don't understand why that is.

I'm using this code to compute the document frequency of every token in a small corpus of 1000 documents and 4000 unique tokens.

To do this, I wrote the following function:

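public function computeIDF()
{
    // splitting documents into tokens
    // $this->tokens = array($article->id => array($token => $freq))
    $this->tokens = $this->tokenize();

    // Collect the unique tokens across all documents, so that the outer
    // loop runs over tokens and not over the per-article arrays
    $vocabulary = array();
    foreach ($this->tokens as $docTokens) {
        foreach ($docTokens as $token => $freq) {
            $vocabulary[$token] = true;
        }
    }

    $tokFreq = array();
    // 1. For each token …
    foreach (array_keys($vocabulary) as $token) {
        $tokFreq[$token] = 0;

        // 2. … look in every document …
        foreach ($this->articles as $article) {

            // 3. … and if it exists there …
            if (isset($this->tokens[$article->getID()][$token])) {

                // 4. … add 1
                $tokFreq[$token] += 1;
            }
        }
    }

    return $tokFreq;
}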
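For reference, the quantity this function computes is the document frequency of a token t over the document set D; a common, unsmoothed IDF is then derived from it (log base and smoothing vary between implementations). The token-by-document loop described in the comments evaluates step 3 once per (token, document) pair:

```latex
\mathrm{df}(t) = \left|\{\, d \in D : t \in d \,\}\right|,
\qquad
\mathrm{idf}(t) = \log \frac{|D|}{\mathrm{df}(t)},
\qquad
|V| \cdot |D| = 4000 \times 1000 = 4 \times 10^{6} \text{ checks}
```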

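But step 3 causes a lot of trouble:
- If I comment out step 4, nothing changes.
- If I comment out step 3, execution time drops from 414.2 s to "just" 14 s, roughly 30 times faster! (So this is definitely not a "micro-optimization" issue.)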

Note that no database is involved here. Everything is fetched beforehand, outside the class's scope:

// This is where the data is being fetched
$articles = ArticleDAO::loadLast(1000);

// It's then injected into the $corpus
$corpus = new Corpus($articles);

Am I doing something wrong? If so, how can I make this faster?

Answer 1

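Every call to $article->getID() has overhead: PHP has to save the stack, call the function, copy the result, and then restore the previous state. Because you iterate over every article for every token (rather than the other way around), each successive call to getID is for a different article, so nothing can be short-circuited.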
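To see the call overhead in isolation, here is a minimal micro-benchmark sketch. The Article class is a hypothetical stand-in modelled on the getId() method shown in the question; the script times one million getID() calls against one million direct ->id reads and checks that both loops produce the same sum. Absolute numbers will vary with the PHP version and loaded extensions (a debugger such as Xdebug, for example, adds noticeable cost to every function call).

```php
<?php
// Hypothetical Article class, modelled on the question's getId() accessor.
class Article
{
    public $id;

    public function __construct($id)
    {
        $this->id = $id;
    }

    public function getID()
    {
        return $this->id;
    }
}

$articles = array();
for ($i = 0; $i < 1000; $i++) {
    $articles[] = new Article($i);
}

$rounds = 1000; // 1000 rounds x 1000 articles = 1,000,000 reads per variant

// Variant 1: a method call on every iteration
$start = microtime(true);
$sumCall = 0;
for ($r = 0; $r < $rounds; $r++) {
    foreach ($articles as $article) {
        $sumCall += $article->getID();
    }
}
$timeCall = microtime(true) - $start;

// Variant 2: direct property access on every iteration
$start = microtime(true);
$sumProp = 0;
for ($r = 0; $r < $rounds; $r++) {
    foreach ($articles as $article) {
        $sumProp += $article->id;
    }
}
$timeProp = microtime(true) - $start;

printf("getID(): %.3f s, ->id: %.3f s\n", $timeCall, $timeProp);
```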

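There are two things you can do:

- Replace $article->getID() with $article->id (provided the id property is accessible from this scope).
- Make your outer loop run over the articles and your inner loop over the tokens. That way you process a whole batch of tokens within a single article, which should help caching.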
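Taken to its conclusion, the second suggestion removes the lookups entirely: if the outer loop runs over documents and the inner loop over that document's tokens, every token in a document simply adds 1 to its count, with no getID() call and no isset probe into $this->tokens. The following is a minimal sketch, not the answerer's code, assuming $tokens has the shape given in the question's comment (articleId => array(token => freq)); the function name documentFrequencies is hypothetical.

```php
<?php
// Sketch: document frequency in a single pass.
// Assumes $tokens = array(articleId => array(token => freq)), as in the question.
function documentFrequencies(array $tokens)
{
    $df = array();
    foreach ($tokens as $docTokens) {
        // Each token is a key, so it appears at most once per document;
        // every token of this document therefore adds exactly 1.
        foreach (array_keys($docTokens) as $token) {
            if (isset($df[$token])) {
                $df[$token] += 1;
            } else {
                $df[$token] = 1;
            }
        }
    }
    return $df;
}

// Hypothetical sample corpus of three documents
$tokens = array(
    1 => array('php' => 3, 'array' => 1),
    2 => array('php' => 1, 'isset' => 2),
    3 => array('array' => 4),
);

print_r(documentFrequencies($tokens));
```

This visits each (document, token) pair exactly once, so the work grows with the number of distinct tokens per document summed over the corpus, rather than with 4000 × 1000.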
