How to check whether a string precedes another string and get the file name from HTML data in PHP?

Question

I have an array of associative arrays named $comments, as follows:

    Array
    (
    [0] => Array
            (
                [text] => Uploading Photo  for comment <div class="comment_attach_image">

    <a title="colorbox" href="https://www.filepicker.io/api/file/CnYTVQdATAOQTkMxpAq4" ><img src="https://www.filepicker.io/api/file/CnYTVQdATAOQTkMxpAq4" height="150px" width="150px" /></a>

    <a href="https://www.filepicker.io/api/file/CnYTVQdATAOQTkMxpAq4" class="comment_attach_image_link_dwl">Download</a>

    </div>                
            )
    [1] => Array
            (
                [text] => <div class="comment_attach_file">

    <a class="comment_attach_file_link" href="https://www.filepicker.io/api/file/pxRBwNBcSueP0hf1meOI" >pxRBwNBcSueP0hf1meOI</a>

    <a class="comment_attach_file_link_dwl"  href="https://www.filepicker.io/api/file/pxRBwNBcSueP0hf1meOI" >Download</a>
    </div>
            )
     [2] => Array
            (  
                [text] => This comment is of two lines need to check more about it                
            )
    )

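I want to do two things to every element of the above array.

First, I want to check whether any string (text) is present before the <div> tag. In the first element, the text 'Uploading Photo for comment' appears before the <div> tag, whereas in the second element there is none. If such text is present, it should be assigned to a variable and stored under the key [text] of the $comments array; otherwise the key [text] should contain null.

Second, I want to get the file name from the HTML data. For the first array element it should be CnYTVQdATAOQTkMxpAq4, and for the second it should be pxRBwNBcSueP0hf1meOI. This file name should be stored under the key [file_name] of the $comments array; otherwise the key [file_name] should contain null.

If an array element is like the third one, i.e. it contains no HTML and is just plain text, nothing should happen to it.

How can I achieve these two things efficiently? The actual array may contain hundreds of such elements.

I tried the following code, but it doesn't return the file name, and when the preceding string is missing it puts junk HTML into the [text] key. For reference, my code is below.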

    foreach ($comments as $key => $comment) {
        $text = strstr($comment['text'], '<div');
        if (strlen($text) <= 0) {
            $comments[$key]['type_id'] = 'text';
            $comments[$key]['url'] = '';
            $comments[$key]['text'] = $comment['text'];
        } else if ($xml = @simplexml_load_string($text)) {
            $comments[$key]['type_id'] = substr(strrchr($xml['class'], '_'), 1);
            $comments[$key]['url'] = str_replace(array('href=', '"'), '', $xml->a['href']->asXML());
            $comments[$key]['text'] = strtok($comment['text'], '<');
        } else {
            continue;
        }
    }

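P.S.: Please note the subtle differences between the elements' HTML, and take those differences into account when performing the two operations I mentioned.

Thanks.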

Answer 1

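As I said in the comments, this task (HTML parsing) is much easier to manage with an HTML parser. Don't use string functions for this; parsing HTML is exactly what parsers are designed for.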
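To illustrate why string functions are fragile here, the following minimal sketch (not part of the original answer) shows what strtok() does with the two kinds of input from the question. The sample strings and variable names are shortened, hypothetical stand-ins for the real comments.

```php
<?php
// Hypothetical, shortened inputs modeled on the question's elements.
$noPrefix   = '<div class="comment_attach_file"><a href="#">x</a></div>';
$withPrefix = 'Uploading Photo <div class="comment_attach_image"></div>';

// strtok() skips leading delimiters, so when the string starts with '<'
// the first token is the inside of the tag, not an empty string.
$junk   = strtok($noPrefix, '<');
$prefix = strtok($withPrefix, '<');

echo $junk, "\n";   // div class="comment_attach_file">
echo $prefix, "\n"; // Uploading Photo
```

This appears to be where the junk HTML in the [text] key comes from: when nothing precedes the <div>, strtok() returns part of the tag itself.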

The following example uses the DOMDocument class:

$comments = array(
    array('text' => 'Uploading Photo  for comment <div class="comment_attach_image">

        <a title="colorbox" href="https://www.filepicker.io/api/file/CnYTVQdATAOQTkMxpAq4" ><img src="https://www.filepicker.io/api/file/CnYTVQdATAOQTkMxpAq4" height="150px" width="150px" /></a>

        <a href="https://www.filepicker.io/api/file/CnYTVQdATAOQTkMxpAq4" class="comment_attach_image_link_dwl">Download</a>

        </div>'
    ),
    array('text' => '<div class="comment_attach_file">

        <a class="comment_attach_file_link" href="https://www.filepicker.io/api/file/pxRBwNBcSueP0hf1meOI" >pxRBwNBcSueP0hf1meOI</a>

        <a class="comment_attach_file_link_dwl"  href="https://www.filepicker.io/api/file/pxRBwNBcSueP0hf1meOI" >Download</a>
        </div>'
    ),
);

$data = array();

foreach ($comments as $c) {
    $div = null;

    // Only parse comments that can contain attachment markup.
    if (stripos($c['text'], '<div') !== false) {
        $dom = new DOMDocument;
        // @ suppresses libxml warnings about the fragment not being a full document.
        @$dom->loadHTML(trim($c['text']));
        $div = $dom->getElementsByTagName('div')->item(0);
    }

    if ($div === null) {
        // Plain-text comment: keep the text unchanged.
        $data[] = array('type_id' => 'text', 'url' => '', 'file_name' => null, 'text' => $c['text']);
        continue;
    }

    $link = $div->getElementsByTagName('a')->item(0);
    $url  = $link ? $link->getAttribute('href') : '';

    // Gather whatever precedes the <div>; the parser may wrap loose text in a <p>.
    $text = '';
    for ($node = $div->previousSibling; $node !== null; $node = $node->previousSibling) {
        $text = $node->textContent . $text;
    }
    $text = trim($text);

    // "comment_attach_image" gives "image", "comment_attach_file" gives "file".
    $type = explode('_', $div->getAttribute('class'));

    $data[] = array(
        'type_id'   => end($type),
        'url'       => $url,
        'file_name' => $url !== '' ? basename($url) : null,
        'text'      => $text !== '' ? $text : null,
    );
}

echo '<pre>', print_r($data, 1), '</pre>';
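One caveat about calling basename() directly on the URL: it works for the URLs shown in the question, but a query string would end up in the file name. A minimal sketch of a more defensive variant follows. The helper name fileNameFromUrl and the ?dl=true parameter are hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper; not part of the original answer.
function fileNameFromUrl(string $url): ?string
{
    // Take only the path component, so any query string or fragment is ignored.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }

    $name = basename($path);
    return $name !== '' ? $name : null;
}

$plain = fileNameFromUrl('https://www.filepicker.io/api/file/CnYTVQdATAOQTkMxpAq4');
$query = fileNameFromUrl('https://www.filepicker.io/api/file/CnYTVQdATAOQTkMxpAq4?dl=true');

echo $plain, "\n"; // CnYTVQdATAOQTkMxpAq4
echo $query, "\n"; // CnYTVQdATAOQTkMxpAq4
```

Returning null when no name can be extracted matches the question's requirement that [file_name] hold null when there is no file.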
