PHP

Question

I'm trying to clean up a string, and this is the result I get:

Characterisation of the arsenic resistance genes in lt i gt Bacillus lt i gt sp UWC isolated from maturing fly ash acid mine drainage neutralised solids

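I'm trying to remove the lt, i and gt fragments. They are what is left of the HTML entities after my cleanup, and they don't seem to get removed. What is the best way to handle this, or is there another solution I could look at?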

Here is my current solution:

/**
 * @return string
 */
public function getFormattedTitle()
{
    $string = preg_replace('/[^A-Za-z0-9\-]/', ' ',  filter_var($this->getTitle(), FILTER_SANITIZE_STRING));
    return $string;
}

Here is a sample input string:

Assessing <i>Clivia</i> taxonomy using the core DNA barcode regions, <i>matK</i> and <i>rbcLa</i>

Thanks!

Answer 1

Instead of filter_var, try strip_tags: http://php.net/manual/en/function.strip-tags.php

<?php
  // Your input string
  $input_string = 'Assessing <i>Clivia</i> taxonomy using the core DNA barcode regions, <i>matK</i> and <i>rbcLa</i>';

  // Strip away all HTML tags but leave what's inside them
  $output_string = strip_tags($input_string);

  echo $output_string;
  // Prints: Assessing Clivia taxonomy using the core DNA barcode regions, matK and rbcLa

?>

Answer 2

A better approach is strip_tags(). See the manual here: http://php.net/manual/ru/function.strip-tags.php An example:

public function getFormattedTitle()
{
    return strip_tags($this->getTitle(), '<i>');
}
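One thing worth checking in the example above: the second argument to strip_tags lists the tags to keep, so passing '<i>' leaves the italic tags in place. A minimal sketch of the difference, using a shortened sample title that is assumed here for illustration:

```php
<?php
// Shortened sample title (assumed for illustration)
$title = 'Assessing <i>Clivia</i> taxonomy';

// With '<i>' as the allowed-tags argument, the <i> tags are preserved
$kept = strip_tags($title, '<i>');

// Without the second argument, every tag is removed
$stripped = strip_tags($title);

echo $kept, PHP_EOL;     // Assessing <i>Clivia</i> taxonomy
echo $stripped, PHP_EOL; // Assessing Clivia taxonomy
```

If the goal is plain text with no markup at all, the one-argument form is probably the one you want.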

Answer 3

The lt and gt in your output tell me that your string actually looks more like this:

"Assessing &lt;i&gt;Clivia&lt;/i&gt; taxonomy using the core DNA barcode regions, &lt;i&gt;matK&lt;/i&gt; and &lt;i&gt;rbcLa&lt;/i&gt;"

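when viewed as plain text.

The string you showed above is what would be displayed in a browser, which interprets "&lt;" as "<" and "&gt;" as ">". (These are usually called "HTML entities" and provide a way to encode characters that would otherwise be interpreted as HTML.)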
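To see where the stray lt, i and gt come from, here is a minimal sketch that applies only the regex from the question to an entity-encoded string. The filter_var call is left out, and the shortened sample string and variable names are assumptions for illustration:

```php
<?php
// Entity-encoded input as it looks in plain text (shortened sample, assumed for illustration)
$encoded = 'Assessing &lt;i&gt;Clivia&lt;/i&gt; taxonomy';

// Same character class as in the question: everything except letters, digits and "-" becomes a space
$replaced = preg_replace('/[^A-Za-z0-9\-]/', ' ', $encoded);

// Collapse runs of whitespace so the result is easier to read
$collapsed = preg_replace('/\s+/', ' ', $replaced);

echo $collapsed, PHP_EOL;
// Prints: Assessing lt i gt Clivia lt i gt taxonomy
```

The "&", ";" and "/" characters are replaced by spaces, but the entity names between them are plain letters, so they survive the regex.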

One option is to handle it like this:

$s = "Assessing &lt;i&gt;Clivia&lt;/i&gt; taxonomy …";
$s = html_entity_decode($s); // $s is now "Assessing <i>Clivia</i> taxonomy …"
$s = strip_tags($s); // $s is now "Assessing Clivia taxonomy …"

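But note that strip_tags is a very naive function. For example, it will turn "1<5 and 6>2" into "12"! So for this to work perfectly, you need to make sure that all input text is double HTML-encoded, as the sample is.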
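The decode-then-strip approach and its caveat can be sketched as follows. The helper name cleanTitle and the test strings are hypothetical, and the flags passed to html_entity_decode are one reasonable choice rather than the only one:

```php
<?php
/**
 * Hypothetical helper (not from the question): decode entities once, then strip tags.
 */
function cleanTitle(string $raw): string
{
    // Turn "&lt;i&gt;" back into "<i>" (one level of decoding only)
    $decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Remove the now-real tags, keeping the text between them
    return strip_tags($decoded);
}

// Tags that arrive entity-encoded are removed as intended
echo cleanTitle('Assessing &lt;i&gt;Clivia&lt;/i&gt; taxonomy'), PHP_EOL;
// Assessing Clivia taxonomy

// Single-encoded comparison operators decode to "<" and ">" and are eaten by strip_tags
echo cleanTitle('1&lt;5 and 6&gt;2'), PHP_EOL;
// 12

// Double-encoded operators decode to "&lt;" and "&gt;", which strip_tags leaves alone
echo cleanTitle('1&amp;lt;5 and 6&amp;gt;2'), PHP_EOL;
// 1&lt;5 and 6&gt;2
```

The last case is why the encoding level of the input matters: only text whose literal "<" and ">" are encoded one level deeper than its tags comes through intact.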

PHP - Remove decoded HTML entities from string

string

replace

html-entities