How to parse a string without losing the plus sign in PHP?

Question

I am parsing HTML strings in PHP to extract values and write them to a database. Here is a sample string:

<b>Adress:</b> 22 Examplary road, Nowhere <br>
<b>Phone:</b>  +371 12345678, +371 23456789<br>
<b>E-mail: </b>info@example.com<br>

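The string can be formatted arbitrarily. It may contain other keys that I do not parse, and it may contain duplicate keys. It may also contain only some of the keys I am interested in, or be completely empty. The HTML may also be broken (example tag: <br). I decided to follow the rule that entries are separated by \n and take the form key: value + some HTML.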

First, I make the string parsable with this code:

$parse = strip_tags($string);
$parse = str_replace(':', '=', $parse);
$parse = str_replace("\n", '&', $parse);
$parse = str_replace("\r", '', $parse);
$parse = str_replace("\t", '', $parse);

My string now looks like this:

Adress= 22 Examplary road, Nowhere&Phone=  +371 12345678, +371 23456789&E-mail= info@example.com

Then I use parse_str() to get the values, and if a key I need is found, I take its value:

        parse_str($parse, $values);

        $address = null;
        if (isset($values['Adress']))
            $address = trim($values['Adress']);

        $phone = null;
        if (isset($values['Phone']))
            $phone = trim($values['Phone']);

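The problem is that I end up with $phone = '371 12345678, 371 23456789': the + signs are lost. How can I preserve them?

Also, if you have any tips on how to improve this procedure, I would be glad to hear them. Some entries have Website: example.com, others have Web Site example.com... I am fairly sure it is impossible to parse all the information automatically, but I am looking for the best possible solution.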

Solution

Using the tip provided by WEBjuju, this is what I use now:

preg_match_all('/([^:]*):\s?(.*)\n/Usi', $string, $matches, PREG_SET_ORDER);

$values = [];
foreach ($matches as $match)
{
    // Normalize the key: drop tags, lowercase, remove whitespace and hyphens.
    $key = strip_tags($match[1]);
    $key = trim($key);
    $key = mb_strtolower($key);
    // "\s" is not a whitespace class in str_replace(), so use a regex here.
    $key = preg_replace('/\s+/', '', $key);
    $key = str_replace('-', '', $key);

    $value = strip_tags($match[2]);
    $value = trim($value);

    $values[$key] = $value;
}

This lets me go from this input:

<b>Venue:</b> The Hall<br
<b>Adress:</b> 22 Examplary road, Nowhere <br>
<b>Phone:</b>  +371 12345678<br>
<b>E-mail: </b>info@example.com<br>
<b>Website:</b> <a href="http://example.com/" target="_blank">example.com</a><br>

to a nice PHP array with uniform and, hopefully, recognizable keys:

[
    'venue' => 'The Hall',
    'adress' => '22 Examplary road, Nowhere',
    'phone' => '+371 12345678',
    'email' => 'info@example.com',
    'website' => 'example.com',
];

It still does not handle entries where the colon is missing, but I do not think I can solve that...
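One way to make the keys more recognizable across spelling variants such as "Website" and "Web Site" is a small alias table applied after normalization. The following is only a sketch: the KEY_ALIASES table and the canonicalKey() function are hypothetical names, and the alias list is an assumption that would have to be filled in from the real data.

```php
<?php
// Hypothetical alias table: normalized spelling => canonical key.
// The entries are illustrative; extend them from the actual data.
const KEY_ALIASES = [
    'adress'  => 'address',
    'address' => 'address',
    'phone'   => 'phone',
    'email'   => 'email',
    'website' => 'website',
];

// Normalize a raw key the same way as the loop above, then look it up.
// Returns null for keys that are not in the alias table.
function canonicalKey(string $rawKey): ?string
{
    $key = mb_strtolower(trim(strip_tags($rawKey)));
    $key = preg_replace('/[\s\-]+/u', '', $key); // drop whitespace and hyphens
    return KEY_ALIASES[$key] ?? null;
}
```

With this, both canonicalKey('Web Site') and canonicalKey('<b>Website') map to 'website', while unknown keys come back as null and can be skipped.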

Answer 1

Use base64_encode() before you put your value in your string. In the code where you receive this string, use base64_decode() to get it back.

page1.php

// Base64 output may itself contain '+', '/' and '=', so URL-encode it as well.
$string = '&Adress='.urlencode(base64_encode('22 Examplary road, Nowhere'))
    .'&Phone='.urlencode(base64_encode('+123 12345678, +123 23456789'))
    .'&Email='.urlencode(base64_encode('info@example.com'));
// string is sent via curl or some other transport to page2.php

page2.php

// The second argument is required as of PHP 8.0.
parse_str($string, $values);
echo base64_decode($values['Adress']); // 22 Examplary road, Nowhere
echo base64_decode($values['Phone']); // +123 12345678, +123 23456789
echo base64_decode($values['Email']); // info@example.com
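If the string comes from a source you do not control, as in the question, the values cannot be encoded where they are produced. The plus signs disappear because parse_str() URL-decodes its input, and in a query string a + stands for a space. A minimal sketch, assuming you want to keep the parse_str() approach: percent-encode the problematic characters before building the query string. The function name parseKeyValueHtml() is hypothetical.

```php
<?php
// Hypothetical helper that mirrors the preprocessing from the question,
// but escapes characters that parse_str() would otherwise decode.
function parseKeyValueHtml(string $html): array
{
    $parse = strip_tags($html);
    // Replacements run in order: '%' first so the escapes added for
    // '+' and '&' are not encoded a second time.
    $parse = str_replace(['%', '+', '&'], ['%25', '%2B', '%26'], $parse);
    $parse = str_replace(':', '=', $parse);
    $parse = str_replace("\n", '&', $parse);
    $parse = str_replace(["\r", "\t"], '', $parse);

    parse_str($parse, $values);

    // Keep only plain string values and trim surrounding whitespace.
    $result = [];
    foreach ($values as $key => $value) {
        if (is_string($value)) {
            $result[$key] = trim($value);
        }
    }
    return $result;
}
```

Note that values containing a colon (a URL, for example) are still altered by the ':' to '=' replacement, so this only addresses the lost plus signs.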

Answer 2

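Given that your HTML already conforms to a simple, regular structure, I can tell you that regex matching will be the best way to get at this data. Here is an example to get you on your way. I am sure it does not solve everything, but it does solve the problem you raise in this post, namely finding key/value matches.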

// now go get those matches!
preg_match_all('/<b>([^:]*):\s?<\/b>(.*)<br>/Usi', $string, $matches, PREG_SET_ORDER);
die('<pre>'.print_r($matches,true));

This will output something like the following:

Array
(
  [0] => Array
    (
        [0] => <b>Adress:</b> 22 Examplary road, Nowhere <br>
        [1] => Adress
        [2] =>  22 Examplary road, Nowhere
    )

  [1] => Array
    (
        [0] => <b>Phone:</b>  +371 12345678, +371 23456789<br>
        [1] => Phone
        [2] =>   +371 12345678, +371 23456789
    )

  [2] => Array
    (
        [0] => <b>E-mail: </b>info@example.com<br>
        [1] => E-mail
        [2] => info@example.com
    )

)

From there, I am guessing you can take it the rest of the way.
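As a sketch of that last step, the matches can be folded into an associative array. Because the question mentions that keys may repeat, this version collects every value for a key in a list; that choice, and the sample $string, are assumptions made for illustration.

```php
<?php
// Sample input taken from the question.
$string = "<b>Adress:</b> 22 Examplary road, Nowhere <br>\n"
    . "<b>Phone:</b>  +371 12345678, +371 23456789<br>\n"
    . "<b>E-mail: </b>info@example.com<br>\n";

preg_match_all('/<b>([^:]*):\s?<\/b>(.*)<br>/Usi', $string, $matches, PREG_SET_ORDER);

// Group values by key so duplicate keys are kept instead of overwritten.
$values = [];
foreach ($matches as $match) {
    $key = trim($match[1]);
    $values[$key][] = trim(strip_tags($match[2]));
}
```

Since the value never passes through parse_str(), the plus signs in $values['Phone'][0] are left untouched.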
