PHP IMAP: How to get just the text part of the body? Not the different <html> tags etc

Question

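I'm trying to write a script that downloads emails from an Exchange server and then inserts them into a database, but I can't get a clean 'text part' out of the email.

PHP code: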

<?php
$user = "email@domain.com";
$password = "password123";
$mbox = imap_open("{exchange01:993/imap/ssl/novalidate-cert}", $user, $password);

if ($mbox) {
    // Fetch section 1 of message number 1
    $message = imap_fetchbody($mbox, 1, "1");

    print_r($message);

    imap_close($mbox);
}
?>

And the whole HTML body gets printed. I guess that is to be expected, but I don't want the

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=iso-8859-1"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
    {font-family:"Cambria Math";
    panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
    {font-family:Calibri;
    panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
    {font-family:Verdana;
    panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
    {font-family:"Neo Sans Std";}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
    {margin:0cm;
    margin-bottom:.0001pt;
    font-size:11.0pt;
    font-family:"Calibri",sans-serif;
    mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
    {mso-style-priority:99;
    color:#0563C1;
    text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
    {mso-style-priority:99;
    color:#954F72;
    text-decoration:underline;}
span.E-postmall17

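..mumbo jumbo, just the text of the email itself (I can live with getting the signature, images, and so on).

Is there an easier way than crudely cutting the long string at <body... and </body... and then cutting further from there? Someone else must have wanted to solve the same problem, but after a whole day of trying and googling I can't find any answer.

I guess in the end I'll insert the whole HTML response into the database cell and hope for the best, but I'd rather not.

Help me, Stack Overflow. You're my only hope.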

Solution edit:

Not exactly the solution I was looking for, but it works (with a few minor fixes needed).

echo strip_tags($message, '<body>');

outputs only the

<body...>
Yayh the text i want!
</body .....>

part. Many thanks to @ThisGuyHasTwoThumbs (in the comments).

Edit:

In the end the code ended up roughly like this

<?php
$user = "email@domain.com";
$password = "password";
$mbox = imap_open("{exchange01:993/imap/ssl/novalidate-cert}", $user, $password);

if ($mbox) {
    $message = imap_fetchbody($mbox, 1, "1");

    // Strip every tag except <body>
    $message = strip_tags($message, '<body>');
    // Keep what follows the opening <body ...> tag ...
    $message = explode(">", $message);
    // ... and what precedes the closing </body> tag
    $message = explode("<", $message[1]);
    $message = str_replace("&nbsp;", "", $message[0]);
    $message = html_entity_decode($message);
    $message = trim($message);
    // Or the three lines above combined into one:
    // $message = trim(html_entity_decode(str_replace("&nbsp;", "", $message[0])));

    echo $message;

    imap_close($mbox);
}
?>

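This removes the opening <body something something something> and the closing </body>, then trims whitespace from the start and end of the variable (@Goose also more or less covers this in the edited answer below). It also converts HTML-encoded characters to the corresponding characters and removes tags and so on.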
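The same steps can be folded into one small helper. This is only a sketch, assuming the fetched part is HTML whose transfer encoding has already been undone; the name htmlBodyToText is made up for illustration, and unlike the code above it turns &nbsp; into a regular space instead of deleting it.

```php
<?php
// Hypothetical helper (not from the original post): reduce an HTML email
// part to the plain text inside its <body> element.
function htmlBodyToText(string $html): string
{
    // Keep only what sits between <body ...> and </body>, if present.
    if (preg_match('~<body\b[^>]*>(.*?)</body\s*>~is', $html, $m)) {
        $html = $m[1];
    }
    // Drop <style> and <script> blocks together with their contents.
    $html = preg_replace('~<(style|script)\b[^>]*>.*?</\1\s*>~is', '', $html);
    // Remove the remaining tags, then turn entities such as &amp; into characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // &nbsp; decodes to U+00A0; swap it for a regular space before trimming.
    $text = str_replace("\u{00A0}", ' ', $text);
    return trim($text);
}

// prints: Yayh the text I want!
echo htmlBodyToText('<html><head><style>p{margin:0}</style></head><body lang=EN-US><p>Yayh the text&nbsp;I want!</p></body></html>');
```

A regex match on <body> is still fairly crude, but it avoids relying on the position of the first ">" and "<" in the string.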

Answer 1

What you want is strip_tags()

http://php.net/manual/en/function.strip-tags.php

$html = '<div>hello</div>';
$text = strip_tags($html);
echo $text; // hello

If you need to remove excess whitespace from the resulting string, use this. It will also remove newlines. Credit to "Remove excess whitespace from within a string".

$text = preg_replace('/\s+/', ' ', $text);
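Because that pattern also swallows line breaks, a variant may be useful when the paragraphs of the email should survive. A minimal sketch; collapseWhitespace is an illustrative name, not an existing PHP function:

```php
<?php
// Illustrative helper: tidy whitespace but keep line breaks.
function collapseWhitespace(string $text): string
{
    // Normalise Windows and old Mac line endings to "\n".
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    // Collapse runs of spaces and tabs into a single space.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    // Strip the space left on either side of a line break.
    $text = preg_replace('/ ?\n ?/', "\n", $text);
    // Allow at most one blank line between paragraphs.
    $text = preg_replace('/\n{3,}/', "\n\n", $text);
    return trim($text);
}
```

Calling collapseWhitespace(strip_tags($html)) would then keep one line per paragraph instead of a single long line.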

Answer 2

Calling $message = imap_fetchbody($mbox, 1, "1.1"); will give you the plain-text part of the message rather than the whole body content; if you want the HTML part, use "1.2" instead.

(empty) - Entire message
0 - Message header
1 - MULTIPART/ALTERNATIVE
1.1 - TEXT/PLAIN
1.2 - TEXT/HTML
2 - MESSAGE/RFC822 (entire attached message)
2.0 - Attached message header
2.1 - TEXT/PLAIN
2.2 - TEXT/HTML
2.3 - file.ext

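This is per the second comment on http://php.net/manual/en/function.imap-fetchbody.php, which also has some nice functions that work out the available message parts dynamically, so you don't have to worry as much about what type of message and data you're dealing with.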
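To illustrate the idea of looking up the part number instead of hard-coding it, here is a rough sketch. It is not the code from that manual comment: findPlainTextPart and decodePart are made-up names, and the walk assumes the object tree returned by imap_fetchstructure() (type 0 = text, encoding 3 = base64, 4 = quoted-printable). Attached MESSAGE/RFC822 parts and charset conversion are not handled.

```php
<?php
// Sketch: locate the first text/plain part in an imap_fetchstructure() tree
// and report its section number and transfer encoding.
function findPlainTextPart(object $structure, string $prefix = ''): ?array
{
    $subtype = strtoupper($structure->subtype ?? '');
    if (($structure->type ?? null) === 0 && $subtype === 'PLAIN') {
        return [
            // A message that is not multipart keeps its body in section "1".
            'section'  => $prefix === '' ? '1' : $prefix,
            'encoding' => $structure->encoding ?? 0,
        ];
    }
    // Child parts are numbered from 1 and joined with dots, e.g. "1.2".
    foreach ($structure->parts ?? [] as $index => $part) {
        $section = ($prefix === '' ? '' : $prefix . '.') . ($index + 1);
        $found = findPlainTextPart($part, $section);
        if ($found !== null) {
            return $found;
        }
    }
    return null; // no text/plain part, e.g. an HTML-only message
}

// Undo the transfer encoding; imap_fetchbody() returns the part still encoded.
function decodePart(string $raw, int $encoding): string
{
    switch ($encoding) {
        case 3:  return (string) base64_decode($raw);
        case 4:  return quoted_printable_decode($raw);
        default: return $raw; // 7bit, 8bit, binary: nothing to undo
    }
}

// Usage with a live connection (requires the IMAP extension):
// $part = findPlainTextPart(imap_fetchstructure($mbox, 1));
// if ($part !== null) {
//     $text = decodePart(imap_fetchbody($mbox, 1, $part['section']), $part['encoding']);
// }
```

When the function returns null, the message has no plain-text alternative and stripping tags from the HTML part is the fallback.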

