How to remove whitespace in inline styles?

Question

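I have a PHP script that generates HTML emails. To keep the size under Google's 102 kB limit, I am trying to squeeze as many unnecessary characters out of the code as possible.

I currently use Emogrifier to inline the CSS and then TinyMinify to minify it.

The output still has spaces between properties and values in the inline styles (e.g. style="color: #ffffff; font-weight: 16px").

I came up with the following regular expression to remove the extra spaces, but it also affects the actual content (e.g. this & that becomes this &that):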

$out = preg_replace('/(;|:)\s([a-zA-Z0-9#])/', '$1$2', $newsletter);

How can I modify this regular expression so that it is restricted to inline styles, or is there a better approach?

Answer 1

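There is no foolproof way to both avoid matching the payload (style="" can appear anywhere) and avoid mangling actual CSS values (such as content: 'a: b'). Also consider:

Shortening values: red is shorter than #f00, which is shorter than #ff0000
Removing bogus leading and trailing content, such as spaces and semicolons
Redesigning your HTML: e.g. using <ins> and <strong> is shorter than using inline CSS

One approach is to first match all inline style HTML attributes and then operate only on their contents, but you will have to test for yourself how well it works: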

$out= preg_replace_callback
( '/( style=")([^"]*)("[ >])/'  // Find all appropriate HTML attributes
, function( $aMatch ) {  // Per match
    // Kill any amount of any kind of spaces after colon or semicolon only
    $sInner= preg_replace
    ( '/([;:])\s*([a-zA-Z0-9#])/'  // Delimiter, optional whitespace, first character of what follows
    , '$1$2'  // Keep the delimiter and the following character, drop the whitespace
    , $aMatch[2]  // Second sub match
    );

    // Kill any amount of leading and trailing semicolons and/or spaces
    $sInner= preg_replace
    ( array( '/^\s*;*\s*/', '/\s*;*\s*$/' )
    , ''
    , $sInner
    );

    return $aMatch[1]. $sInner. $aMatch[3];  // New HTML attribute
  }
, $newsletter
);
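
As a rough usage sketch of the callback approach above, the same logic can be wrapped in a function and tried on a small snippet. The function name minifyInlineStyles and the sample markup are made up for illustration, and the caveats from the start of this answer still apply.

```php
<?php
// Hypothetical helper wrapping the approach above; the name is illustrative only.
function minifyInlineStyles(string $html): string
{
    return preg_replace_callback(
        '/( style=")([^"]*)("[ >])/',
        function (array $m): string {
            // Drop whitespace that follows a colon or semicolon.
            $inner = preg_replace('/([;:])\s*([a-zA-Z0-9#])/', '$1$2', $m[2]);
            // Trim leading and trailing semicolons and whitespace.
            $inner = preg_replace(['/^\s*;*\s*/', '/\s*;*\s*$/'], '', $inner);
            return $m[1] . $inner . $m[3];
        },
        $html
    );
}

$sample = '<p style="color: #ffffff; font-weight: bold; ">this &amp; that</p>';
echo minifyInlineStyles($sample), "\n";
// <p style="color:#ffffff;font-weight:bold">this &amp; that</p>
```

The text content (this &amp; that) is left alone here because only the captured attribute value is passed to the inner replacements.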

Answer 2

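You didn't provide sample input for us to work with, but you mention that you are processing HTML. That should ring alarm bells: using regex as the direct solution is ill-advised. When you intend to process valid HTML, you should use a DOM parser to isolate the style attributes.

Why shouldn't regex be used to isolate inline style declarations? Simply put: regex is "DOM-unaware". It doesn't know whether it is inside or outside a tag (I'll include a contrived monkeywrench in my demo to expose this vulnerability). In addition, using a DOM parser brings the benefit of correctly handling different types of quoting. While a regex can be written to match/acknowledge balanced quoting, doing so adds considerable bloat (if done properly) and hurts the readability and maintainability of your script.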
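
To make the "DOM-unaware" point concrete, here is a minimal sketch with a made-up sample string and a deliberately naive pattern. The regex cannot tell the real attribute from text that merely looks like one.

```php
<?php
// Text content that merely looks like an attribute; the sample string is made up.
$html = '<span style="color: red;">Monkeywrench: style="padding: 3px;" </span>';

// A naive pattern that treats every style="..." occurrence as an attribute.
preg_match_all('/ style="([^"]*)"/', $html, $matches);

print_r($matches[1]);
// Two hits are reported: the real attribute value and the one in the text node.
```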

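In my demo, I'll show how to simply/accurately purge the spaces after colons, semicolons, and commas once the genuine inline style declarations have been isolated. I've gone a little further (because color hex code condensing is mentioned on this page) to show how regex can be used to reduce some six-character hex codes to three characters.

Code: (Demo)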

$html = <<<HTML
<div style='font-family: "Times New Roman", Georgia, serif; background-color: #ffffff; '>
  <p>Some text 
    <span class="ohyeah" style="font-weight: bold; color: #ff6633 !important; border: solid 1px grey;">
      Monkeywrench: style="padding: 3px;"
    </span>
    &
    <strong style="text-decoration: underline; ">Underlined</strong>
  </p>
  <h1 style="margin: 1px 2px 3px 4px;">Heading</h1>
  <span style="background-image:     url('images/not_a_hexcode_ffffff.png');    ">Text</span>
</div>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('*') as $node) {
    $style = $node->getAttribute('style');
    if ($style) {
        // 1) drop whitespace after ":", ";" or ","  2) condense #aabbcc to #abc
        $patterns = ['~[:;,]\K\s+~', '~#\K([\da-f])\1([\da-f])\2([\da-f])\3~i'];
        $replaces = ['', '$1$2$3'];
        $node->setAttribute('style', preg_replace($patterns, $replaces, $style));
    }
}
$html = $dom->saveHTML();
echo $html;

Output:

<div style='font-family:"Times New Roman",Georgia,serif;background-color:#fff;'>
  <p>Some text 
    <span class="ohyeah" style="font-weight:bold;color:#f63 !important;border:solid 1px grey;">
      Monkeywrench: style="padding: 3px;"
    </span>
    &amp;
    <strong style="text-decoration:underline;">Underlined</strong>
  </p>
  <h1 style="margin:1px 2px 3px 4px;">Heading</h1>
  <span style="background-image:url('images/not_a_hexcode_ffffff.png');">Text</span>
</div>

The snippet above uses \K in its patterns to avoid lookarounds and excessive capture groups.
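
For comparison, here is a small sketch of what \K saves. The sample style string and variable names are made up for illustration. All three patterns give the same result on this input, but \K avoids both the capture group that has to be written back and the lookbehind.

```php
<?php
$style = 'font-weight: bold;  color: #ff6633;';

// With \K: everything matched before \K is dropped from the reported match,
// so only the whitespace is replaced.
$withK = preg_replace('~[:;,]\K\s+~', '', $style);

// Same effect with a capture group that is written back.
$withGroup = preg_replace('~([:;,])\s+~', '$1', $style);

// Same effect with a lookbehind.
$withLookbehind = preg_replace('~(?<=[:;,])\s+~', '', $style);

var_dump($withK === $withGroup && $withK === $withLookbehind); // bool(true)
echo $withK, "\n"; // font-weight:bold;color:#ff6633;
```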

I'm not writing a pattern to remove the space before !important, because I've read some (not recent) posts saying that some browsers misbehave without that space.
