RegEx for replacing the tag elements in HTML for SQL Server

Question

In a SQL Server database table, I have a column like this:

<p>Radio and television.</p><p>very popular in the world today.</p><p>Millions of people watch TV. </p><p>That’s because a radio is very small <span_style=":_black;">98.2%</span></p><p>and it‘s easy to carry. <span_style=":_black;">haha100%</span></p>

I want to remove <p>, </p>, <span_style=":_black;">, </span>, and every other HTML tag element.

The text I ultimately want looks like this:

Radio and television.very popular in the world today.Millions of people watch TV.That’s because a radio is very small 98.2% and it‘s easy to carry.haha100%

I would like to do this with a regular expression, but I can't find one that solves this problem.

How can I do this?

Answer 1

This RegEx might help you do so:

 ((\<)[\w\/\=\x22\x27\:-;]+(>))

You only need to add any other characters you might have, such as a space, to the character class:

 [\w\/\=\x22\x27\:-;]

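You can then simply replace $1 (the outer group, which is the whole matched tag) with an empty string in your code.
You may also need to account for other application-specific requirements and language-specific metacharacter escaping.
If you wish, you can also simplify this RegEx.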
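
The replacement step described above can be sketched in Python. This is only an illustration under assumptions: the answer does not name a host language, and SQL Server 2008 has no built-in regex replace, so the pattern would have to run in application code or a similar layer. The variable names are made up for this example.

```python
import re

# First pattern from the answer, copied unchanged. Inside the character
# class, `\:-;` is a range from ':' to ';', i.e. just those two characters.
TAG = re.compile(r"((\<)[\w\/\=\x22\x27\:-;]+(>))")

# Column value from the question (single-quoted so the double quotes fit).
html = (
    '<p>Radio and television.</p><p>very popular in the world today.</p>'
    '<p>Millions of people watch TV. </p>'
    '<p>That’s because a radio is very small <span_style=":_black;">98.2%</span></p>'
    '<p>and it‘s easy to carry. <span_style=":_black;">haha100%</span></p>'
)

# re.sub replaces the whole match, which is the same text as group 1 ($1).
text = TAG.sub("", html)
print(text)
```

Note that the tags are removed without inserting anything, so text on either side of a tag is joined directly (for example `98.2%and`); any spacing the question's desired output shows between paragraphs would need a separate step.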

As for special characters, you can check the Unicode/ASCII values for your target language.

You then simply add them to the RegEx. For example, if you have special quotation marks, you can update it similar to this RegEx:

((\<)([\w\/\=\"\'\‘\:\’\-;\s]+)(>))
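
To see what widening the character class changes, here is a small Python comparison of the two patterns. The sample tag with a space and a hyphen (`<span style="font-size: 12px;">`) is a made-up input, not taken from the question, and the names `NARROW` and `WIDE` are hypothetical.

```python
import re

# First pattern from the answer: no whitespace or hyphen in the class.
NARROW = re.compile(r"((\<)[\w\/\=\x22\x27\:-;]+(>))")
# Extended pattern from the answer: adds curly quotes, '-', and whitespace.
WIDE = re.compile(r"((\<)([\w\/\=\"\'\‘\:\’\-;\s]+)(>))")

# Hypothetical input with a space and a hyphen inside the opening tag.
sample = '<p>and it‘s easy to carry. <span style="font-size: 12px;">haha100%</span></p>'

narrow_result = NARROW.sub("", sample)  # opening <span ...> tag survives
wide_result = WIDE.sub("", sample)      # every tag is removed

print(narrow_result)
print(wide_result)

# Caveat: once whitespace is allowed, plain text between '<' and '>'
# can be mistaken for a tag.
caveat = WIDE.sub("", "1 < 2 and 3 > 2")
print(caveat)
```

The last call shows a trade-off of the wider class: comparison operators in ordinary text may be stripped along with the words between them.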

This RegEx is easy to understand:

It has a simple left boundary, <, placed in a capturing group just to be safe:
```
 ((\<)
```
It has a simple right boundary, >, also placed in a capturing group just to be safe:
```
 (>))
```
It has a middle capturing group, which should contain every character that may appear inside a tag:
```
 ([\w\/\=\"\'\‘\:\’\-;\s]+) 
```

Finally, it wraps these three capturing groups in one outer group. That is not strictly necessary; it only adds one more boundary to be on the safe side.
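
The group layout described above can be inspected directly. The following Python sketch prints what each group captures for one tag, and compares the grouped pattern with an ungrouped variant. `FLAT` is an assumption made for this example (the same character class with the groups dropped), not a pattern given in the answer.

```python
import re

# Extended pattern from the answer, with all four capturing groups.
GROUPED = re.compile(r"((\<)([\w\/\=\"\'\‘\:\’\-;\s]+)(>))")
# Hypothetical ungrouped variant: same character class, no groups.
FLAT = re.compile(r"<[\w/=\"'‘:’\-;\s]+>")

m = GROUPED.fullmatch("</span>")
groups = m.groups()
print(groups)  # ('</span>', '<', '/span', '>')

sample = '<p>very small <span_style=":_black;">98.2%</span></p>'
grouped_result = GROUPED.sub("", sample)
flat_result = FLAT.sub("", sample)
print(grouped_result == flat_result)  # True
```

Because the replacement removes the entire match, the groups do not affect the result here; they only matter if the replacement needs to refer to individual parts of the tag.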

I am not familiar with SQL Server, but this post might help you design a query to do so.

Answer 2

I don't think you need a regular expression here. Try the following:

-- Sample value copied from the question
DECLARE @html nvarchar(MAX) = N'<p>Radio and television.</p><p>very popular in the world today.</p><p>Millions of people watch TV. </p><p>That’s because a radio is very small <span_style=":_black;">98.2%</span></p><p>and it‘s easy to carry. <span_style=":_black;">haha100%</span></p>';

-- Repair the malformed tag name so that the string parses as XML
SET @html = REPLACE(@html, 'span_style', 'span style');

-- The conversion succeeds only if the markup is well-formed XML
DECLARE @xml xml = @html;

-- Demo with a variable: the string value of the root node is the text of all nodes, without tags
-- (nvarchar is used so that characters such as ’ are preserved)
SELECT t.c.value('.', 'nvarchar(max)') AS AllText
FROM @xml.nodes('/') AS t(c);

-- Demo with a query
SELECT (SELECT t.c.value('.', 'nvarchar(max)') FROM q.xml_col.nodes('/') AS t(c)) AS AllText
FROM
  (
    -- your query that returns an xml column goes here
    SELECT CAST(@html AS xml) AS xml_col -- row 1
    UNION ALL
    SELECT CAST(@html AS xml) AS xml_col -- row 2
  ) AS q;
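
For readers who want to try the same idea outside SQL Server, here is a rough Python analogue of the XML approach above. It is a sketch under assumptions, not a port of the T-SQL: Python's `xml.etree.ElementTree` requires a single root element, so the fragment is wrapped in a made-up `<root>` tag, and only the last two paragraphs of the sample are used to keep it short.

```python
import xml.etree.ElementTree as ET

# Last two paragraphs of the sample value from the question.
html = (
    '<p>That’s because a radio is very small <span_style=":_black;">98.2%</span></p>'
    '<p>and it‘s easy to carry. <span_style=":_black;">haha100%</span></p>'
)

# Same repair as in the T-SQL: turn the malformed tag name into a
# normal element name followed by an attribute.
fixed = html.replace("span_style", "span style")

# Wrap in a single root (assumption: ElementTree does not accept fragments),
# then concatenate every text node in document order.
root = ET.fromstring("<root>" + fixed + "</root>")
text = "".join(root.itertext())
print(text)
```

As with the T-SQL version, this only works when the markup is well-formed XML; unclosed tags or HTML-only entities would make the parse fail.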

regex

sql-server

sql-server-2008