How to extract HTML structure from a string?
I have a long text written in HTML:
<body>
<h2>title 1</h2>
<h2>This is an <b>important</b> title</h2>
Some text
<h3>This a subtitle b</h3>
<h3>This is also <span style="font-weight:500">important</span></h3>
</body>
I need to extract the headings from it to build a table of contents. I would like the result to be:
h2 Title 1
h2 This is an <b>important</b> title
h3 This a subtitle b
h3 This is also <span style="font-weight:500">important</span>
or
h2 Title 1
h2 This is an important title
h3 This a subtitle b
h3 This is also important
I tried:
select * from xmltable('body/*' passing xmltype('<body><h2>title 1</h2><h2>This is an <b>important</b> title</h2>Some text<h3>This a subtitle b</h3><h3>This is also <span style="font-weight:500">important</span></h3></body>')
columns
tag_name varchar2(1000) path 'name()',
tag_value varchar2(1000) path 'text()')
where tag_name in ('h1','h2','h3','h4','h5')
but I get this error:
ORA-19279: XPTY0004 - XQuery dynamic type mismatch: expected singleton sequence - got multi-item sequence
19279. 00000 - "XPTY0004 - XQuery dynamic type mismatch: expected singleton sequence - got multi-item sequence"
*Cause: The XQuery sequence passed in had more than one item.
*Action: Correct the XQuery expression to return a single item sequence.
Does anyone know how to fix this?
Thanks.
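A quick way to see where ORA-19279 comes from is to count, per heading, the text nodes that text() returns. This is a diagnostic sketch, not tested on every Oracle release; it assumes XMLTABLE accepts an XQuery function call such as count() in a column PATH, and the column name text_node_count is made up for illustration.

```sql
-- Diagnostic sketch: how many text nodes does text() return for each heading?
-- text_node_count is a hypothetical column name chosen for this example.
select tag_name, text_node_count
from xmltable('body/*' passing xmltype('<body><h2>title 1</h2><h2>This is an <b>important</b> title</h2>Some text<h3>This a subtitle b</h3><h3>This is also <span style="font-weight:500">important</span></h3></body>')
  columns
    tag_name        varchar2(1000) path 'name()',
    -- text() selects only the direct text children of the heading
    text_node_count number         path 'count(text())')
where tag_name in ('h1','h2','h3','h4','h5')
;
```

Every heading should report 1 except the second h2, whose direct text children are "This is an " and " title" (the word "important" sits inside the b element). That two-item sequence cannot be mapped to a single varchar2 value, which is what the error message says.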
Would this be acceptable as output?
TAG_NAME TAG_VALUE
-------- ----------------------------------------------------------------------
h2 <h2>title 1</h2>
h2 <h2>This is an <b>important</b> title</h2>
h3 <h3>This a subtitle b</h3>
h3 <h3>This is also <span style="font-weight:500">important</span></h3>
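That makes more sense to me: let whatever XML tool you use interpret the tag values as needed. (The tool may require the tag value to be of the XMLType datatype; if so, just remove the xmlserialize wrapper from the select clause.)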
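If that is acceptable, you only need a small change to your query. Your version fails because text() returns every text node directly under the heading, and <h2>This is an <b>important</b> title</h2> has two of them ("This is an " and " title"), which a varchar2 column cannot hold. Returning the whole element (path '.') as xmltype avoids that: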
select tag_name, xmlserialize(document tag_value) as tag_value
from xmltable('body/*' passing xmltype('<body><h2>title 1</h2>
<h2>This is an <b>important</b> title</h2>Some text<h3>This a subtitle b</h3>
<h3>This is also <span style="font-weight:500">important</span></h3></body>')
columns
tag_name varchar2(1000) path 'name()',
tag_value xmltype path '.')
where tag_name in ('h1','h2','h3','h4','h5')
;
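For the second format in the question (heading text with the inline markup dropped), a possible variant is to map the column to the string value of the element instead of to text(). This is a sketch assuming Oracle 11gR2 or later; it relies on the standard XQuery function normalize-space(), which concatenates all descendant text of the heading and collapses whitespace. I have not verified it on every Oracle version.

```sql
-- Sketch: plain-text headings for a table of contents.
-- normalize-space(.) = string value of the heading (all descendant text,
-- including text inside <b>/<span>), with runs of whitespace collapsed.
select tag_name, tag_value
from xmltable('body/*' passing xmltype('<body><h2>title 1</h2><h2>This is an <b>important</b> title</h2>Some text<h3>This a subtitle b</h3><h3>This is also <span style="font-weight:500">important</span></h3></body>')
  columns
    tag_name  varchar2(1000) path 'name()',
    tag_value varchar2(1000) path 'normalize-space(.)')
where tag_name in ('h1','h2','h3','h4','h5')
;
```

Because normalize-space() always returns exactly one string, the singleton requirement that text() violated is satisfied. If you want to keep whitespace exactly as written, string(.) is the unnormalized alternative.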