Tag key & value using Teradata Regular Expression
I have a Teradata dataset that looks like the following:
'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor Time: 15:25:23 04/06/2020'
I want to extract a tag's value from the string above based on its key.
For example:
with comm as
(
select 'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor' as text
)
select regexp_substr(comm.text,'[^: ]+',1,4)
from comm where regexp_substr(comm.text,'[^: ]+',1,3) = 'IssueType';
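Is there a way to write this query without changing the positional argument for each tag?
I also find the last field a bit tricky, because it holds both a time and a date.
Any help is appreciated.
Thanks.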
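The NVP function can access Name/Value-pair data, but to split the string into multiple rows you need strtok_split_to_table or regexp_split_to_table. The tricky part in your case is the delimiters; it would be easier if they were unique instead of ' ' and ':':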
WITH comm AS
 (
   SELECT 1 AS keycol, -- should be a key column in your table, either numeric or varchar
          'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor Time: 15:25:23 04/06/2020' AS text
 )
SELECT id, tokennum, token,
       -- get the key
       StrTok(token, ':', 1) AS "Key",
       -- get the value (can't use StrTok because of ':' delimiter)
       Substring(token From Position(': ' IN token) + 2) AS "Value"
FROM TABLE
 ( RegExp_Split_To_Table(comm.keycol
                        ,comm.text
                        ,'( )(?=[^ ]+: )' -- assuming names don't contain spaces: split at the last space before ': '
                        ,'c')
   RETURNS (id INT, tokennum INTEGER, token VARCHAR(1000) CHARACTER SET Latin)
 ) AS dt
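If you only need one tag, the same split can be filtered by key instead of by position. This is a minimal sketch that reuses the sample row and split pattern from the query above; the literal 'IssueType' is just the example key from the question, and keycol is still a stand-in for a real key column in your table:

```sql
WITH comm AS
 (
   SELECT 1 AS keycol, -- stand-in for a real key column
          'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor Time: 15:25:23 04/06/2020' AS text
 )
SELECT id,
       -- everything after the first ': ' in the token is the value
       Substring(token From Position(': ' IN token) + 2) AS "Value"
FROM TABLE
 ( RegExp_Split_To_Table(comm.keycol
                        ,comm.text
                        ,'( )(?=[^ ]+: )' -- same split pattern as above
                        ,'c')
   RETURNS (id INT, tokennum INTEGER, token VARCHAR(1000) CHARACTER SET Latin)
 ) AS dt
-- pick the tag by name; change only this literal for other tags
WHERE StrTok(token, ':', 1) = 'IssueType';
```

For the sample row this should return Improvement. Filtering on 'Time' instead should give the whole '15:25:23 04/06/2020', because the split pattern only matches a space that is followed by a name and ': ', so the colons and the space inside the timestamp do not start a new token.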