How to flatten a key-value pair text file into a table where the key is the column and the value is the cell data
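I have a text file in which one logical row is split across multiple lines as key-value pairs. The data looks like this:
1,800001348
2,IDEAL OPTION
27,Place of Service
39,IDEAL OPTION
400,123 MAIN STREET
400,Ste G
410,SEATTLE
420,Washington
423,BENTON
430,99336
The whole block then repeats:
1,850000900
2,INVITAE CORPORATION
27,Place of Service
39,INVITAE CORPORATION
400,XYZ 1st AVENUE
410,SAN FRANCISCO
420,California
423,SAN FRANCISCO
430,94103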
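I loaded this file into Oracle with SQL*Loader. Integrity is preserved because I appended a sequence number to every row, so I can walk the table row by row and tell where each record starts and ends.
KEY  VALUE                SEQNUM
1    800001348            1
2    IDEAL OPTION         2
27   Place of Service     3
39   IDEAL OPTION         4
400  123 MAIN STREET      5
400  Ste G                6
410  KENNEWICK            7
420  Washington           8
423  BENTON               9
430  99336                10
1    850000900            11
2    INVITAE CORPORATION  12
27   Place of Service     13
39   INVITAE CORPORATION  14
400  XYZ 1st AVENUE       15
410  SAN FRANCISCO        16
420  California           17
423  SAN FRANCISCO        18
430  94103                19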
select
case when KEY = '1' then value else null end as FACILITY_ID,
case when KEY = '2' then value else null end as Unknown_num,
case when KEY = '27' then value else null end as TYPE_OF_LOCATION,
case when KEY = '39' then value else null end as EXTERNAL_NAME,
case when KEY = '400' then value else null end as ADDRESS,
case when KEY = '410' then value else null end as CITY,
case when KEY = '420' then value else null end as STATE,
case when KEY = '423' then value else null end as COUNTY,
case when KEY = '430' then value else null end as ZIP_CODE,
value,
SEQNUM from MDM_ODS.EAF_EPIC_IMPORT order by SEQNUM;
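I get the transposed result, but as expected every value lands on its own row with lots of NULLs. Is there any way to combine them into a single row?
FACILITY_ID  UNKNOWN_NUM   TYPE_OF_LOCATION  EXTERNAL_NAME  ADDRESS           CITY
800001348
             IDEAL OPTION
                           Place of Service
                                             IDEAL OPTION
                                                            8514 W GAGE BLVD
                                                            Ste G
                                                                              KENNEWICK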
Something like this should work:
SELECT facility_id, unknown_num, type_of_location, external_name,
address, city, state, county, zip_code
FROM (
SELECT key,
value facility_id,
LEAD(value, 1) OVER (ORDER BY seqnum) unknown_num,
LEAD(value, 2) OVER (ORDER BY seqnum) type_of_location,
LEAD(value, 3) OVER (ORDER BY seqnum) external_name,
LEAD(value, 4) OVER (ORDER BY seqnum) address,
LEAD(value, 5) OVER (ORDER BY seqnum) city,
LEAD(value, 6) OVER (ORDER BY seqnum) state,
LEAD(value, 7) OVER (ORDER BY seqnum) county,
LEAD(value, 8) OVER (ORDER BY seqnum) zip_code
FROM MDM_ODS.EAF_EPIC_IMPORT
ORDER BY seqnum)
WHERE key=1;
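LEAD(X, N) OVER (ORDER BY <sort-order>)
returns the value of column "X" from the row that is "N" rows after the current row, with the rows ordered by <sort-order>.
Note that the fixed offsets assume every record has the same keys in the same order. The first sample record has two 400 rows, so its columns after ADDRESS would be shifted by one.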
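To make the offset behaviour concrete, here is a minimal sketch that runs the same LEAD idea against the sample data. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption for the sake of a runnable example; window functions need SQLite 3.25 or later), and the table name kv with columns k, v, seq is hypothetical.

```python
import sqlite3

# Sample rows as (k, v, seq); the first record has two key-400 rows.
ROWS = [
    (1, "800001348", 1), (2, "IDEAL OPTION", 2), (27, "Place of Service", 3),
    (39, "IDEAL OPTION", 4), (400, "123 MAIN STREET", 5), (400, "Ste G", 6),
    (410, "SEATTLE", 7), (420, "Washington", 8), (423, "BENTON", 9),
    (430, "99336", 10),
    (1, "850000900", 11), (2, "INVITAE CORPORATION", 12),
    (27, "Place of Service", 13), (39, "INVITAE CORPORATION", 14),
    (400, "XYZ 1st AVENUE", 15), (410, "SAN FRANCISCO", 16),
    (420, "California", 17), (423, "SAN FRANCISCO", 18), (430, "94103", 19),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER, v TEXT, seq INTEGER)")
conn.executemany("INSERT INTO kv VALUES (?, ?, ?)", ROWS)

# LEAD(v, n) looks n rows ahead of the current row in seq order.
# Only the rows with k = 1 (the start of each record) are kept.
lead_rows = conn.execute("""
    SELECT facility_id, address, city
    FROM (
        SELECT k,
               v AS facility_id,
               LEAD(v, 4) OVER (ORDER BY seq) AS address,
               LEAD(v, 5) OVER (ORDER BY seq) AS city
        FROM kv
    )
    WHERE k = 1
    ORDER BY facility_id
""").fetchall()

for row in lead_rows:
    print(row)
```

For the second record the offsets line up, but for the first record the row five positions ahead is the second address line, so "Ste G" ends up in the city column. The approach is fine when every record is known to have exactly one row per key.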
Try the code below:
-- MANUAL DATA CREATION
WITH DATAA AS (
SELECT
1 KEY,
'800001348' VALUE,
1 SEQNUM
FROM
DUAL
UNION ALL
SELECT
2,
'IDEAL OPTION',
2
FROM
DUAL
UNION ALL
SELECT
27,
'Place of Service',
3
FROM
DUAL
UNION ALL
SELECT
39,
'IDEAL OPTION',
4
FROM
DUAL
UNION ALL
SELECT
400,
'123 MAIN STREET',
5
FROM
DUAL
UNION ALL
SELECT
400,
'Ste G',
6
FROM
DUAL
UNION ALL
SELECT
410,
'SEATTLE',
7
FROM
DUAL
UNION ALL
SELECT
420,
'Washington',
8
FROM
DUAL
UNION ALL
SELECT
423,
'BENTON',
9
FROM
DUAL
UNION ALL
SELECT
430,
'99336',
10
FROM
DUAL
--
--
UNION ALL
--
--
SELECT
1 KEY,
'850000900' VALUE,
11 SEQNUM
FROM
DUAL
UNION ALL
SELECT
2,
'INVITAE CORPORATION',
12
FROM
DUAL
UNION ALL
SELECT
27,
'Place of Service',
13
FROM
DUAL
UNION ALL
SELECT
39,
'INVITAE CORPORATION',
14
FROM
DUAL
UNION ALL
SELECT
400,
'XYZ 1st AVENUE',
15
FROM
DUAL
UNION ALL
SELECT
410,
'SAN FRANCISCO',
16
FROM
DUAL
UNION ALL
SELECT
420,
'California',
17
FROM
DUAL
UNION ALL
SELECT
423,
'SAN FRANCISCO',
18
FROM
DUAL
UNION ALL
SELECT
430,
'94103',
19
FROM
DUAL
)
--
-- YOUR QUERY STARTS FROM HERE
--
SELECT
MAX(CASE
WHEN KEY = '1' THEN VALUE
END) AS FACILITY_ID,
MAX(CASE
WHEN KEY = '2' THEN VALUE
END) AS UNKNOWN_NUM,
MAX(CASE
WHEN KEY = '27' THEN VALUE
END) AS TYPE_OF_LOCATION,
MAX(CASE
WHEN KEY = '39' THEN VALUE
END) AS EXTERNAL_NAME,
TRIM(',' FROM
LISTAGG(CASE
WHEN KEY = '400' THEN VALUE
END, ',') WITHIN GROUP(
ORDER BY
SEQNUM
)
) AS ADDRESS, -- ADDRESS HAS MORE THAN ONE RECORD IN THE FIRST GROUP OF VALUES
MAX(CASE
WHEN KEY = '410' THEN VALUE
END) AS CITY,
MAX(CASE
WHEN KEY = '420' THEN VALUE
END) AS STATE,
MAX(CASE
WHEN KEY = '423' THEN VALUE
END) AS COUNTY,
MAX(CASE
WHEN KEY = '430' THEN VALUE
END) AS ZIP_CODE
FROM
(
SELECT
DATAA_ALL.KEY,
DATAA_ALL.VALUE,
DATAA_ALL.SEQNUM,
COUNT(1) AS GRP_VAR
FROM
DATAA DATAA_ALL
JOIN DATAA DATAA_FIRST ON ( DATAA_FIRST.KEY = 1
AND DATAA_ALL.SEQNUM >= DATAA_FIRST.SEQNUM )
GROUP BY
DATAA_ALL.KEY,
DATAA_ALL.VALUE,
DATAA_ALL.SEQNUM
)
GROUP BY
GRP_VAR
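GRP_VAR counts how many KEY = 1 rows occur at or before each row, so all rows of one record share the same value. I grouped by GRP_VAR and then applied aggregate functions.
Also, wherever an attribute can have more than one record (such as the address in the example above), you can use LISTAGG instead of MAX.
Cheers!!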
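The same grouping can also be expressed without a self-join, by taking a running count of the KEY = 1 rows with an analytic SUM. The following is only a sketch under assumptions: it uses Python's sqlite3 module as a stand-in for Oracle, a hypothetical table kv(k, v, seq), and SQLite's GROUP_CONCAT where Oracle would use LISTAGG ... WITHIN GROUP (ORDER BY seq). Only three of the nine columns are shown to keep it short.

```python
import sqlite3

ROWS = [
    (1, "800001348", 1), (2, "IDEAL OPTION", 2), (27, "Place of Service", 3),
    (39, "IDEAL OPTION", 4), (400, "123 MAIN STREET", 5), (400, "Ste G", 6),
    (410, "SEATTLE", 7), (420, "Washington", 8), (423, "BENTON", 9),
    (430, "99336", 10),
    (1, "850000900", 11), (2, "INVITAE CORPORATION", 12),
    (27, "Place of Service", 13), (39, "INVITAE CORPORATION", 14),
    (400, "XYZ 1st AVENUE", 15), (410, "SAN FRANCISCO", 16),
    (420, "California", 17), (423, "SAN FRANCISCO", 18), (430, "94103", 19),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER, v TEXT, seq INTEGER)")
conn.executemany("INSERT INTO kv VALUES (?, ?, ?)", ROWS)

# grp increases by one at every k = 1 row, so it labels each record.
# The outer query then collapses each record into a single row.
flattened = conn.execute("""
    WITH grouped AS (
        SELECT k, v,
               SUM(CASE WHEN k = 1 THEN 1 ELSE 0 END)
                   OVER (ORDER BY seq) AS grp
        FROM kv
    )
    SELECT MAX(CASE WHEN k = 1 THEN v END)                  AS facility_id,
           GROUP_CONCAT(CASE WHEN k = 400 THEN v END, ', ') AS address,
           MAX(CASE WHEN k = 410 THEN v END)                AS city
    FROM grouped
    GROUP BY grp
    ORDER BY grp
""").fetchall()

for row in flattened:
    print(row)
```

GROUP_CONCAT skips NULLs, so only the key-400 values are joined. SQLite does not guarantee the order of the concatenated parts here, which is one reason Oracle's LISTAGG with an explicit ORDER BY is the better fit for the real table.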
If I understand correctly, you have two problems:
- Grouping the key-value pairs into a "record"
- Turning rows into columns
I created a SQL Fiddle with a table like this and added your sample data.
create table your_table_t(
k number
,v varchar2(200)
,seq number
);
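For the first problem, I assume key = 1 is your record ID. The goal is therefore to assign that ID to every key-value pair that belongs to the record. To do this I use a COALESCE expression that copies the value of key = 1 onto each key-value pair.
For the second part, I simply use Oracle's PIVOT operator to turn the rows into columns.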
with identify_record as(
select k,v
,coalesce( decode(k,1,v,null)
,lag(decode(k,1,v,null),1) ignore nulls over(order by seq)) as id
from your_table_t a
)
select *
from identify_record pivot(
max(v) for k in(1 as FACILITY_ID
,2 as Unknown_num
,27 as TYPE_OF_LOCATION
,39 as EXTERNAL_NAME
,400 as ADDRESS
,410 as CITY
,420 as STATE
,423 as COUNTY
,430 as ZIP_CODE
)
);
You will get the following output:
ID FACILITY_ID UNKNOWN_NUM TYPE_OF_LOCATION EXTERNAL_NAME ADDRESS CITY STATE COUNTY ZIP_CODE
"800001348" "800001348" "IDEAL OPTION" "Place of Service" "IDEAL OPTION" "Ste G" "KENNEWICK" "Washington" "BENTON" "99336"
"850000900" "850000900" "INVITAE CORPORATION" "Place of Service" "INVITAE CORPORATION" "XYZ 1st AVENUE" "SAN FRANCISCO" "California" "SAN FRANCISCO" "94103"
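The two steps (carry the key = 1 value forward as the record ID, then pivot) can be illustrated outside the database as well. The sketch below is plain Python over the sample rows from the question, not Oracle code; the COLUMNS mapping and the flatten function are hypothetical names introduced for illustration. It also shows why ADDRESS in the output above is "Ste G" for the first record: MAX over the two key-400 values keeps only the alphabetically larger one, whereas collecting them keeps both lines.

```python
# Key-to-column mapping, taken from the PIVOT clause above.
COLUMNS = {
    1: "FACILITY_ID", 2: "UNKNOWN_NUM", 27: "TYPE_OF_LOCATION",
    39: "EXTERNAL_NAME", 400: "ADDRESS", 410: "CITY",
    420: "STATE", 423: "COUNTY", 430: "ZIP_CODE",
}

ROWS = [
    (1, "800001348", 1), (2, "IDEAL OPTION", 2), (27, "Place of Service", 3),
    (39, "IDEAL OPTION", 4), (400, "123 MAIN STREET", 5), (400, "Ste G", 6),
    (410, "SEATTLE", 7), (420, "Washington", 8), (423, "BENTON", 9),
    (430, "99336", 10),
    (1, "850000900", 11), (2, "INVITAE CORPORATION", 12),
    (27, "Place of Service", 13), (39, "INVITAE CORPORATION", 14),
    (400, "XYZ 1st AVENUE", 15), (410, "SAN FRANCISCO", 16),
    (420, "California", 17), (423, "SAN FRANCISCO", 18), (430, "94103", 19),
]


def flatten(rows):
    """Group (k, v, seq) rows into one dict per record, keyed by record ID."""
    records = {}
    current_id = None
    for k, v, _seq in sorted(rows, key=lambda r: r[2]):
        if k == 1:
            # Step 1: a key-1 row starts a new record; remember its value.
            current_id = v
        # Step 2: pivot, collecting repeated keys instead of dropping them.
        record = records.setdefault(current_id, {})
        record.setdefault(COLUMNS[k], []).append(v)
    return {
        rid: {col: ", ".join(vals) for col, vals in rec.items()}
        for rid, rec in records.items()
    }


result = flatten(ROWS)
for record_id, columns in result.items():
    print(record_id, columns)

# What MAX(v) would keep for the first record's two address lines.
max_address = max(["123 MAIN STREET", "Ste G"])
```

If both address lines are needed in Oracle, one option is to aggregate key 400 separately with LISTAGG, as in the previous answer, rather than relying on MAX inside PIVOT.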