How to flatten a tab key-value pair text file into a table where the key is the column and the value is the cell data

I have a text file in which one logical row is split across multiple lines as key-value pairs. The data looks like this:

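1,800001348
2,IDEAL OPTION
27,Place of Service
39,IDEAL OPTION
400,123 MAIN STREET
400,Ste G
410,SEATTLE
420,Washington
423,BENTON
430,99336

The whole block then repeats:

1,850000900
2,INVITAE CORPORATION
27,Place of Service
39,INVITAE CORPORATION
400,XYZ 1st AVENUE
410,SAN FRANCISCO
420,California
423,SAN FRANCISCO
430,94103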

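I have loaded this file into Oracle using SQL*Loader. Integrity is preserved because I appended a sequence number to every row, so I can walk the table row by row and tell where each record starts and ends.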

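KEY  VALUE                SEQNUM
1    800001348            1
2    IDEAL OPTION         2
27   Place of Service     3
39   IDEAL OPTION         4
400  123 MAIN STREET      5
400  Ste G                6
410  KENNEWICK            7
420  Washington           8
423  BENTON               9
430  99336                10
1    850000900            11
2    INVITAE CORPORATION  12
27   Place of Service     13
39   INVITAE CORPORATION  14
400  XYZ 1st AVENUE       15
410  SAN FRANCISCO        16
420  California           17
423  SAN FRANCISCO        18
430  94103                19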

select 
case when KEY = '1' then value else null end as FACILITY_ID,
case when KEY = '2' then value else null end as  Unknown_num,
case when KEY = '27' then value else null end as  TYPE_OF_LOCATION,
case when KEY = '39' then value else null end as  EXTERNAL_NAME,
case when KEY = '400' then value else null end as  ADDRESS,
case when KEY = '410' then value else null end as  CITY,
case when KEY = '420' then value else null end as  STATE,
case when KEY = '423' then value else null end as  COUNTY,
case when KEY = '430' then value else null end as  ZIP_CODE,
value,
SEQNUM from MDM_ODS.EAF_EPIC_IMPORT order by SEQNUM;

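I get the transposed result, but as expected the values all end up on different rows with a lot of NULLs. Is there any way to combine them into a single row?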

FACILITY_ID  UNKNOWN_NUM   TYPE_OF_LOCATION  EXTERNAL_NAME  ADDRESS           CITY
800001348
             IDEAL OPTION
                           Place of Service
                                             IDEAL OPTION
                                                            8514 W Gage Blvd
                                                            Ste G
                                                                              KENNEWICK

Something like this should work:

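-- Assumes every record has exactly nine rows in the same key order.
-- A repeated key (like the two 400 rows in the sample) shifts the later columns.
SELECT facility_id, unknown_num, type_of_location, external_name,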
       address, city, state, county, zip_code
   FROM (
  SELECT key,
         value facility_id,
         LEAD(value, 1)  OVER (ORDER BY seqnum) unknown_num,
         LEAD(value, 2)  OVER (ORDER BY seqnum) type_of_location,
         LEAD(value, 3)  OVER (ORDER BY seqnum) external_name,
         LEAD(value, 4)  OVER (ORDER BY seqnum) address,
         LEAD(value, 5)  OVER (ORDER BY seqnum) city,
         LEAD(value, 6)  OVER (ORDER BY seqnum) state,
         LEAD(value, 7)  OVER (ORDER BY seqnum) county,
         LEAD(value, 8)  OVER (ORDER BY seqnum) zip_code
    FROM MDM_ODS.EAF_EPIC_IMPORT
    ORDER BY seqnum)
  WHERE key=1;

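LEAD(X, N) OVER (ORDER BY <sort-order>) returns the value of column "X" from the row "N" rows after the current row, with the rows ordered by <sort-order>. If no such row exists, it returns NULL.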
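To make the offset behavior concrete, here is a minimal sketch on three made-up rows (the inline table `t` and its values are purely illustrative, not part of the question's data):

```sql
-- Hypothetical three-row input, built inline so the query runs on its own.
WITH t AS (
    SELECT 1 AS seqnum, 'a' AS value FROM dual UNION ALL
    SELECT 2,           'b'          FROM dual UNION ALL
    SELECT 3,           'c'          FROM dual
)
SELECT seqnum,
       value,
       LEAD(value, 1) OVER (ORDER BY seqnum) AS next_value,       -- one row ahead
       LEAD(value, 2) OVER (ORDER BY seqnum) AS value_after_next  -- two rows ahead
  FROM t
 ORDER BY seqnum;

-- SEQNUM  VALUE  NEXT_VALUE  VALUE_AFTER_NEXT
-- 1       a      b           c
-- 2       b      c           (null)
-- 3       c      (null)      (null)
```

Because the offsets are fixed, the approach only lines up correctly when every record has the same number of rows in the same order.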

Try the code below:

-- MANUAL DATA CREATION
WITH DATAA AS (
    SELECT
        1 KEY,
        '800001348' VALUE,
        1 SEQNUM
    FROM
        DUAL
    UNION ALL
    SELECT
        2,
        'IDEAL OPTION',
        2
    FROM
        DUAL
    UNION ALL
    SELECT
        27,
        'Place of Service',
        3
    FROM
        DUAL
    UNION ALL
    SELECT
        39,
        'IDEAL OPTION',
        4
    FROM
        DUAL
    UNION ALL
    SELECT
        400,
        '123 MAIN STREET',
        5
    FROM
        DUAL
    UNION ALL
    SELECT
        400,
        'Ste G',
        6
    FROM
        DUAL
    UNION ALL
    SELECT
        410,
        'SEATTLE',
        7
    FROM
        DUAL
    UNION ALL
    SELECT
        420,
        'Washington',
        8
    FROM
        DUAL
    UNION ALL
    SELECT
        423,
        'BENTON',
        9
    FROM
        DUAL
    UNION ALL
    SELECT
        430,
        '99336',
        10
    FROM
        DUAL
--
--
    UNION ALL
--
--
    SELECT
        1 KEY,
        '850000900' VALUE,
        11 SEQNUM
    FROM
        DUAL
    UNION ALL
    SELECT
        2,
        'INVITAE CORPORATION',
        12
    FROM
        DUAL
    UNION ALL
    SELECT
        27,
        'Place of Service',
        13
    FROM
        DUAL
    UNION ALL
    SELECT
        39,
        'INVITAE CORPORATION',
        14
    FROM
        DUAL
    UNION ALL
    SELECT
        400,
        'XYZ 1st AVENUE',
        15
    FROM
        DUAL
    UNION ALL
    SELECT
        410,
        'SAN FRANCISCO',
        16
    FROM
        DUAL
    UNION ALL
    SELECT
        420,
        'California',
        17
    FROM
        DUAL
    UNION ALL
    SELECT
        423,
        'SAN FRANCISCO',
        18
    FROM
        DUAL
    UNION ALL
    SELECT
        430,
        '94103',
        19
    FROM
        DUAL
)
--
-- YOUR QUERY STARTS FROM HERE
--
SELECT
    MAX(CASE
        WHEN KEY = '1' THEN VALUE
    END) AS FACILITY_ID,
    MAX(CASE
        WHEN KEY = '2' THEN VALUE
    END) AS UNKNOWN_NUM,
    MAX(CASE
        WHEN KEY = '27' THEN VALUE
    END) AS TYPE_OF_LOCATION,
    MAX(CASE
        WHEN KEY = '39' THEN VALUE
    END) AS EXTERNAL_NAME,
    TRIM(',' FROM
        LISTAGG(CASE
            WHEN KEY = '400' THEN VALUE
        END, ',') WITHIN GROUP(
            ORDER BY
                SEQNUM
        )
    ) AS ADDRESS, -- ADDRESS HAS MORE THAN ONE RECORD IN THE FIRST GROUP OF VALUES
    MAX(CASE
        WHEN KEY = '410' THEN VALUE
    END) AS CITY,
    MAX(CASE
        WHEN KEY = '420' THEN VALUE
    END) AS STATE,
    MAX(CASE
        WHEN KEY = '423' THEN VALUE
    END) AS COUNTY,
    MAX(CASE
        WHEN KEY = '430' THEN VALUE
    END) AS ZIP_CODE
FROM
    (
        SELECT
            DATAA_ALL.KEY,
            DATAA_ALL.VALUE,
            DATAA_ALL.SEQNUM,
            COUNT(1) AS GRP_VAR
        FROM
            DATAA DATAA_ALL
            JOIN DATAA DATAA_FIRST ON ( DATAA_FIRST.KEY = 1
                                        AND DATAA_ALL.SEQNUM >= DATAA_FIRST.SEQNUM )
        GROUP BY
            DATAA_ALL.KEY,
            DATAA_ALL.VALUE,
            DATAA_ALL.SEQNUM
    )
GROUP BY
    GRP_VAR

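I grouped the rows by GRP_VAR and then applied aggregate functions. GRP_VAR is the number of KEY = 1 rows at or before the current row, so every row of the same record gets the same value. Also, wherever a record can contain several rows for the same attribute (such as the address in the example above), you can use LISTAGG instead of MAX.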
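As a possible alternative to the self-join, the same group number can probably be computed with a running analytic count. This is a sketch under the assumption that every record starts with a KEY = 1 row; the inline `dataa` rows are a trimmed, illustrative subset of the sample data:

```sql
-- Trimmed, illustrative subset of the sample data.
WITH dataa AS (
    SELECT 1 AS key, '800001348' AS value, 1 AS seqnum FROM dual UNION ALL
    SELECT 400, '123 MAIN STREET', 2 FROM dual UNION ALL
    SELECT 400, 'Ste G',           3 FROM dual UNION ALL
    SELECT 1,   '850000900',       4 FROM dual UNION ALL
    SELECT 400, 'XYZ 1st AVENUE',  5 FROM dual
)
SELECT key,
       value,
       seqnum,
       -- Running count of record-start rows (KEY = 1) up to the current row.
       COUNT(CASE WHEN key = 1 THEN 1 END) OVER (ORDER BY seqnum) AS grp_var
  FROM dataa
 ORDER BY seqnum;

-- GRP_VAR is 1 for seqnum 1 to 3 and 2 for seqnum 4 to 5.
```

This subquery could stand in for the joined inline view above; the outer GROUP BY GRP_VAR would stay the same.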

Cheers!!

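If I understand correctly, you have two problems:

  1. Grouping the key-value pairs into a "record".
  2. Turning rows into columns.

I created a SQL Fiddle with a table like this and added your sample data.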

create table your_table_t(
   k    number
  ,v    varchar2(200)
  ,seq number
);

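For the first problem, I assume that key = 1 is your record ID. The goal is therefore to assign that ID to every key-value pair that belongs to the record. To do that I used a COALESCE expression, which copies the value of the most recent key = 1 row to every key-value pair that follows it.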

For the second part, I simply used Oracle's PIVOT operator to turn the rows into columns.

with identify_record as(
   select k,v
            ,coalesce(    decode(k,1,v,null)
                     ,lag(decode(k,1,v,null),1) ignore nulls over(order by seq)) as id
        from your_table_t a
)
select *    
  from identify_record pivot(
         max(v) for k in(1   as FACILITY_ID
                        ,2   as Unknown_num
                        ,27  as TYPE_OF_LOCATION
                        ,39  as EXTERNAL_NAME
                        ,400 as ADDRESS
                        ,410 as CITY
                        ,420 as STATE
                        ,423 as COUNTY
                        ,430 as ZIP_CODE                        
                        )
  );

You will get the following output:

ID          FACILITY_ID    UNKNOWN_NUM             TYPE_OF_LOCATION     EXTERNAL_NAME           ADDRESS           CITY              STATE          COUNTY            ZIP_CODE
"800001348" "800001348"    "IDEAL OPTION"          "Place of Service"   "IDEAL OPTION"          "Ste G"           "KENNEWICK"       "Washington"   "BENTON"          "99336"
"850000900" "850000900"    "INVITAE CORPORATION"   "Place of Service"   "INVITAE CORPORATION"   "XYZ 1st AVENUE"  "SAN FRANCISCO"   "California"   "SAN FRANCISCO"   "94103"
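Note that ADDRESS for the first record shows only "Ste G": with two key = 400 rows, MAX(v) keeps just one of them. If both address lines are needed, one option is conditional aggregation with LISTAGG instead of PIVOT. The following is only a sketch; it uses a trimmed, illustrative subset of the sample data, carries the ID forward with LAST_VALUE ... IGNORE NULLS rather than the COALESCE/LAG expression, and assumes facility IDs are unique across records:

```sql
-- Trimmed, illustrative subset of the sample data; names follow the answer above.
WITH your_table_t AS (
    SELECT 1 AS k, '800001348' AS v, 1 AS seq FROM dual UNION ALL
    SELECT 400, '123 MAIN STREET', 2 FROM dual UNION ALL
    SELECT 400, 'Ste G',           3 FROM dual UNION ALL
    SELECT 410, 'KENNEWICK',       4 FROM dual UNION ALL
    SELECT 1,   '850000900',       5 FROM dual UNION ALL
    SELECT 400, 'XYZ 1st AVENUE',  6 FROM dual UNION ALL
    SELECT 410, 'SAN FRANCISCO',   7 FROM dual
),
identify_record AS (
    SELECT k, v, seq,
           -- Carry the most recent k = 1 value forward to every following row.
           LAST_VALUE(CASE WHEN k = 1 THEN v END) IGNORE NULLS
               OVER (ORDER BY seq) AS id
      FROM your_table_t
)
SELECT id AS facility_id,
       -- Concatenate every address line of the record in file order.
       LISTAGG(CASE WHEN k = 400 THEN v END, ', ')
           WITHIN GROUP (ORDER BY seq) AS address,
       MAX(CASE WHEN k = 410 THEN v END) AS city
  FROM identify_record
 GROUP BY id
 ORDER BY id;

-- FACILITY_ID  ADDRESS                 CITY
-- 800001348    123 MAIN STREET, Ste G  KENNEWICK
-- 850000900    XYZ 1st AVENUE          SAN FRANCISCO
```

The remaining columns would follow the same MAX(CASE ...) pattern as CITY.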