CSV to columns, join with row-based data, analyze and output - can it be done efficiently?

I've run into a complex SQL Server problem that I've been struggling to solve, and I'm hoping to get some help!

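I have two tables of data, stored in different formats, that I need to bash together to create a specified output. To make matters worse, one of the tables has some key data stored as comma-separated values (I know this is not how data should be stored. Have mercy, I didn't design these tables!).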

Students table:

| id |              oldSkill |                             newSkill |
+----+-----------------------+--------------------------------------+
|  1 |                  Word |                Excel,PowerPoint,Word |
|  2 | Excel,PowerPoint,Word |        Excel,Outlook,PowerPoint,Word |
|  3 |       PowerPoint,Word |                Excel,PowerPoint,Word |
|  4 |          Access,Excel | Access,Excel,Outlook,PowerPoint,Word |
|  5 |          Outlook,Word |        Excel,Outlook,PowerPoint,Word |

Skills table:

| id |      skill | assignment |
+----+------------+------------+
|  1 |       Word |          B |
|  1 |       Word |          P |
|  2 |      Excel |          P |
|  2 | PowerPoint |          B |
|  2 | PowerPoint |          P |
|  2 |       Word |          P |
|  3 | PowerPoint |          P |
|  3 |       Word |          P |
|  4 |     Access |          B |
|  4 |      Excel |          B |
|  4 |     Access |          P |
|  4 |      Excel |          P |
|  5 |    Outlook |          P |
|  5 |       Word |          B |

Here is the output I've been asked to produce:

| id | skill_1 | skill_1_primary | skill_1_backup |    skill_2 | skill_2_primary | skill_2_backup |    skill_3 | skill_3_primary | skill_3_backup |    skill_4 | skill_4_primary | skill_4_backup | skill_5 | skill_5_primary | skill_5_backup |
|----|---------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|---------|-----------------|----------------|
|  1 |   Excel |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |               Y |              Y |     (null) |          (null) |         (null) |  (null) |          (null) |         (null) |
|  2 |   Excel |               Y |         (null) |    Outlook |               Y |         (null) | PowerPoint |               Y |              Y |       Word |               Y |         (null) |  (null) |          (null) |         (null) |
|  3 |   Excel |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |               Y |         (null) |     (null) |          (null) |         (null) |  (null) |          (null) |         (null) |
|  4 |  Access |               Y |              Y |      Excel |               Y |              Y |    Outlook |               Y |         (null) | PowerPoint |               Y |         (null) |    Word |               Y |         (null) |
|  5 |   Excel |               Y |         (null) |    Outlook |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |          (null) |              Y |  (null) |          (null) |         (null) |

To break it down, I need to:

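I've tried looking at this from different angles (CTEs, pivots, cursors, etc.), and I've managed to split out the CSV column values using a UDF. But taking the rows from the Skills table and combining them with the Students data into the format they want has me stumped.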

I've also set up a SQL Fiddle with my test data for this post: http://sqlfiddle.com/#!6/e8d5a/1/0

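Thanks in advance for any help or guidance... SQL is not one of my strongest skills. I could probably do this more easily in another language, but I've been asked to build it as a stored procedure. =P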

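Update: Using the suggestions posted in the comments, I've made considerable progress on this. I just need help with the final output. I think it can be done with a pivot and dynamic SQL, but how to transform and aggregate the three skill-related columns and number them as specified has me stumped.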

-- this pivots the skills table into a single row for each skill
select *
into #skillPiv
from 
(
  select id, skill, assignment,
    -- order by assignment so the numbering within each (id, skill) pair is deterministic
    'assignment_'+cast(row_number() over(partition by id, skill order by assignment) as varchar(10)) rn
  from skills
) d
pivot
(
  max(assignment)
  for rn in ([assignment_1], [assignment_2])
) piv
order by id;


-- this converts the student's oldSkill values from CSV into rows and looks up the corresponding skill assignments in the #skillPiv temp table
with st(id, skill, oldSkill) as (
select id, LEFT(CAST(oldSkill as varchar(max)), CHARINDEX(',',oldSkill+',')-1),
    STUFF(CAST(oldSkill as varchar(max)), 1, CHARINDEX(',',oldSkill+','), '')
from students
union all
select id, LEFT(CAST(oldSkill as varchar(max)), CHARINDEX(',',oldSkill+',')-1),
    STUFF(CAST(oldSkill as varchar(max)), 1, CHARINDEX(',',oldSkill+','), '')
from st
where oldSkill > ''
)
select st.id
    ,st.skill
    ,CASE WHEN sp.assignment_1 = 'P' OR sp.assignment_2 = 'P'
        THEN 'Y'
        ELSE ''
        END AS [primary]
    ,CASE WHEN sp.assignment_1 = 'B' OR sp.assignment_2 = 'B'
        THEN 'Y'
        ELSE ''
        END AS [backup]
into #oldSkills
from st
inner join #skillPiv sp on st.id = sp.id and st.skill = sp.skill
order by id;


-- convert the newSkills column from CSV to rows and insert our default skill assignment values
with tmp(id, skill, newSkill) as (
select id, LEFT(CAST(newSkill as varchar(max)), CHARINDEX(',',newSkill+',')-1),
    STUFF(CAST(newSkill as varchar(max)), 1, CHARINDEX(',',newSkill+','), '')
from students
union all
select id, LEFT(CAST(newSkill as varchar(max)), CHARINDEX(',',newSkill+',')-1),
    STUFF(CAST(newSkill as varchar(max)), 1, CHARINDEX(',',newSkill+','), '')
from tmp
where newSkill > ''
)
select id
    ,skill
    ,'Y' as [primary]
    ,'' as [backup]
into #newSkills
from tmp
where skill NOT IN (
    select skill from #oldSkills where id = tmp.id
    )
order by id;


-- now combine #oldSkills and #newSkills into one table that has all the values we need
select *
into #studentSkills
from (
    select * from #newSkills
    UNION
    select * from #oldSkills
) as ss;

select * from #studentSkills;

Example on RexTester

I was having problems running this against the tables on SQL Fiddle, so I moved my test code to RexTester.

In my actual code, I'm using DelimitedSplit8K to parse the CSV values out of the Students table.
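For readers who have not used it: DelimitedSplit8K is a community-written splitter function, not something built into SQL Server. A minimal sketch of the kind of call involved, assuming the function has been installed in the dbo schema and returns ItemNumber and Item columns. The table variable is only a stand-in for the real Students table:

```sql
-- Stand-in for the Students table (one row is enough for the sketch).
DECLARE @students TABLE (id INT, newSkill VARCHAR(100));
INSERT INTO @students VALUES (1, 'Excel,PowerPoint,Word');

-- Assumes dbo.DelimitedSplit8K(@pString, @pDelimiter) is installed;
-- it is a community function, not part of SQL Server itself.
SELECT s.id
      ,split.ItemNumber AS skillOrder   -- 1-based position within the CSV
      ,split.Item       AS skill
FROM @students AS s
CROSS APPLY dbo.DelimitedSplit8K(s.newSkill, ',') AS split;
```

The position column is handy later on, because it can serve as the number in skill_1, skill_2, and so on.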

The code above produces this final table:

| id |      skill | primary | backup |
|----|------------|---------|--------|
|  1 |      Excel |       Y | (null) |
|  1 | PowerPoint |       Y | (null) |
|  1 |       Word |       Y |      Y |
|  2 |      Excel |       Y | (null) |
|  2 |    Outlook |       Y | (null) |
|  2 | PowerPoint |       Y |      Y |
|  2 |       Word |       Y | (null) |
|  3 |      Excel |       Y | (null) |
|  3 | PowerPoint |       Y | (null) |
|  3 |       Word |       Y | (null) |
|  4 |     Access |       Y |      Y |
|  4 |      Excel |       Y |      Y |
|  4 |    Outlook |       Y | (null) |
|  4 | PowerPoint |       Y | (null) |
|  4 |       Word |       Y | (null) |
|  5 |      Excel |       Y | (null) |
|  5 |    Outlook |       Y | (null) |
|  5 | PowerPoint |       Y | (null) |
|  5 |       Word |  (null) |      Y |

Now I just need to pivot it so that it looks like the desired output:

| id | skill_1 | skill_1_primary | skill_1_backup |    skill_2 | skill_2_primary | skill_2_backup |    skill_3 | skill_3_primary | skill_3_backup |    skill_4 | skill_4_primary | skill_4_backup | skill_5 | skill_5_primary | skill_5_backup |
|----|---------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|---------|-----------------|----------------|
|  1 |   Excel |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |               Y |              Y |     (null) |          (null) |         (null) |  (null) |          (null) |         (null) |
|  2 |   Excel |               Y |         (null) |    Outlook |               Y |         (null) | PowerPoint |               Y |              Y |       Word |               Y |         (null) |  (null) |          (null) |         (null) |
|  3 |   Excel |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |               Y |         (null) |     (null) |          (null) |         (null) |  (null) |          (null) |         (null) |
|  4 |  Access |               Y |              Y |      Excel |               Y |              Y |    Outlook |               Y |         (null) | PowerPoint |               Y |         (null) |    Word |               Y |         (null) |
|  5 |   Excel |               Y |         (null) |    Outlook |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |          (null) |              Y |  (null) |          (null) |         (null) |

I appreciate the help. Thanks!

This design is really, really, really bad :-D

Still, if you have to stick with it, you might try this:

Note: I am relying on your statement

Notice that the newSkill column includes the oldSkill values

I take this to mean "there is no old skill that is not also included in the new skills"!

The solution is fully inline and set-based:

DECLARE @students TABLE(id INT,oldSkill VARCHAR(100),newSkill VARCHAR(100));
INSERT INTO @students VALUES
 (1,'Word','Excel,PowerPoint,Word')
,(2,'Excel,PowerPoint,Word','Excel,Outlook,PowerPoint,Word')
,(3,'PowerPoint,Word','Excel,PowerPoint,Word')
,(4,'Access,Excel','Access,Excel,Outlook,PowerPoint,Word')
,(5,'Outlook,Word','Excel,Outlook,PowerPoint,Word');

DECLARE @skills TABLE(id INT, skill VARCHAR(100),assignment VARCHAR(1));
INSERT INTO @skills VALUES
 (1,'Word','B')
,(1,'Word','P')
,(2,'Excel','P')
,(2,'PowerPoint','B')
,(2,'PowerPoint','P')
,(2,'Word','P')
,(3,'PowerPoint','P')
,(3,'Word','P')
,(4,'Access','B')
,(4,'Excel','B')
,(4,'Access','P')
,(4,'Excel','P')
,(5,'Outlook','P')
,(5,'Word','B');

--The first CTE uses an XML trick to split your comma-separated values

WITH Step1 AS
(
    SELECT id
          ,A.*     
    FROM @students AS s
    OUTER APPLY(
                 SELECT CAST('<x>' + REPLACE(s.oldSkill,',','</x><x>') + '</x>' AS XML) AS OldSkillXml
                       ,CAST('<x>' + REPLACE(s.newSkill,',','</x><x>') + '</x>' AS XML) AS NewSkillXml
                ) AS A
)

--The second CTE gets the list of old skills together with their flags

,OldSkills AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY Step1.id ORDER BY (SELECT NULL)) AS OldSkillOrder
          ,Step1.id
          ,os.value('text()[1]','varchar(100)') AS Skill
          ,CASE WHEN (SELECT assignment FROM @skills AS s WHERE s.id=Step1.id AND s.skill=os.value('text()[1]','varchar(100)') AND s.assignment='P') IS NOT NULL THEN 'Y' END AS IsPrimary
          ,CASE WHEN (SELECT assignment FROM @skills AS s WHERE s.id=Step1.id AND s.skill=os.value('text()[1]','varchar(100)') AND s.assignment='B') IS NOT NULL THEN 'Y' END AS IsBackup
    FROM Step1 
    OUTER APPLY Step1.OldSkillXml.nodes('x') AS A(os)
)

--This CTE gets the list of new skills, all marked as "IsPrimary='Y'"

,NewSkills AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY Step1.id ORDER BY (SELECT NULL)) AS NewSkillOrder
          ,Step1.id
          ,ns.value('text()[1]','varchar(100)') AS Skill
          ,'Y' AS IsPrimary
          ,NULL AS IsBackup
    FROM Step1 
    OUTER APPLY Step1.NewSkillXml.nodes('x') AS A(ns)
)

--The intermediate list is your result before the pivot

,IntermediateList AS
(
    SELECT ns.id
          ,ns.Skill
          ,ns.IsPrimary
          ,os.IsBackup
          ,ns.NewSkillOrder
    FROM NewSkills AS ns
    FULL OUTER JOIN OldSkills AS os ON os.id=ns.id AND os.Skill=ns.Skill 
)

--I use "conditional aggregation" here (old-fashioned pivot), which is very handy when you need to pivot several columns at once

SELECT id

      ,MAX(CASE WHEN NewSkillOrder = 1 THEN Skill END) AS skill_1
      ,MAX(CASE WHEN NewSkillOrder = 1 THEN IsPrimary END) AS skill_1_primary
      ,MAX(CASE WHEN NewSkillOrder = 1 THEN IsBackup END) AS skill_1_backup

      ,MAX(CASE WHEN NewSkillOrder = 2 THEN Skill END) AS skill_2
      ,MAX(CASE WHEN NewSkillOrder = 2 THEN IsPrimary END) AS skill_2_primary
      ,MAX(CASE WHEN NewSkillOrder = 2 THEN IsBackup END) AS skill_2_backup

      ,MAX(CASE WHEN NewSkillOrder = 3 THEN Skill END) AS skill_3
      ,MAX(CASE WHEN NewSkillOrder = 3 THEN IsPrimary END) AS skill_3_primary
      ,MAX(CASE WHEN NewSkillOrder = 3 THEN IsBackup END) AS skill_3_backup

      ,MAX(CASE WHEN NewSkillOrder = 4 THEN Skill END) AS skill_4
      ,MAX(CASE WHEN NewSkillOrder = 4 THEN IsPrimary END) AS skill_4_primary
      ,MAX(CASE WHEN NewSkillOrder = 4 THEN IsBackup END) AS skill_4_backup

      ,MAX(CASE WHEN NewSkillOrder = 5 THEN Skill END) AS skill_5
      ,MAX(CASE WHEN NewSkillOrder = 5 THEN IsPrimary END) AS skill_5_primary
      ,MAX(CASE WHEN NewSkillOrder = 5 THEN IsBackup END) AS skill_5_backup
FROM IntermediateList AS il
GROUP BY id; 

Result

+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| id | skill_1 | skill_1_primary | skill_1_backup | skill_2    | skill_2_primary | skill_2_backup | skill_3    | skill_3_primary | skill_3_backup | skill_4    | skill_4_primary | skill_4_backup | skill_5 | skill_5_primary | skill_5_backup |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 1  | Excel   | Y               | NULL           | PowerPoint | Y               | NULL           | Word       | Y               | Y              | NULL       | NULL            | NULL           | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 2  | Excel   | Y               | NULL           | Outlook    | Y               | NULL           | PowerPoint | Y               | Y              | Word       | Y               | NULL           | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 3  | Excel   | Y               | NULL           | PowerPoint | Y               | NULL           | Word       | Y               | NULL           | NULL       | NULL            | NULL           | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 4  | Access  | Y               | Y              | Excel      | Y               | Y              | Outlook    | Y               | NULL           | PowerPoint | Y               | NULL           | Word    | Y               | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 5  | Excel   | Y               | NULL           | Outlook    | Y               | NULL           | PowerPoint | Y               | NULL           | Word       | Y               | Y              | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
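If you want to try the XML split from the first CTE in isolation, something along these lines should do. The variable name and sample string are just for illustration. One caveat: this simple form assumes the values contain no XML-special characters such as & or <, since those would make the CAST to XML fail.

```sql
-- Illustrative input; any comma-separated string without XML-special characters works.
DECLARE @csv VARCHAR(100) = 'Excel,Outlook,PowerPoint,Word';

-- Turn 'a,b,c' into '<x>a</x><x>b</x><x>c</x>', cast it to XML,
-- then shred the <x> elements back into one row per value.
SELECT A.x.value('text()[1]', 'varchar(100)') AS Skill
FROM (
       SELECT CAST('<x>' + REPLACE(@csv, ',', '</x><x>') + '</x>' AS XML) AS SkillXml
     ) AS src
CROSS APPLY src.SkillXml.nodes('x') AS A(x);
```

This should return one row per skill in the string.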

Note
There is one difference: your student 5 gets NULL/Y for the skill "Word". I don't understand why this skill, which is contained in the "new skills", should not be "primary".
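If the intended rule is "a skill that was already an old skill keeps the primary flag looked up in the skills table, and only genuinely new skills default to Y" (that is my guess at the rule, not something the question states), then the intermediate list could pick the flag conditionally. A small sketch, with two table variables standing in for the NewSkills and OldSkills CTEs and reduced to student 5:

```sql
-- Hypothetical stand-ins for the NewSkills and OldSkills CTEs (student 5 only).
DECLARE @NewSkills TABLE (id INT, Skill VARCHAR(100), NewSkillOrder INT);
INSERT INTO @NewSkills VALUES
 (5,'Excel',1),(5,'Outlook',2),(5,'PowerPoint',3),(5,'Word',4);

DECLARE @OldSkills TABLE (id INT, Skill VARCHAR(100), IsPrimary CHAR(1), IsBackup CHAR(1));
INSERT INTO @OldSkills VALUES
 (5,'Outlook','Y',NULL),(5,'Word',NULL,'Y');

SELECT ns.id
      ,ns.Skill
      -- No matching old skill: default to 'Y'. Otherwise keep the looked-up flag.
      ,CASE WHEN os.Skill IS NULL THEN 'Y' ELSE os.IsPrimary END AS IsPrimary
      ,os.IsBackup
      ,ns.NewSkillOrder
FROM @NewSkills AS ns
LEFT JOIN @OldSkills AS os ON os.id = ns.id AND os.Skill = ns.Skill
ORDER BY ns.id, ns.NewSkillOrder;
```

With this rule "Word" comes back as NULL/Y for student 5, as in your expected output, while the other three skills stay Y/NULL. In the full query the same CASE expression would replace ns.IsPrimary in IntermediateList.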