Create n new rows from raw data such as (1000....1000+n)

Question

I need to read data from an Excel workbook in which the data is stored like this:

Company       Accounts
Company1      (#3000...#3999)
Company2      (#4000..4019)+(#4021..4024)

The expected output, written through an OLE DB destination in SSIS, is:

Company       Accounts
Company1      3000
Company1      3001
Company1      3002
   .           .
   .           .
   .           .
Company1      3999
Company2      4000
Company2      4001
   .           .
   .           .
   .           .
Company2      4019
Company2      4021
   .           .
   .           .
Company2      4024

This has me stumped, and I don't even know how to start on it.

Does anyone have any insight into this?

Answer 1

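Assuming the worksheet has 2 cells per row, the general logic that comes to mind is to expand the second cell of each row twice. The first pass splits the string using + as the delimiter and returns one or more rows per company. The second pass repeats that logic using .. as the delimiter, but returns 2 columns per row (the start and end of each range). From there you can either loop or use a table of numbers to generate the desired set. How best to do this in SSIS is something I can't answer, since I have no experience in that area. The numbers table approach is relatively simple and common.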
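
The two-pass split described above could look roughly like the following in T-SQL. This is only a sketch under several assumptions that are not in the original answer: SQL Server 2016 or later (for STRING_SPLIT), the Excel rows already loaded into a table variable named @staging (a hypothetical name), every +-separated part containing a .. range, and an inline 0 to 9999 number list standing in for a real numbers table.

```sql
-- Hypothetical staging data copied from the question; names are assumptions.
DECLARE @staging TABLE (Company varchar(50), Accounts varchar(100));
INSERT INTO @staging VALUES
    ('Company1', '(#3000...#3999)'),
    ('Company2', '(#4000..4019)+(#4021..4024)');

WITH cleaned AS (
    -- Strip '(', ')' and '#', and normalize '...' to '..'.
    SELECT Company,
           REPLACE(REPLACE(REPLACE(REPLACE(Accounts, '(', ''), ')', ''), '#', ''), '...', '..') AS Accounts
    FROM @staging
),
ranges AS (
    -- Pass 1: STRING_SPLIT on '+' gives one row per range.
    -- Pass 2: the position of '..' splits each range into two columns.
    SELECT c.Company,
           CAST(LEFT(s.value, p.pos - 1) AS int) AS RangeStart,
           CAST(SUBSTRING(s.value, p.pos + 2, LEN(s.value)) AS int) AS RangeEnd
    FROM cleaned AS c
    CROSS APPLY STRING_SPLIT(c.Accounts, '+') AS s
    CROSS APPLY (SELECT CHARINDEX('..', s.value) AS pos) AS p
),
digits AS (
    SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
numbers AS (
    -- Stand-in for a numbers table: every integer from 0 to 9999.
    SELECT d1.d + 10 * d2.d + 100 * d3.d + 1000 * d4.d AS n
    FROM digits AS d1
    CROSS JOIN digits AS d2
    CROSS JOIN digits AS d3
    CROSS JOIN digits AS d4
)
-- One output row per account inside each range.
SELECT r.Company, n.n AS Account
FROM ranges AS r
JOIN numbers AS n ON n.n BETWEEN r.RangeStart AND r.RangeEnd
ORDER BY r.Company, Account;
```

For the two sample rows this should return accounts 3000 through 3999 for Company1, and 4000 through 4019 plus 4021 through 4024 for Company2, so 4020 is skipped. Account numbers above 9999 would need a larger number source.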

Answer 2

First, you have to insert the data into some temporary table. There are several ways to do this. Then run this query:

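-- Strip the parentheses and '+', and append a trailing '#' so that every number is enclosed by '#'.
-- Note: the sample data here uses '#3000#3999' rather than the '(#3000...#3999)' form from the question.
with cte as (
select 
    company, replace(replace(replace(accounts,'(',''),')',''),'+','')+'#' accounts 
from 
    (values ('company 1','#3000#3999'),('company 2','(#4000#4019)+(#4021#4024)')) data(company, accounts)
)
-- Each recursion step peels the first '#'-delimited number off the front of the remaining string (acc).
, rcte as (
    select 
        company, stuff(accounts, ind1, ind2 - ind1, '') acc, substring(accounts, ind1 + 1, ind2 - ind1 - 1) accounts
    from 
        cte
        cross apply (select charindex('#', accounts) ind1) ca
        cross apply (select charindex('#', accounts, ind1 + 1) ind2) cb
    union all
    select
        company, stuff(acc, ind1, ind2 - ind1, ''), substring(acc, ind1 + 1, ind2 - ind1 - 1)
    from
        rcte
        cross apply (select charindex('#', acc) ind1) ca
        cross apply (select charindex('#', acc, ind1 + 1) ind2) cb
    where
        len(acc)>1
)

-- As written, this returns one row per number found in the string (the range endpoints);
-- it does not generate the account numbers in between.
select company, accounts from rcte
order by company, accounts

option (maxrecursion 0)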

Answer 3

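First, you need a split-string function or, depending on your data, a custom split function. My example uses dbo.DelimitedSplit8K.

That said, after analyzing the data coming from Excel, I would probably create a custom TVF.

Second, you need a numbers table, which you can build with whatever logic you prefer. Creating and populating it is a one-time task: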

CREATE TABLE tblnumber (number INT PRIMARY KEY);

-- cross join spt_values with itself to get enough rows to number
INSERT INTO tblnumber
SELECT ROW_NUMBER() OVER (
        ORDER BY a.number
        )
FROM master..spt_values a
CROSS JOIN master..spt_values b;

This is only a concept based on your current data set.

You need to pull all of the Excel data into a staging table.

create table #staging(Company varchar(50),Accounts varchar(50))
insert into #staging values 
('Company1',   '#3000...#3999')
,('Company2','#4000..4019)+(#4021..4024')

Then:

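;with CTE as
(
-- one row per company and per '+'-separated range, so a gap such as 4020 is not filled in
select tbl.Company
,r.ItemNumber RangeNo
,min(cast(ca.Item as int)) MinAccount,max(cast(ca.Item as int)) MaxAccount
from 
(
-- strip '#', '(' and ')'; keep '+' as the separator between ranges
select Company
,replace(replace(replace(Accounts,'#',''),')',''),'(','') Accounts
from #staging
)tbl
cross apply dbo.DelimitedSplit8K(tbl.Accounts,'+') r
cross apply dbo.DelimitedSplit8K(r.Item,'.') ca
where ca.Item<>''
group by tbl.Company,r.ItemNumber
)
-- expand each range into one row per account using the numbers table
select c.Company,n.number as Account from tblnumber n
inner join CTE c on n.number>=c.MinAccount and n.number<=c.MaxAccount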

I used a CTE here only as an example; this is just to illustrate the idea. Cleaning up the account strings in your real data is left to you.

Answer 4

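You can add a Script Component to achieve this:

Add a Script Component.
Make the output buffer asynchronous to the input.
For each input row, split the Accounts column and retrieve the minimum and maximum value of each range.
Use a For loop to iterate over the values between the retrieved minimum and maximum.
Inside the loop, create an output row for each value.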
