合并具有连接日期的行
Merge rows with connecting dates
我有大量 Excel sheet 客户和订阅数据。从这个 table 我想合并 records/rows 与 connection stop_ 和 start_dates 并在新作品 sheet 中显示结果。数据的简化版本如下所示。
Customer_id subscription_id start_date stop_date
1034 RV4 30-4-2012 30-1-2015
1035 AB7 30-1-2014 30-3-2014
1035 AB6 30-1-2014 30-3-2014
1035 AB7 30-12-2013 30-1-2014
1035 AB7 12-12-2012 30-12-2013
1035 AB7 12-9-2010 14-1-2011
因此,公式必须检查 customer_id 和 subscription_id。当sheet中有两行或多行匹配,其中一行的stop_date与另一行的start_date重叠,则提取合并后,新行必须显示第一行的 start_date 和另一行的 stop_date。如果有多行具有连接日期,这也必须起作用。不符合这些条件的所有行在提取后保持不变。所以结果会是这样的:
Customer_id subscription_id start_date stop_date
1034 RV4 30-4-2012 30-1-2015
1035 AB6 30-1-2014 30-3-2014
1035 AB7 12-12-2012 30-3-2014
1035 AB7 12-9-2010 14-1-2011
动态解决方案是理想的,同时新数据将添加到原始数据 sheet。虽然我知道当您确定要查找的行始终在彼此下方时这是可能的,但这里不是这种情况,它不会为您提供非常动态的解决方案。
所以我猜 Excel 中需要某种数组函数,但经过大量搜索后我找不到 suitable 解决方案。我也有 MATLAB 可用,但不知道从哪里开始该程序有这样的问题。
动态解决方案可能是可能的,但如果数据集很大,它可能会使事情陷入困境,因为你每次都需要它运行更改了一个单元格。
基本上我能看到的最好的方法是在你的 customer_id 和 subscription_id 中创建唯一的键,然后收集该键下的所有日期范围并合并它们。
类似这样的内容应该可以帮助您入门(需要参考 Microsoft Scripting Runtime):
Public Sub LinkSubscriptionDates()
Dim data As Dictionary, source As Worksheet, target As Worksheet
Set source = ActiveSheet
Set data = GetSubscriptions(source)
Set target = source.Parent.Worksheets.Add
'Copy headers
target.Range(target.Cells(1, 1), target.Cells(1, 4)).Value = _
source.Range(source.Cells(1, 1), source.Cells(1, 4)).Value
Dim row As Long
row = 2
Dim key As Variant, item As Variant
For Each key In data.Keys
For Each item In data(key)
target.Cells(row, 1) = Split(key, "|")(0)
target.Cells(row, 2) = Split(key, "|")(1)
target.Cells(row, 3) = Split(item, "|")(0)
target.Cells(row, 4) = Split(item, "|")(1)
row = row + 1
Next item
Next key
End Sub
Private Function GetSubscriptions(source As Worksheet) As Dictionary
Dim subscrips As Dictionary
Set subscrips = New Dictionary
Dim row As Long
Dim cust As String, subs As String, starting As String, ending As String
'Gather all the data as pairs of customer|subscription, starting|ending
For row = 2 To source.UsedRange.Rows.Count
Dim dates() As String
cust = source.Cells(row, 1).Value
subs = source.Cells(row, 2).Value
'Valid customer/subscription?
If cust <> vbNullString And subs <> vbNullString Then
starting = source.Cells(row, 3).Value
ending = source.Cells(row, 4).Value
'Has an ending and starting date?
If starting <> vbNullString And ending <> vbNullString Then
Dim key As String
key = cust & "|" & subs
'New combo?
If Not subscrips.Exists(key) Then
subscrips.Add key, New Collection
subscrips(key).Add starting & "|" & ending
Else
subscrips(key).Add starting & "|" & ending
Set subscrips(key) = MergeDates(subscrips(key))
End If
End If
End If
Next row
Set GetSubscriptions = subscrips
End Function
Private Function MergeDates(dates As Collection) As Collection
Dim candidate As Long, index As Long
Dim values() As String, test() As String
Dim merge As Boolean
For index = 1 To dates.Count
values = Split(dates(index), "|")
'Check to see if it can be merged with any other row.
For candidate = index + 1 To dates.Count
test = Split(dates(candidate), "|")
If CDate(test(0)) >= CDate(values(0)) And _
CDate(test(0)) <= CDate(values(1)) Or _
CDate(test(1)) >= CDate(values(0)) And _
CDate(test(1)) <= CDate(values(1)) Then
dates.Remove candidate
merge = True
Exit For
End If
Next candidate
If merge Then Exit For
Next index
If merge Then
'Pull both rows out of the collection.
dates.Remove index
values(0) = IIf(CDate(test(0)) < CDate(values(0)), _
CDate(test(0)), CDate(values(0)))
values(1) = IIf(CDate(test(1)) > CDate(values(1)), _
CDate(test(1)), CDate(values(1)))
'Put the merged date range back in.
dates.Add values(0) & "|" & values(1)
'Recurse.
Set MergeDates = MergeDates(dates)
End If
Set MergeDates = dates
End Function
它确实需要通过数据验证、错误捕获等来充实,目前它只是将结果数据放在新的工作表上。所有工作都在 GetSubscriptions
函数中完成,因此您只需从中获取返回的 Dictionary 并使用其中的数据做任何您需要做的事情。
我有大量 Excel sheet 客户和订阅数据。从这个 table 我想合并 records/rows 与 connection stop_ 和 start_dates 并在新作品 sheet 中显示结果。数据的简化版本如下所示。
Customer_id subscription_id start_date stop_date
1034 RV4 30-4-2012 30-1-2015
1035 AB7 30-1-2014 30-3-2014
1035 AB6 30-1-2014 30-3-2014
1035 AB7 30-12-2013 30-1-2014
1035 AB7 12-12-2012 30-12-2013
1035 AB7 12-9-2010 14-1-2011
因此,公式必须检查 customer_id 和 subscription_id。当sheet中有两行或多行匹配,其中一行的stop_date与另一行的start_date重叠,则提取合并后,新行必须显示第一行的 start_date 和另一行的 stop_date。如果有多行具有连接日期,这也必须起作用。不符合这些条件的所有行在提取后保持不变。所以结果会是这样的:
Customer_id subscription_id start_date stop_date
1034 RV4 30-4-2012 30-1-2015
1035 AB6 30-1-2014 30-3-2014
1035 AB7 12-12-2012 30-3-2014
1035 AB7 12-9-2010 14-1-2011
动态解决方案是理想的,同时新数据将添加到原始数据 sheet。虽然我知道当您确定要查找的行始终在彼此下方时这是可能的,但这里不是这种情况,它不会为您提供非常动态的解决方案。
所以我猜 Excel 中需要某种数组函数,但经过大量搜索后我找不到 suitable 解决方案。我也有 MATLAB 可用,但不知道从哪里开始该程序有这样的问题。
动态解决方案可能是可能的,但如果数据集很大,它可能会使事情陷入困境,因为你每次都需要它运行更改了一个单元格。
基本上我能看到的最好的方法是在你的 customer_id 和 subscription_id 中创建唯一的键,然后收集该键下的所有日期范围并合并它们。
类似这样的内容应该可以帮助您入门(需要参考 Microsoft Scripting Runtime):
Public Sub LinkSubscriptionDates()
Dim data As Dictionary, source As Worksheet, target As Worksheet
Set source = ActiveSheet
Set data = GetSubscriptions(source)
Set target = source.Parent.Worksheets.Add
'Copy headers
target.Range(target.Cells(1, 1), target.Cells(1, 4)).Value = _
source.Range(source.Cells(1, 1), source.Cells(1, 4)).Value
Dim row As Long
row = 2
Dim key As Variant, item As Variant
For Each key In data.Keys
For Each item In data(key)
target.Cells(row, 1) = Split(key, "|")(0)
target.Cells(row, 2) = Split(key, "|")(1)
target.Cells(row, 3) = Split(item, "|")(0)
target.Cells(row, 4) = Split(item, "|")(1)
row = row + 1
Next item
Next key
End Sub
Private Function GetSubscriptions(source As Worksheet) As Dictionary
Dim subscrips As Dictionary
Set subscrips = New Dictionary
Dim row As Long
Dim cust As String, subs As String, starting As String, ending As String
'Gather all the data as pairs of customer|subscription, starting|ending
For row = 2 To source.UsedRange.Rows.Count
Dim dates() As String
cust = source.Cells(row, 1).Value
subs = source.Cells(row, 2).Value
'Valid customer/subscription?
If cust <> vbNullString And subs <> vbNullString Then
starting = source.Cells(row, 3).Value
ending = source.Cells(row, 4).Value
'Has an ending and starting date?
If starting <> vbNullString And ending <> vbNullString Then
Dim key As String
key = cust & "|" & subs
'New combo?
If Not subscrips.Exists(key) Then
subscrips.Add key, New Collection
subscrips(key).Add starting & "|" & ending
Else
subscrips(key).Add starting & "|" & ending
Set subscrips(key) = MergeDates(subscrips(key))
End If
End If
End If
Next row
Set GetSubscriptions = subscrips
End Function
Private Function MergeDates(dates As Collection) As Collection
Dim candidate As Long, index As Long
Dim values() As String, test() As String
Dim merge As Boolean
For index = 1 To dates.Count
values = Split(dates(index), "|")
'Check to see if it can be merged with any other row.
For candidate = index + 1 To dates.Count
test = Split(dates(candidate), "|")
If CDate(test(0)) >= CDate(values(0)) And _
CDate(test(0)) <= CDate(values(1)) Or _
CDate(test(1)) >= CDate(values(0)) And _
CDate(test(1)) <= CDate(values(1)) Then
dates.Remove candidate
merge = True
Exit For
End If
Next candidate
If merge Then Exit For
Next index
If merge Then
'Pull both rows out of the collection.
dates.Remove index
values(0) = IIf(CDate(test(0)) < CDate(values(0)), _
CDate(test(0)), CDate(values(0)))
values(1) = IIf(CDate(test(1)) > CDate(values(1)), _
CDate(test(1)), CDate(values(1)))
'Put the merged date range back in.
dates.Add values(0) & "|" & values(1)
'Recurse.
Set MergeDates = MergeDates(dates)
End If
Set MergeDates = dates
End Function
它确实需要通过数据验证、错误捕获等来充实,目前它只是将结果数据放在新的工作表上。所有工作都在 GetSubscriptions
函数中完成,因此您只需从中获取返回的 Dictionary 并使用其中的数据做任何您需要做的事情。