Power Query. Merge duplicated lines in a row with collapsing values
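I have a bus timetable with stops, Time_in and Time_out. Sometimes a stop is repeated in my data (on consecutive rows), and I need to merge those rows, keeping only the first Time_in and the last Time_out.
Here is an example: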
Stop | Time_in | Time_out |
---|---|---|
23 St | 15:23 | 15:27 |
42 St | 15:35 | 15:40 |
42 St | 15:42 | 15:48 |
47 St | 15:56 | 16:10 |
42 St | 16:14 | 16:19 |
Desired result:
Stop | Time_in | Time_out |
---|---|---|
23 St | 15:23 | 15:27 |
42 St | 15:35 | 15:48 |
47 St | 15:56 | 16:10 |
42 St | 16:14 | 16:19 |
Any help is much appreciated. Thanks in advance.
Power Query
```powerquery
let
    Source = Web.BrowserContents("
    #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(1)"}, {"Column2", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(2)"}, {"Column3", "DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR > :nth-child(3)"}}, [RowSelector="DIV.s-table-container:nth-child(3) > TABLE.s-table > * > TR"]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Time_out"}),
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Stop"}, {{"ad_1", each _, type table [Stop=nullable text, Time_in=nullable time]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each let x= [ad_1],
        #"Removed Columns1" = Table.RemoveColumns(x,{"Stop"}),
        #"Sorted Rows" = Table.Sort(#"Removed Columns1",{{"Time_in", Order.Ascending}}),
        #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type),
        #"Filtered Rows" = Table.SelectRows(#"Added Index", each ([Index] = 1)),
        #"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows",{"Index"})
    in
        #"Removed Columns2"),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"ad_1"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns1", "Custom", {"Time_in"}, {"Time_in"}),
    Custom1 = Table.RemoveColumns(#"Changed Type",{"Time_in"}),
    #"Grouped Rows1" = Table.Group(Custom1, {"Stop"}, {{"ad_2", each _, type table [Stop=nullable text, Time_out=nullable time]}}),
    Custom2 = Table.AddColumn(#"Grouped Rows1", "Custom", each let x= [ad_2],
        #"Removed Columns1" = Table.RemoveColumns(x,{"Stop"}),
        #"Sorted Rows" = Table.Sort(#"Removed Columns1",{{"Time_out", Order.Descending}}),
        #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type),
        #"Filtered Rows" = Table.SelectRows(#"Added Index", each ([Index] = 1)),
        #"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows",{"Index"})
    in
        #"Removed Columns2"),
    #"Removed Columns2" = Table.RemoveColumns(Custom2,{"ad_2"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Columns2", "Custom", {"Time_out"}, {"Time_out"}),
    #"Merged Queries" = Table.NestedJoin(#"Expanded Custom", {"Stop"}, #"Expanded Custom1", {"Stop"}, "Expanded Custom1", JoinKind.LeftOuter),
    #"Expanded Expanded Custom1" = Table.ExpandTableColumn(#"Merged Queries", "Expanded Custom1", {"Time_out"}, {"Time_out"})
in
    #"Expanded Expanded Custom1"
```
DAX
```dax
min := MIN('Table 1'[Time_in])
max := MAX('Table 1'[Time_out])
```
[Image: DAX result]
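1. In Power Query, right-click the Stop column and choose Group By...
2. Switch to Advanced and click Add aggregation, so that there are two aggregation rows.
3. For the first row, choose the operation Min on the Time_in column.
4. For the second row, choose the operation Max on the Time_out column.
5. If needed, change `type number` to `type time` for the new columns, either in the formula bar or in Home > Advanced Editor.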
```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Stop"}, {{"Time_in", each List.Min([Time_in]), type time}, {"Time_out", each List.Max([Time_out]), type time}})
in
    #"Grouped Rows"
```
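Run against the sample rows from the question, this default grouping shows why the updated requirement needs another step: with no `groupKind` argument, `Table.Group` groups globally, so both visits to 42 St collapse into a single row. A minimal sketch, assuming the sample data is typed in as a literal `#table` in place of `Table1` (the `Source` step here is a hypothetical stand-in, not part of the answer):

```powerquery
let
    // Hypothetical stand-in for Table1: the question's sample rows as a literal table
    Source = #table(
        type table [Stop = text, Time_in = time, Time_out = time],
        {
            {"23 St", #time(15, 23, 0), #time(15, 27, 0)},
            {"42 St", #time(15, 35, 0), #time(15, 40, 0)},
            {"42 St", #time(15, 42, 0), #time(15, 48, 0)},
            {"47 St", #time(15, 56, 0), #time(16, 10, 0)},
            {"42 St", #time(16, 14, 0), #time(16, 19, 0)}
        }
    ),
    // Same grouping as the answer above. Without a groupKind argument Table.Group
    // groups globally, so every "42 St" row lands in one group regardless of position.
    #"Grouped Rows" = Table.Group(
        Source,
        {"Stop"},
        {
            {"Time_in", each List.Min([Time_in]), type time},
            {"Time_out", each List.Max([Time_out]), type time}
        }
    )
    // Result: 3 rows; "42 St" spans 15:35 to 16:19, merging both visits
in
    #"Grouped Rows"
```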
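For the updated requirement, where a stop can appear again later in the route, first create a group number so that a stop is only combined with the rows directly next to it:
1. Add Column > Index Column, starting from 0 (the formula in the next step uses the index as a row position).
2. Add Column > Custom Column, with the formula
   `= try if #"Added Index"{[Index]}[Stop] = #"Added Index"{[Index]-1}[Stop] then null else [Index] otherwise [Index]`
3. Right-click the new column and choose Fill > Down.
4. Select both the Stop and Custom columns and choose Group By.
5. Click Add aggregation.
6. For the first row, choose the operation Min on the Time_in column.
7. For the second row, choose the operation Max on the Time_out column.

Sample code: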
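```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Stop", type text}, {"Time_in", type time}, {"Time_out", type time}}),
    // 0-based index, so [Index] is also the row's position in #"Added Index"
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    // Keep the index where the stop differs from the previous row, otherwise null.
    // Row 0 has no previous row, so {[Index]-1} raises an error and "otherwise" keeps the index.
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try if #"Added Index"{[Index]}[Stop] = #"Added Index"{[Index]-1}[Stop] then null else [Index] otherwise [Index]),
    // Copy each run's starting index downwards, so every consecutive run gets its own group number
    #"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
    #"Grouped Rows" = Table.Group(#"Filled Down", {"Stop", "Custom"}, {{"Time_in", each List.Min([Time_in]), type time}, {"Time_out", each List.Max([Time_out]), type time}}),
    // The helper group number is no longer needed
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Custom"})
in
    #"Removed Columns"
```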
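A possibly simpler route: `Table.Group` takes an optional fourth argument, `groupKind`. Passing `GroupKind.Local` starts a new group every time the key changes from one row to the next, which is exactly "merge repeated stops in a row", so the Index, Custom and Fill Down helper steps are not needed. This is a sketch, not the answer's method; the literal `#table` in `Source` is a hypothetical stand-in for `Table1` holding the question's sample rows:

```powerquery
let
    // Hypothetical stand-in for Table1: the question's sample rows as a literal table
    Source = #table(
        type table [Stop = text, Time_in = time, Time_out = time],
        {
            {"23 St", #time(15, 23, 0), #time(15, 27, 0)},
            {"42 St", #time(15, 35, 0), #time(15, 40, 0)},
            {"42 St", #time(15, 42, 0), #time(15, 48, 0)},
            {"47 St", #time(15, 56, 0), #time(16, 10, 0)},
            {"42 St", #time(16, 14, 0), #time(16, 19, 0)}
        }
    ),
    // GroupKind.Local groups only consecutive rows with the same Stop,
    // so the two separate visits to "42 St" stay as separate rows.
    #"Grouped Rows" = Table.Group(
        Source,
        {"Stop"},
        {
            {"Time_in", each List.Min([Time_in]), type time},
            {"Time_out", each List.Max([Time_out]), type time}
        },
        GroupKind.Local
    )
    // Expected: 4 rows (23 St, 42 St, 47 St, 42 St), matching the desired result
in
    #"Grouped Rows"
```

To run it on the workbook, swap the literal `Source` step for the `Source` and `#"Changed Type"` steps from the sample code, and group `#"Changed Type"` instead. If a run of rows could cross midnight, `List.Min`/`List.Max` on time values would pick the wrong ends; `List.First([Time_in])` and `List.Last([Time_out])` follow the question's "first Time_in, last Time_out" wording more literally, relying on each local group keeping its rows in source order.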