Power Query: combine three external Excel source files and append specific columns
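I am trying to build a lookup table that combines the primary key column of my 3 source files, so that I don't have to run outer joins to find the records missing from each source and then append them together. I found out how to "combine" two source files, but I can't figure out how to drill into the columns/fields list so that I can select only column 1 (or the "Item Code" header name in the Excel file).
Here is the code I am currently using to combine 2 of the 3 files (as a trial):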
let
Source = Table.Combine({Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_MRP_Dmd.xlsx"), null, true),
Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_Open_PO.xlsx"), null, true)})
in Source
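If you are just starting out with Power Query, don't try to write the code by hand, and don't cram everything into one statement. Instead, use the ribbon commands and then edit the generated code as needed.
For your scenario, you can create a separate query for each data source. Load these as connections only. Adjust each one so that it contains just the columns you need. Then you can append the three queries and refine the result further.
If your data sources are less than ideal (i.e. they have lots of irrelevant columns, or duplicates in the data you want), one way to avoid materializing a lot of unnecessary data is to perform all the transformations/filtering on the nested table cells, rather than loading all the data only to remove columns/dupes.
The M code below should be a rough start that will hopefully get you on your way: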
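As a rough illustration of the per-source approach, here is a minimal sketch that folds the separate queries into one. It makes several assumptions that are not confirmed by the question: that the key data sits on a sheet named "Sheet1", that the first row holds headers including "Item Code", and that a helper function (the hypothetical `GetKeys`) is acceptable in place of individual connection-only queries. The two file names come from the question's trial code; the third file is not named there, so it is left out and would be added to the list in the same way.

```powerquery
let
    // Hypothetical helper: open one workbook and return only its key column.
    // Assumes a sheet named "Sheet1" whose first row contains the header "Item Code".
    GetKeys = (path as text) as table =>
        let
            Workbook = Excel.Workbook(File.Contents(path), null, true),
            Sheet = Workbook{[Item = "Sheet1", Kind = "Sheet"]}[Data],
            Promoted = Table.PromoteHeaders(Sheet, [PromoteAllScalars = true]),
            Keys = Table.SelectColumns(Promoted, {"Item Code"})
        in
            Keys,
    // Folder path taken from the question; adjust to your environment
    SourceFolder = "C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\",
    // Append the key columns; add the third source file to this list in the same way
    Appended = Table.Combine({
        GetKeys(SourceFolder & "JDE_MRP_Dmd.xlsx"),
        GetKeys(SourceFolder & "JDE_Open_PO.xlsx")
    }),
    // Keep each key once so the result can serve as a lookup table
    Result = Table.Distinct(Appended)
in
    Result
```

Each `GetKeys` call plays the role of one connection-only query trimmed to the needed column, and `Table.Combine` is the append step. Building the same thing through the ribbon would produce separate queries instead of a function, which is easier to inspect step by step.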
let
//Adjust the Source step to refer to the relevant folder your 3 source files are saved in
Source = Folder.Files("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources"),
//Filter the file list to leave just your 3 source files if required
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
//Remove all columns except the Binary file column
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content"}),
//Convert the binary file to the file data ie sheets, tables, named ranges etc - the same data you get when you use a file as a source
#"Workbook Data" = Table.TransformColumns(#"Removed Other Columns",{"Content", each Excel.Workbook(_)}),
//Filter the nested file data table cell to select the sheet you need from your source files - may not be necessary depending on what's in the files
#"Sheet Filter" = Table.TransformColumns(#"Workbook Data",{"Content", each Table.SelectRows(_, each [Name] = "Sheet1")}),
//Step to Name the column you want to extract data from
#"Column Name" = "Column1",
//Extract a List of the values in the specified column
#"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each List.Distinct(Table.Column(_{0}[Data],#"Column Name"))}),
//Expand all the lists
#"Expanded Content" = Table.ExpandListColumn(#"Column Values", "Content"),
#"Removed Duplicates" = Table.Distinct(#"Expanded Content")
in
#"Removed Duplicates"
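The query above reads the default column name "Column1" because `Excel.Workbook` is called without promoting headers. If you would rather refer to the column by its header, one option is to promote headers inside the nested table before extracting the list. The sketch below shows the idea on small in-memory tables so it can be tried without any files; the sample values (A100, A200, B300) and the "Qty" column are made up purely for illustration.

```powerquery
let
    // Stand-ins for the nested [Data] tables of two files (hypothetical sample values).
    // The first row of each table holds the header text, as it does for an unpromoted sheet.
    Raw1 = #table({"Column1", "Column2"}, {{"Item Code", "Qty"}, {"A100", 5}, {"A200", 7}, {"A100", 9}}),
    Raw2 = #table({"Column1", "Column2"}, {{"Item Code", "Qty"}, {"A200", 1}, {"B300", 2}}),
    // One row per file, with the sheet table nested in the "Content" column
    Files = #table({"Content"}, {{Raw1}, {Raw2}}),
    // Inside each nested table: promote headers, then take the distinct values of "Item Code"
    Keys = Table.TransformColumns(
        Files,
        {"Content", each List.Distinct(Table.Column(Table.PromoteHeaders(_), "Item Code"))}
    ),
    // Expand the lists into rows and remove keys that appear in more than one file
    Expanded = Table.ExpandListColumn(Keys, "Content"),
    Result = Table.Distinct(Expanded)
in
    Result
```

With this sample data the result is a single "Content" column containing A100, A200 and B300. In the real query the same change would go into the #"Column Values" step, wrapping `_{0}[Data]` in `Table.PromoteHeaders` and setting #"Column Name" to "Item Code".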
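Edit
To select multiple columns and return distinct rows, you can change the steps from #"Column Name" onward as follows.
Depending on how much data you have, this may end up taking longer than the previous version, but it should get the job done: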
//Step to list the names of the columns you want to extract data from
#"Column Name" = {"Column1","Column2","Column5"},
//Select just the specified columns from each nested sheet table
#"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each Table.SelectColumns(_{0}[Data],#"Column Name")}),
//In each nested table, filter down to distinct rows
#"Distinct rows in Nested Tables" = Table.TransformColumns(#"Column Values",{"Content", each Table.Distinct(_)}),
//Expand nested table column
#"Expanded Content" = Table.ExpandTableColumn(#"Distinct rows in Nested Tables", "Content", #"Column Name"),
//Remove Duplicates in combined table
#"Removed Duplicates" = Table.Distinct(#"Expanded Content")
in
#"Removed Duplicates"