Power Query: combine three external Excel source files and append specific columns

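I am trying to create a lookup table that combines the primary key column of my 3 source files, so that I don't have to do outer joins to find the records missing from each source and then append them together. I have found how to "combine" two source files, but I can't figure out how to drill into the list of columns/fields so that I can select only column 1 (or the "Item Code" header name in the Excel files).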

Here is the code I am using so far to combine 2 of the 3 files (as a trial):

let
    Source = Table.Combine({
        Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_MRP_Dmd.xlsx"), null, true),
        Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_Open_PO.xlsx"), null, true)
    })
in
    Source

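If you are just starting out with Power Query, don't try to write the code by hand, and don't cram everything into a single statement. Instead, use the ribbon commands and then edit the generated code as needed.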

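For your scenario, you could create a separate query for each data source and load each one as a connection only. Trim each query down to the columns you need. You can then append the three queries and refine the result further.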
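As a minimal sketch of that append step, assuming each per-source query exposes an "Item Code" column: the three inline tables below are hypothetical stand-ins for your connection-only queries (the names MrpDemand, OpenPO and ThirdSource, and all sample values, are made up for illustration).

```powerquery
let
    // Hypothetical stand-ins for the three per-source queries.
    // In a real workbook each of these would be a reference to its own query.
    MrpDemand   = #table({"Item Code", "Qty"},         {{"A100", 5}, {"B200", 3}}),
    OpenPO      = #table({"Item Code", "PO Number"},   {{"B200", "PO-1"}, {"C300", "PO-2"}}),
    ThirdSource = #table({"Item Code", "Description"}, {{"A100", "Widget"}, {"D400", "Gadget"}}),

    // Name of the key column shared by all sources (assumed to match the file headers)
    KeyColumn = "Item Code",

    // Keep only the key column from each source
    KeysOnly = List.Transform(
        {MrpDemand, OpenPO, ThirdSource},
        each Table.SelectColumns(_, {KeyColumn})
    ),

    // Append the three single-column tables, then drop repeated keys
    Appended = Table.Combine(KeysOnly),
    Lookup = Table.Distinct(Appended)
in
    Lookup
```

With the sample rows above, the result should be a one-column table holding each item code once. Because every source is reduced to the key column before appending, the other columns never need to line up between files.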

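If your data sources are less than ideal (i.e. lots of irrelevant columns, or duplicates in the data you want), one way to avoid materialising a lot of unnecessary data is to do all the transformations/filtering on the nested table cells, rather than loading all the data only to then remove columns/dupes.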

The M code below should be a rough start that will hopefully get you on your way:

let
    //Adjust the Source step to refer to the relevant folder your 3 source files are saved in
    Source = Folder.Files("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources"),

    //Filter the file list to leave just your 3 source files if required
    #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),

    //Remove all columns except the Binary file column
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content"}),

    //Convert the binary file to the file data ie sheets, tables, named ranges etc - the same data you get when you use a file as a source
    #"Workbook Data" = Table.TransformColumns(#"Removed Other Columns",{"Content", each Excel.Workbook(_)}), 

    //Filter the nested file data table cell to select the sheet you need from your source files - may not be necessary depending on what's in the files
    #"Sheet Filter" = Table.TransformColumns(#"Workbook Data",{"Content", each Table.SelectRows(_, each [Name] = "Sheet1")}),     

    //Step to Name the column you want to extract data from
    #"Column Name" = "Column1",

    //Extract a List of the values in the specified column
    #"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each List.Distinct(Table.Column(_{0}[Data],#"Column Name"))}), 

    //Expand all the lists
    #"Expanded Content" = Table.ExpandListColumn(#"Column Values", "Content"),

    #"Removed Duplicates" = Table.Distinct(#"Expanded Content")
in
    #"Removed Duplicates"
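If the folder holds more than your 3 source files, the #"Filtered Rows" step could filter on file name instead of extension. A rough sketch, assuming the two file names from the question plus a placeholder for the third (Third_Source.xlsx and Notes.xlsx are hypothetical); the inline table stands in for the output of Folder.Files, reduced to the two columns used here:

```powerquery
let
    // Stand-in for Folder.Files(...); the real table has more columns (Content, Folder Path, etc.)
    Source = #table(
        {"Name", "Extension"},
        {
            {"JDE_MRP_Dmd.xlsx", ".xlsx"},
            {"JDE_Open_PO.xlsx", ".xlsx"},
            {"Third_Source.xlsx", ".xlsx"},  // hypothetical name for the third source file
            {"Notes.xlsx", ".xlsx"}          // hypothetical unrelated workbook in the same folder
        }
    ),

    // The files to keep; adjust to the real names
    WantedFiles = {"JDE_MRP_Dmd.xlsx", "JDE_Open_PO.xlsx", "Third_Source.xlsx"},

    // Keep only the rows whose file name is in the list
    #"Filtered Rows" = Table.SelectRows(Source, each List.Contains(WantedFiles, [Name]))
in
    #"Filtered Rows"
```

Swapping this predicate into the query above should leave the later steps unchanged, since they only rely on the Content column being present.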

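Edit: To select multiple columns and return distinct rows, you can change the steps from #"Column Name" onwards.

Depending on how much data you have, this may end up taking longer than the previous version, but it should do the job.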

    //Step to name the columns you want to extract data from
    #"Column Name" = {"Column1","Column2","Column5"},

    //Keep only the specified columns in each file's sheet data
    #"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each Table.SelectColumns(_{0}[Data],#"Column Name")}),

    //In each nested table, filter down to distinct rows
    #"Distinct rows in Nested Tables" = Table.TransformColumns(#"Column Values",{"Content", each Table.Distinct(_)}),

    //Expand nested table column
    #"Expanded Content" = Table.ExpandTableColumn(#"Distinct rows in Nested Tables", "Content", #"Column Name"), 

    //Remove Duplicates in combined table
    #"Removed Duplicates" = Table.Distinct(#"Expanded Content")
in
    #"Removed Duplicates"
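Both versions pick columns by the default names (Column1, Column2, ...) that sheet data typically carries when headers have not been promoted. If you would rather select by header text such as "Item Code", one option is to promote the first row to headers first. A small sketch on made-up inline data (RawSheet is a hypothetical stand-in for _{0}[Data]):

```powerquery
let
    // Stand-in for one sheet's data as returned without promoted headers
    RawSheet = #table(
        {"Column1", "Column2"},
        {
            {"Item Code", "Qty"},  // header row still sitting in the data
            {"A100", 5},
            {"B200", 3},
            {"A100", 7}
        }
    ),

    // Use the first row as column names
    Promoted = Table.PromoteHeaders(RawSheet, [PromoteAllScalars = true]),

    // Pull the distinct values of the key column by its header name
    ItemCodes = List.Distinct(Table.Column(Promoted, "Item Code"))
in
    ItemCodes
```

In the queries above, a comparable effect can probably be had by passing true as the second argument, i.e. Excel.Workbook(_, true), so that the nested [Data] tables arrive with headers already promoted; #"Column Name" could then be set to "Item Code" (or a list of header names in the multi-column version).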