Excel Web Power Query: Excel merges data strings in a cell --> How do I separate the data?
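I am using Excel 2016 and want to download odds from Oddschecker.com into an Excel spreadsheet via the Web Power Query feature.
More specifically, I am trying to download data from this website:
The problem I have is that some of the odds on this website are merged into a single cell without a space:
Is there any way in Power Query to separate the data strings/odds so that they are not merged?
Many thanks for any help you can provide.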
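I cannot test this, because the site is blacklisted in the Russian segment of the Internet, but I suspect the cells contain <cr> or <lf> characters that are not being converted to new lines.
What you need is to run Text.Replace on all cells that contain data, replacing these characters.
After that you will probably want these values as separate rows, which is a much more complicated task. :)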
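As a rough illustration of the suggestion above, the sketch below strips carriage returns, splits each cell on line feeds, and expands the pieces into separate rows. The sample table, its column names ("Date", "Odds") and its values are hypothetical stand-ins for the scraped data, and it assumes the merged cells really do contain <cr>/<lf> characters, which a later answer in this thread suggests is not the case for this site.

```powerquery
let
    // Hypothetical sample data standing in for the scraped table
    Source = #table({"Date", "Odds"}, {{"2017-04-10", "11/4#(cr)#(lf)13/5#(cr)#(lf)11/4"}}),
    // Drop carriage returns, then split each cell on line feeds into a list
    ToLists = Table.TransformColumns(Source, {{"Odds", each Text.Split(Text.Replace(_, "#(cr)", ""), "#(lf)")}}),
    // Expand each list so that every odds value gets its own row
    ToRows = Table.ExpandListColumn(ToLists, "Odds")
in
    ToRows
```

Table.ExpandListColumn repeats the other columns (here "Date") for every expanded value, so each odds value keeps the date of the cell it came from.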
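Inspired by Gil Raviv's http://datachant.com/2017/03/30/web-scraping-power-bi-excel-power-query/
Edit, 11 April 2017: this solution depends heavily on the structure of the website. In other words, it worked fine yesterday but unfortunately not today.
The following query, with its associated function, works for me: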
let
    Source = Web.Page(Web.Contents("https://www.oddschecker.com/politics/european-politics/french-election/next-president/bet-history/marine-le-pen/today#all-history")),
    // Navigate the HTML tree by fixed positions; these indexes break whenever the page structure changes
    Data0 = Source{1}[Data],
    Children = Data0{0}[Children],
    Children1 = Children{1}[Children],
    Children2 = Children1{4}[Children],
    Children3 = Children2{0}[Children],
    Children4 = Children3{0}[Children],
    Children5 = Children4{0}[Children],
    Children6 = Children5{3}[Children],
    Children7 = Children6{0}[Children],
    Children8 = Children7{1}[Children],
    Children9 = Children8{3}[Children],
    Children10 = Children9{0}[Children],
    Children11 = Children10{2}[Children],
    Children12 = Children11{2}[Children],
    Children13 = Children12{0}[Children],
    Children14 = Children13{1}[Children],
    #"Removed Other Columns" = Table.SelectColumns(Children14,{"Children"}),
    // Turn the cells of each table row into one single-row table, then expand it into columns
    #"Invoked Custom Function" = Table.AddColumn(#"Removed Other Columns", "EpandTables", each EpandTables([Children])),
    #"Expanded EpandTables" = Table.ExpandTableColumn(#"Invoked Custom Function", "EpandTables", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22", "Column23", "Column24", "Column25", "Column26", "Column27", "Column28", "Column29"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22", "Column23", "Column24", "Column25", "Column26", "Column27", "Column28", "Column29"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded EpandTables",{"Children"}),
    // Drop rows in which every field is empty or null
    #"Removed Blank Rows" = Table.SelectRows(#"Removed Columns", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    #"Parsed Date" = Table.TransformColumns(#"Removed Blank Rows",{{"Column1", each Date.From(DateTimeZone.From(_)), type date}})
in
    #"Parsed Date"
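Function ExpandTables (the query above calls it under the misspelled name EpandTables, so save it under that name for that query; edit: the #"Added Custom" line was adjusted by adding Table.SelectRows):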
(ChildTable as table) =>
let
    #"Removed Other Columns1" = Table.SelectColumns(ChildTable,{"Children"}),
    // Take the cell's own text if present; otherwise join the text of its child elements, one per line
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns1", "Custom", each
        try
            if [Children] is null then null
            else if [Children][Text]{0} <> null then [Children][Text]{0}
            else Lines.ToText(List.Transform(Table.SelectRows([Children], each [Children] <> null)[Children], each _[Text]{0}))
        otherwise null),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Children"}),
    // One row per cell becomes one column per cell
    #"Transposed Table" = Table.Transpose(#"Removed Columns")
in
    #"Transposed Table"
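The alternative approach in the code below uses a recursive function, fnSearchTR (embedded in the query), to drill down into the HTML document until it finds the name "TR" (or until 100 iterations have passed, just to prevent endless recursion). I noticed that this is where the required data is located, at least today.
Note: I also adjusted the second step in the code to select "Document".
This is a more dynamic solution, because the position of "TR" in the document structure does not matter. If the document structure changes, it is still possible that some other "TR" will be found first, but so far it works.
Other "TR" content will also be picked up, but once the data type of the first column is changed to date, those rows are filtered out as errors or nulls.
This query also uses the function "ExpandTables" from my previous answer (I corrected the typo and added the "x"; otherwise the function is unchanged).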
let
    Source = Web.Page(Web.Contents("https://www.oddschecker.com/politics/european-politics/french-election/next-president/bet-history/marine-le-pen/today#all-history")),
    Data0 = Table.SelectRows(Source, each [Caption] = "Document"){0}[Data],
    ChildrenWithTable = Table.SelectRows(Data0, each [Children] is table),
    // Descend one level of the HTML tree per call until the first combined child is a "TR"
    fnSearchTR = (newChildren as table, counter as number) as table =>
        let
            Combined = Table.Buffer(Table.Combine(newChildren[Children])),
            ChildrensChildrenWithTable = Table.AddColumn(newChildren, "ChildrensChildren", each Table.SelectRows([Children], each [Children] is table)),
            ChildrensChildrenCombined = Table.Combine(ChildrensChildrenWithTable[ChildrensChildren]),
            CombinedAll =
                // Check the row count first so that [Name]{0} is never read from an empty table
                if Table.RowCount(ChildrensChildrenCombined) > 0 and ChildrensChildrenCombined[Name]{0} = "TR"
                then ChildrensChildrenCombined
                else if Table.RowCount(ChildrensChildrenCombined) = 0 or counter = 100
                then Combined
                else @fnSearchTR(ChildrensChildrenCombined, counter + 1)
        in
            CombinedAll,
    CombinedAll = if Table.RowCount(ChildrenWithTable) = 0 then Data0 else fnSearchTR(ChildrenWithTable, 0),
    #"Filtered Rows" = Table.SelectRows(CombinedAll, each ([Name] = "TR")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Children"}),
    #"Invoked Custom Function" = Table.AddColumn(#"Removed Other Columns", "ExpandTables", each ExpandTables([Children])),
    #"Removed Columns" = Table.RemoveColumns(#"Invoked Custom Function",{"Children"}),
    #"Expanded ExpandTables" = Table.ExpandTableColumn(#"Removed Columns", "ExpandTables", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22", "Column23", "Column24", "Column25", "Column26", "Column27", "Column28", "Column29"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22", "Column23", "Column24", "Column25", "Column26", "Column27", "Column28", "Column29"}),
    // Rows from other "TR" elements fail the date conversion and are dropped as errors or nulls
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded ExpandTables",{{"Column1", type date}}),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type", {"Column1"}),
    #"Filtered Rows1" = Table.SelectRows(#"Removed Errors", each ([Column1] <> null))
in
    #"Filtered Rows1"
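The problem is that the HTML of one of the combined cells is:
<td><div class="oo">11/4</div><div class="oi">13/5</div><div class="oo">11/4</div></td>
As far as I know, div layout rules do not imply a line break, so Power Query does not insert one. We do not run a full layout engine, so we do not know that the column width means each div should be on its own line.
(If anyone knows more about HTML layout semantics, please let me know and I can propose a fix to my team.)
You can text-replace the HTML like this to insert your own delimiter, a semicolon (;), between the div elements: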
let
    // Download the raw HTML, patch it as text, then hand the result to Web.Page
    WebPageWithReplace = (url as text, old as text, new as text) =>
        let
            Source = Web.Contents(url),
            TextReplace = Text.ToBinary(Text.Replace(Text.FromBinary(Source), old, new)),
            Page = Web.Page(TextReplace)
        in
            Page,
    Invoked = WebPageWithReplace(
        "https://www.oddschecker.com/politics/european-politics/french-election/next-president/bet-history/marine-le-pen/today#all-history",
        "</div><div",
        "</div>;<div"),
    Data = Invoked{1}[Data]
in
    Data
That way Web.Page will still find and parse the HTML table.
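Once the cells carry a ; delimiter, they can be split into separate columns. The sketch below shows one way to do that with Table.SplitColumn. The sample table, the column name "Odds" and the three target column names are hypothetical, and the number of values per cell on the real page may differ.

```powerquery
let
    // Hypothetical sample standing in for a cell produced by the delimiter replacement
    Source = #table({"Odds"}, {{"11/4;13/5;11/4"}}),
    // Split on the inserted delimiter; the three output names are assumptions
    Split = Table.SplitColumn(Source, "Odds", Splitter.SplitTextByDelimiter(";"), {"Odds.1", "Odds.2", "Odds.3"})
in
    Split
```

If separate rows are preferred over separate columns, Text.Split on ";" followed by Table.ExpandListColumn is an alternative.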