Update table if certain fields are null with related values from a different column
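I am writing an ETL in Pentaho Kettle to build tables from various sources, including Google Analytics.
Table 1 = all data from the website, joined to the Google Analytics information
Table 2 = all duplicate rows from Table 1, joined to the Google Analytics information
My problem is that some rows in Table 1 are missing their Google Analytics information, while Table 2 does hold Google Analytics data for the same reference_number.
So what I want to do is look up [reference_number] from Table 1 in Table 2 and fill in the columns in Table 1 that are null with the values from Table 2.
Quick example (edit):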
Table 1 (Main Table; this table has a unique index on website_reference_number)
website_Reference_number  GA_info_1  GA_info_2
A1                        null       null
A2                        x          y
Table 2 (Duplicates from Table 1)
eventlabel                GA_info_1  GA_info_2
A1                        z          z
A2                        x          y
My output should look like this:
Table 1 (Main Table)
Ref_number  GA_info_1  GA_info_2
A1          z          z
A2          x          y
I am using a MySQL database.
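For anyone who wants to reproduce the example, here is a minimal sketch of the sample data. The table names mytable and table2 are placeholders, and the column types and the key name uq_mytable_ref are assumptions, since the question does not give the real schema.

```sql
-- Hypothetical schema: names and VARCHAR sizes are assumptions for illustration only.
CREATE TABLE mytable (
    Ref_number VARCHAR(32)  NOT NULL,
    GA_info_1  VARCHAR(255) NULL,
    GA_info_2  VARCHAR(255) NULL,
    UNIQUE KEY uq_mytable_ref (Ref_number)  -- mirrors the unique index mentioned for Table 1
);

CREATE TABLE table2 (
    Ref_number VARCHAR(32)  NOT NULL,
    GA_info_1  VARCHAR(255) NULL,
    GA_info_2  VARCHAR(255) NULL
);

-- Table 1: A1 is missing its Google Analytics values, A2 is complete.
INSERT INTO mytable (Ref_number, GA_info_1, GA_info_2) VALUES
    ('A1', NULL, NULL),
    ('A2', 'x',  'y');

-- Table 2: holds the Google Analytics values for both reference numbers.
INSERT INTO table2 (Ref_number, GA_info_1, GA_info_2) VALUES
    ('A1', 'z', 'z'),
    ('A2', 'x', 'y');
```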
UPDATE mytable
LEFT JOIN table2 ON mytable.Ref_number = table2.Ref_number
SET
    mytable.GA_info_1 = COALESCE(mytable.GA_info_1, table2.GA_info_1),
    mytable.GA_info_2 = COALESCE(mytable.GA_info_2, table2.GA_info_2)
WHERE
    mytable.GA_info_1 IS NULL
    OR mytable.GA_info_2 IS NULL
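Put every field that may be null into the WHERE clause.
If a field is not null it is left unchanged, because it is the first argument to COALESCE; if it is null, it is filled with the corresponding field from the other table.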
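To see why an already populated column is left alone, it may help to run COALESCE on literals. This is only an illustration with made-up values, not part of the update itself.

```sql
-- COALESCE returns its first non-NULL argument.
SELECT COALESCE(NULL, 'z');  -- returns 'z': the NULL is replaced by the fallback
SELECT COALESCE('x', 'z');   -- returns 'x': the existing value wins
SELECT COALESCE(NULL, NULL); -- returns NULL: nothing to fall back on
```

The last case is what happens with the LEFT JOIN when table2 has no matching row: the column simply stays NULL.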
Edit: you could also try it this way:
UPDATE mytable
INNER JOIN table2 ON mytable.Ref_number = table2.Ref_number
SET
    mytable.GA_info_1 = COALESCE(mytable.GA_info_1, table2.GA_info_1),
    mytable.GA_info_2 = COALESCE(mytable.GA_info_2, table2.GA_info_2)
WHERE
    CONCAT(mytable.GA_info_1, mytable.GA_info_2) IS NULL
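The WHERE clause in this variant relies on the fact that, in MySQL, CONCAT returns NULL as soon as any argument is NULL, so it behaves like a chain of "IS NULL ... OR ..." checks. A quick sketch with literal values:

```sql
-- MySQL: CONCAT yields NULL if any argument is NULL.
SELECT CONCAT(NULL, 'y') IS NULL;  -- returns 1: one NULL argument is enough
SELECT CONCAT('x', 'y') IS NULL;   -- returns 0: both values are present
```

Note that this is MySQL behaviour; other databases may treat NULL in string concatenation differently, so the explicit IS NULL checks are the more portable form.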
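On the performance issue (already mentioned in the comments):
Since you are not joining the tables on a primary or foreign key, you have to put an index on the Ref_number column to speed up the join.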
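A rough sketch of what that could look like, assuming the lookup table is table2 and that it has no index on Ref_number yet. The index name idx_table2_ref_number is made up for the example.

```sql
-- Hypothetical index name; adjust the table and column to your real schema.
CREATE INDEX idx_table2_ref_number ON table2 (Ref_number);

-- Optional check: look at the plan of the equivalent join to see whether the index is used.
EXPLAIN
SELECT mytable.Ref_number
FROM mytable
INNER JOIN table2 ON mytable.Ref_number = table2.Ref_number
WHERE mytable.GA_info_1 IS NULL
   OR mytable.GA_info_2 IS NULL;
```

If the main table already has a unique index on its reference column, as the question says, only the lookup side should need the extra index.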
UPDATE DIM_ENQUIRIES_TEST
LEFT JOIN DIM_ENQUIRIES_TEST AS STAGING_GA
    ON DIM_ENQUIRIES_TEST.website_reference_number = STAGING_GA.eventlabel
SET
    DIM_ENQUIRIES_TEST.eventlabel = COALESCE(DIM_ENQUIRIES_TEST.eventlabel, STAGING_GA.eventlabel),
    DIM_ENQUIRIES_TEST.sourcemedium = COALESCE(DIM_ENQUIRIES_TEST.sourcemedium, STAGING_GA.sourcemedium),
    DIM_ENQUIRIES_TEST.deviceCategory = COALESCE(DIM_ENQUIRIES_TEST.deviceCategory, STAGING_GA.deviceCategory),
    DIM_ENQUIRIES_TEST.avgSessionDuration = COALESCE(DIM_ENQUIRIES_TEST.avgSessionDuration, STAGING_GA.avgSessionDuration),
    DIM_ENQUIRIES_TEST.timeonpage = COALESCE(DIM_ENQUIRIES_TEST.timeonpage, STAGING_GA.timeonpage),
    DIM_ENQUIRIES_TEST.avgtimeonpage = COALESCE(DIM_ENQUIRIES_TEST.avgtimeonpage, STAGING_GA.avgtimeonpage),
    DIM_ENQUIRIES_TEST.bouncerate = COALESCE(DIM_ENQUIRIES_TEST.bouncerate, STAGING_GA.bouncerate),
    DIM_ENQUIRIES_TEST.profileid = COALESCE(DIM_ENQUIRIES_TEST.profileid, STAGING_GA.profileid),
    DIM_ENQUIRIES_TEST.webpropertyid = COALESCE(DIM_ENQUIRIES_TEST.webpropertyid, STAGING_GA.webpropertyid),
    DIM_ENQUIRIES_TEST.accountname = COALESCE(DIM_ENQUIRIES_TEST.accountname, STAGING_GA.accountname),
    DIM_ENQUIRIES_TEST.tableid = COALESCE(DIM_ENQUIRIES_TEST.tableid, STAGING_GA.tableid),
    DIM_ENQUIRIES_TEST.tablename = COALESCE(DIM_ENQUIRIES_TEST.tablename, STAGING_GA.tablename),
    DIM_ENQUIRIES_TEST.keyword = COALESCE(DIM_ENQUIRIES_TEST.keyword, STAGING_GA.keyword),
    DIM_ENQUIRIES_TEST.country = COALESCE(DIM_ENQUIRIES_TEST.country, STAGING_GA.country),
    DIM_ENQUIRIES_TEST.campaign = COALESCE(DIM_ENQUIRIES_TEST.campaign, STAGING_GA.campaign),
    DIM_ENQUIRIES_TEST.sessions = COALESCE(DIM_ENQUIRIES_TEST.sessions, STAGING_GA.sessions),
    DIM_ENQUIRIES_TEST.sessionduration = COALESCE(DIM_ENQUIRIES_TEST.sessionduration, STAGING_GA.sessionduration),
    DIM_ENQUIRIES_TEST.bounces = COALESCE(DIM_ENQUIRIES_TEST.bounces, STAGING_GA.bounces)
WHERE
    DIM_ENQUIRIES_TEST.EventLabel IS NULL
    OR DIM_ENQUIRIES_TEST.SourceMedium IS NULL;
-- I only check one of them, because if one is null, the rest of the columns that may need changing are null as well