Update table if certain fields are null with related values from a different column

I am writing an ETL in Pentaho Kettle to build tables from various sources, including Google Analytics.

So Table 1 = all data from the website joined to the Google Analytics info, and Table 2 = all duplicate data from Table 1 joined to the Google Analytics info.

My problem is that some rows in Table 1 are missing their Google Analytics info, while Table 2 does show some Google Analytics data for the same reference_number.

So what I want to do is look up [reference_number] from Table 1 in Table 2 and, wherever certain columns in Table 1 are null, fill them with the info from Table 2.

Quick example (edit)*

Table 1 (Main Table) *This table has a unique index on website_reference_number*
website_reference_number   GA_info_1   GA_info_2
A1                         null        null
A2                         x           y

Table 2 (Duplicates from Table 1)
eventlabel   GA_info_1   GA_info_2
A1           z           z
A2           x           y

My output should look like this:

Table 1 (Main Table)
Ref_number   GA_info_1   GA_info_2
A1           z           z
A2           x           y

I am using a MySQL database.
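To make the example reproducible, the sample data above could be set up roughly as follows. This is only a sketch: the table names `table1` and `table2`, the `VARCHAR` column types, and the key name are assumptions for illustration and are not the real schema.

```sql
-- Hypothetical stand-ins for the real tables; column types are assumed.
CREATE TABLE table1 (
    website_reference_number VARCHAR(20) NOT NULL,
    GA_info_1 VARCHAR(50) NULL,
    GA_info_2 VARCHAR(50) NULL,
    -- Table 1 has a unique index on the reference number.
    UNIQUE KEY uq_website_reference_number (website_reference_number)
);

CREATE TABLE table2 (
    eventlabel VARCHAR(20) NOT NULL,
    GA_info_1 VARCHAR(50) NULL,
    GA_info_2 VARCHAR(50) NULL
);

-- Row A1 is missing its Google Analytics info in table1.
INSERT INTO table1 (website_reference_number, GA_info_1, GA_info_2) VALUES
    ('A1', NULL, NULL),
    ('A2', 'x', 'y');

-- table2 holds the same reference numbers with the GA info present.
INSERT INTO table2 (eventlabel, GA_info_1, GA_info_2) VALUES
    ('A1', 'z', 'z'),
    ('A2', 'x', 'y');
```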

UPDATE mytable
LEFT JOIN table2 ON mytable.Ref_number = table2.Ref_number
SET mytable.GA_info_1 = COALESCE(mytable.GA_info_1, table2.GA_info_1),
    mytable.GA_info_2 = COALESCE(mytable.GA_info_2, table2.GA_info_2)
WHERE mytable.GA_info_1 IS NULL
   OR mytable.GA_info_2 IS NULL;

Put every field that might be null into the WHERE clause.

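If a field is not null, it is left unchanged, because it is the first argument to COALESCE; if it is null, it is updated with the corresponding field from the other table.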
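To see why this is safe, here is a minimal sketch of how COALESCE behaves on literal values. The column aliases are just illustrative labels.

```sql
-- COALESCE returns its first non-NULL argument, or NULL if all are NULL.
SELECT COALESCE(NULL, 'z')  AS filled_from_table2,  -- 'z'
       COALESCE('x', 'z')   AS kept_as_is,          -- 'x'
       COALESCE(NULL, NULL) AS still_null;          -- NULL
```

The last case is what happens with the LEFT JOIN when table2 has no matching row, or has a NULL there as well: the field simply stays NULL.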

Edit: you could also try it this way:

UPDATE mytable
INNER JOIN table2 ON mytable.Ref_number = table2.Ref_number
SET mytable.GA_info_1 = COALESCE(mytable.GA_info_1, table2.GA_info_1),
    mytable.GA_info_2 = COALESCE(mytable.GA_info_2, table2.GA_info_2)
WHERE CONCAT(mytable.GA_info_1, mytable.GA_info_2) IS NULL;
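The WHERE clause in this variant relies on the fact that, in MySQL, CONCAT returns NULL as soon as any of its arguments is NULL, so it acts as a shorthand for a chain of `IS NULL ... OR ...` checks. A quick sketch with literal values (the aliases are illustrative):

```sql
-- In MySQL, CONCAT yields NULL if any argument is NULL.
SELECT CONCAT('x', NULL) IS NULL AS any_null,   -- 1
       CONCAT('x', 'y')  IS NULL AS none_null;  -- 0
```

Other database systems may handle NULL in their concatenation functions differently, so treat this trick as MySQL-specific.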

On the performance issue (already mentioned in the comments):

Since you are not joining the tables on a primary or foreign key, you should put an index on the Ref_number column to speed up the join.
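A minimal sketch of adding such an index, assuming the generic names `mytable`, `table2`, and `Ref_number` from the queries above (the index name is made up; substitute your real table and join column):

```sql
-- Index the join column on the lookup table (names are illustrative).
CREATE INDEX idx_table2_ref_number ON table2 (Ref_number);

-- Check which indexes already exist on both sides of the join.
SHOW INDEX FROM mytable;
SHOW INDEX FROM table2;
```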

UPDATE DIM_ENQUIRIES_TEST
LEFT JOIN DIM_ENQUIRIES_TEST AS STAGING_GA ON DIM_ENQUIRIES_TEST.website_reference_number = STAGING_GA.eventlabel
SET DIM_ENQUIRIES_TEST.eventlabel = COALESCE (
DIM_ENQUIRIES_TEST.eventlabel,
STAGING_GA.eventlabel
),
DIM_ENQUIRIES_TEST.sourcemedium = COALESCE (
DIM_ENQUIRIES_TEST.sourcemedium,
STAGING_GA.sourcemedium
)
,
DIM_ENQUIRIES_TEST.deviceCategory = COALESCE (
DIM_ENQUIRIES_TEST.deviceCategory,
STAGING_GA.deviceCategory
)
,
DIM_ENQUIRIES_TEST.avgSessionDuration = COALESCE (
DIM_ENQUIRIES_TEST.avgSessionDuration,
STAGING_GA.avgSessionDuration
)
,
DIM_ENQUIRIES_TEST.timeonpage = COALESCE (
DIM_ENQUIRIES_TEST.timeonpage,
STAGING_GA.timeonpage
)
,
DIM_ENQUIRIES_TEST.avgtimeonpage = COALESCE (
DIM_ENQUIRIES_TEST.avgtimeonpage,
STAGING_GA.avgtimeonpage
)
,
DIM_ENQUIRIES_TEST.bouncerate = COALESCE (
DIM_ENQUIRIES_TEST.bouncerate,
STAGING_GA.bouncerate
)
,
DIM_ENQUIRIES_TEST.profileid = COALESCE (
DIM_ENQUIRIES_TEST.profileid,
STAGING_GA.profileid
)
,
DIM_ENQUIRIES_TEST.webpropertyid = COALESCE (
DIM_ENQUIRIES_TEST.webpropertyid,
STAGING_GA.webpropertyid
)
,
DIM_ENQUIRIES_TEST.accountname = COALESCE (
DIM_ENQUIRIES_TEST.accountname,
STAGING_GA.accountname
)
,
DIM_ENQUIRIES_TEST.tableid = COALESCE (
DIM_ENQUIRIES_TEST.tableid,
STAGING_GA.tableid
)
,
DIM_ENQUIRIES_TEST.tablename = COALESCE (
DIM_ENQUIRIES_TEST.tablename,
STAGING_GA.tablename
)
,
DIM_ENQUIRIES_TEST.keyword = COALESCE (
DIM_ENQUIRIES_TEST.keyword,
STAGING_GA.keyword
)
,
DIM_ENQUIRIES_TEST.country = COALESCE (
DIM_ENQUIRIES_TEST.country,
STAGING_GA.country
)
,
DIM_ENQUIRIES_TEST.campaign = COALESCE (
DIM_ENQUIRIES_TEST.campaign,
STAGING_GA.campaign
)
,
DIM_ENQUIRIES_TEST.sessions = COALESCE (
DIM_ENQUIRIES_TEST.sessions,
STAGING_GA.sessions
)
,
DIM_ENQUIRIES_TEST.sessionduration = COALESCE (
DIM_ENQUIRIES_TEST.sessionduration,
STAGING_GA.sessionduration
)
,
DIM_ENQUIRIES_TEST.bounces = COALESCE (
DIM_ENQUIRIES_TEST.bounces,
STAGING_GA.bounces
)
WHERE
DIM_ENQUIRIES_TEST.EventLabel IS NULL
OR DIM_ENQUIRIES_TEST.SourceMedium IS NULL
;

-- I only check one, because if one of them is null, the rest of the columns that may need changing are probably null as well