Join two tables on id fields using Impala

I have two tables in HDFS that I want to join using Impala. One is Employee_Logs and the other is HR_Data.

Queries:

select e.employee_id, e.action from Employee_Logs e where e.employment_status_desc = 'Active'
select h.employee_id, h.name from HR_Data h

Employee_Logs:

employee_id  action
2325255b     login     
51666164     login
51666164v    login
r1211        logoff
r18552421    login

HR_Data:

employee_id  name
2325255      Rob    
51666164     Tom
r1211        Tammy
r18552421    Ron

I want to join them so that the data looks like this:

employee_id  action  name
2325255b     login   Rob  
51666164     login   Tom
51666164v    login   Tom
r1211        logoff  Tammy
r18552421    login   Ron

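If the employee_id fields matched in both tables I could join them easily, but the same user can have a "b" or "v" appended to their employee ID to indicate that the account is elevated, like an admin account. Some user accounts have an "r" at the front of the id, but that is the case in both tables.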

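Is there a way to create a new field in the Employee_Logs table, say one that strips the trailing "v" or "b" from the employee id, and then join on it? Or is there a better approach?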

Probably the safest method is multiple left joins:

select el.*,
       coalesce(h.name, hv.name, hb.name) as name
from employee_logs el left join
     hr_data h
     on el.employee_id = h.employee_id left join
     hr_data hv
     on el.employee_id = concat(hv.employee_id, 'v') left join
     hr_data hb
     on el.employee_id = concat(hb.employee_id, 'b');

select el.employee_id, el.action, h1.name
from (select employee_id, action,
             rtrim(employee_id, 'bv') as base_id  -- strip trailing 'b'/'v'
      from Employee_Logs) el
left join HR_Data h1
  on h1.employee_id = el.base_id;

You can use a subquery as above, since most of the records you need are already in Employee_Logs and the common ids are only used to look up the name for each record. A left join also works well in this situation: it keeps every row from the left table and adds the matching data from the right table wherever a match exists.
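
To make the left-join behavior concrete, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a local stand-in for Impala here (an assumption made so the example is easy to run); its two-argument rtrim removes any trailing characters from the given set. The log row x999 is a made-up id with no HR record, added to show that a left join keeps it with a NULL name.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for Impala (assumption:
# used only to illustrate join semantics, not Impala itself).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee_logs (employee_id text, action text);
    create table hr_data (employee_id text, name text);
    insert into employee_logs values
        ('2325255b', 'login'), ('51666164', 'login'), ('51666164v', 'login'),
        ('r1211', 'logoff'), ('r18552421', 'login'),
        ('x999', 'login');  -- hypothetical id with no HR record
    insert into hr_data values
        ('2325255', 'Rob'), ('51666164', 'Tom'),
        ('r1211', 'Tammy'), ('r18552421', 'Ron');
""")

# Strip trailing 'b'/'v' characters from the log id, then left join on it.
rows = conn.execute("""
    select el.employee_id, el.action, h.name
    from employee_logs el
    left join hr_data h
      on h.employee_id = rtrim(el.employee_id, 'bv')
    order by el.rowid
""").fetchall()

for row in rows:
    print(row)
```

With an inner join instead of a left join, the x999 row would be dropped from the result rather than returned with an empty name.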

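Use regexp_replace in the join condition to replace a trailing b or v at the end of the string with an empty string, so that the result matches the employee ID.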

select el.employee_id,el.action,hr.name
from employee_logs el
join hr_data hr on hr.employee_id = regexp_replace(el.employee_id,'[bv]$','')
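
One edge case is worth checking before relying on suffix stripping alone: a base id that itself ends in "b" or "v". The sketch below is plain Python, not Impala, and mimics the '[bv]$' replacement with re.sub. The id a77v and the name Vera are hypothetical, invented only to illustrate the case. It compares always stripping the suffix against trying an exact match first, which is roughly the order that coalesce(h.name, hv.name, hb.name) gives in the multiple-left-join answer.

```python
import re

# Sample HR data from the question, plus one hypothetical id ending in 'v'.
hr_names = {
    "2325255": "Rob",
    "51666164": "Tom",
    "r1211": "Tammy",
    "r18552421": "Ron",
    "a77v": "Vera",  # hypothetical: a base id that really ends in 'v'
}

def strip_suffix(employee_id: str) -> str:
    """Remove a single trailing 'b' or 'v', like regexp_replace(id, '[bv]$', '')."""
    return re.sub(r"[bv]$", "", employee_id)

def lookup_strip_only(employee_id: str):
    """Always strip the suffix before looking up the name."""
    return hr_names.get(strip_suffix(employee_id))

def lookup_exact_first(employee_id: str):
    """Try the id as-is, then fall back to the stripped id."""
    if employee_id in hr_names:
        return hr_names[employee_id]
    return hr_names.get(strip_suffix(employee_id))

for log_id in ["2325255b", "51666164v", "r1211", "a77v"]:
    print(log_id, lookup_strip_only(log_id), lookup_exact_first(log_id))
```

If no base id in HR_Data can end in "b" or "v", the two strategies agree and the single regexp_replace join is the simpler choice.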