Pyspark join with mixed conditions
I have two dataframes, left_df and right_df, with common columns to join on: ['col_1', 'col_2'],
and I want to join on an additional condition: right_df.col_3.between(left_df.col_4, left_df.col_5)
Code:
from pyspark.sql import functions as F
join_condition = ['col_1',
'col_2',
right_df.col_3.between(left_df.col_4, left_df.col_5)]
df = left_df.join(right_df, on=join_condition, how='left')
df.write.parquet('/tmp/my_df')
But I get the following error:
TypeError: Column is not iterable
Why can't I combine these 3 conditions in one join?
You can't mix strings with Column expressions: the on argument must be either a list of column-name strings or a list of Columns, not a mixture of both. Convert the first two items into column expressions as well, e.g.
from pyspark.sql import functions as F
join_condition = [left_df.col_1 == right_df.col_1,
left_df.col_2 == right_df.col_2,
right_df.col_3.between(left_df.col_4, left_df.col_5)]
df = left_df.join(right_df, on=join_condition, how='left')
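Note that when on is a list of Column expressions rather than column-name strings, the result keeps both copies of col_1 and col_2. A minimal sketch of dropping the right-hand duplicates before writing, assuming the same left_df and right_df as above:
# drop right_df's copies of the join keys, then write the result out
df = df.drop(right_df.col_1).drop(right_df.col_2)
df.write.parquet('/tmp/my_df')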