How to get strings separated by commas from a list to a query in PySpark?
I want to generate a query in PySpark using a list:
list = ["hi@gmail.com", "goodbye@gmail.com"]
query = "SELECT * FROM table WHERE email IN (" + list + ")"
This is the output I want:
query
SELECT * FROM table WHERE email IN ("hi@gmail.com", "goodbye@gmail.com")
Instead, I get: TypeError: cannot concatenate 'str' and 'list' objects
Can anyone help me achieve this? Thanks!
In case anyone runs into the same problem, I found you can use the following code:
"'"+"','".join(map(str, emails))+"'"
which gives the following output (note that the join puts no space after the commas):
SELECT * FROM table WHERE email IN ('hi@gmail.com','goodbye@gmail.com')
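Putting it together, a minimal sketch (the variable is renamed emails here, since list shadows the Python built-in):

emails = ["hi@gmail.com", "goodbye@gmail.com"]
# Wrap each address in single quotes and join with commas
in_clause = "'" + "','".join(map(str, emails)) + "'"
query = "SELECT * FROM table WHERE email IN (" + in_clause + ")"
print(query)
# SELECT * FROM table WHERE email IN ('hi@gmail.com','goodbye@gmail.com')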
Try this:
DataFrame-based approach -
from pyspark.sql.functions import col  # needed for the isin filter

df = spark.createDataFrame([(1, "hi@gmail.com"), (2, "goodbye@gmail.com"), (3, "abc@gmail.com"), (4, "xyz@gmail.com")], ['id', 'email_id'])
email_filter_list = ["hi@gmail.com", "goodbye@gmail.com"]
df.where(col('email_id').isin(email_filter_list)).show()
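With the sample data above, only the two matching rows should survive the filter; the show() output looks roughly like this:

+---+-----------------+
| id|         email_id|
+---+-----------------+
|  1|     hi@gmail.com|
|  2|goodbye@gmail.com|
+---+-----------------+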
Spark SQL-based approach -
df = spark.createDataFrame([(1, "hi@gmail.com"), (2, "goodbye@gmail.com"), (3, "abc@gmail.com"), (4, "xyz@gmail.com")], ['id', 'email_id'])
df.createOrReplaceTempView('t1')
# Quote each address and join: 'hi@gmail.com','goodbye@gmail.com'
sql_filter = ','.join(["'" + i + "'" for i in email_filter_list])
spark.sql("SELECT * FROM t1 WHERE email_id IN ({})".format(sql_filter)).show()
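Both approaches return the same two rows. Building the SQL text by string interpolation is fine for a trusted, hard-coded list, but for user-supplied values the isin-based DataFrame filter above is the safer choice, since it sidesteps quoting and SQL-injection issues entirely.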