What is the best PySpark practice to subtract two string columns within a single Spark dataframe?
Let's say I have a Spark dataframe as below:
data | A | Expected_column = data - A |
---|---|---|
https://example1.org/path/to/file?param=42#fragment | param=42#fragment | https://example1.org/path/to/file? |
https://example2.org/path/to/file | NaN | https://example2.org/path/to/file |
I was wondering whether there is a proper mechanism for subtracting two string columns from each other, for example:
sdf1 = sdf.withColumn('Expected_column', ( sdf['data'] - sdf['A'] ))
This returns Null for all rows of Expected_column. I checked different solutions like this, but they deal with two dataframes, whereas my case is within a single dataframe, and they do not deal with string columns. The closest question I found is, again, not my case.
The function you are looking for is called replace:
from pyspark.sql import functions as F
sdf.withColumn("data - A", F.expr("replace(data, coalesce(A, ''), '')")).show(
truncate=False
)
+---------------------------------------------------+-----------------+----------------------------------+
|data                                               |A                |data - A                          |
+---------------------------------------------------+-----------------+----------------------------------+
|https://example1.org/path/to/file?param=42#fragment|param=42#fragment|https://example1.org/path/to/file?|
|https://example2.org/path/to/file                  |null             |https://example2.org/path/to/file |
+---------------------------------------------------+-----------------+----------------------------------+
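To make the null handling explicit, the per-row behaviour of that expression can be mirrored in plain Python. This is only a sketch for reasoning about expected values: `subtract_string` is a hypothetical helper name, not a PySpark function, and it assumes that a null in A should leave data unchanged, which is what the `coalesce(A, '')` part is there for.

```python
from typing import Optional


def subtract_string(data: Optional[str], a: Optional[str]) -> Optional[str]:
    """Plain-Python mirror of replace(data, coalesce(A, ''), '') for one row."""
    if data is None:
        return None  # a null input string stays null
    search = a if a is not None else ""  # same role as coalesce(A, '')
    if search == "":
        return data  # nothing to remove, keep the value as-is
    # Removes every occurrence of the substring, not only a trailing one.
    return data.replace(search, "")


rows = [
    ("https://example1.org/path/to/file?param=42#fragment", "param=42#fragment"),
    ("https://example2.org/path/to/file", None),
]
results = [subtract_string(d, a) for d, a in rows]
```

Note that the substring is removed wherever it occurs in data, so if A can also appear in the middle of the URL, a suffix-only approach may be a better fit.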