Dynamic regexp_extract based on column value in Spark

Question

I'd like to know whether there is a way to apply regexp_extract dynamically in my dataset, choosing the regex based on a column's value.

For example:
If column A has the value "N06", I need to use the regex "(?<=2020:).?\\n".
Otherwise, if column A has the value "N02", I need to use the regex "(?<=2026:).?\\n".

ds.withColumn("extracted",functions.regexp_extract(functions.col("A"),regex,0))

Answer 1

Try using when and otherwise, like this:

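import org.apache.spark.sql.functions.{col, regexp_extract, when}
// Column C is tested for the code; column A is the text searched. regex1 and regex2 are the two pattern strings.
when(col("C") === "N06", regexp_extract(col("A"), regex1, 0))
  .otherwise(regexp_extract(col("A"), regex2, 0))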
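
Since regexp_extract uses Java regular-expression syntax and returns an empty string when nothing matches, it can help to check the per-code patterns outside Spark first. The following is only a minimal sketch in plain Java, not Spark code: the class name DynamicExtract, the PATTERNS map, and the extract helper are hypothetical names introduced for illustration, and the two patterns are copied from the question as written.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DynamicExtract {
    // Hypothetical lookup table: code -> pattern, using the two regexes from the question.
    private static final Map<String, Pattern> PATTERNS = Map.of(
        "N06", Pattern.compile("(?<=2020:).?\\n"),
        "N02", Pattern.compile("(?<=2026:).?\\n"));

    // Mimics regexp_extract(text, regex, 0): returns the whole match, or "" if there is none.
    // Assumption: an unknown code also yields "" (the when/otherwise version above
    // would instead fall through to the second regex).
    static String extract(String code, String text) {
        Pattern p = PATTERNS.get(code);
        if (p == null) {
            return "";
        }
        Matcher m = p.matcher(text);
        return m.find() ? m.group(0) : "";
    }

    public static void main(String[] args) {
        // One character between the prefix and the newline: the pattern matches.
        System.out.println(extract("N06", "2020:x\n").equals("x\n"));
        // Two characters between the prefix and the newline: no match, so "".
        System.out.println(extract("N06", "2020:xy\n").isEmpty());
    }
}
```

Note that `.?` matches at most one character, so as written these patterns only match when no more than one character sits between the prefix and the newline.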
