Subtract another date from the current date in a Spark DataFrame (Scala)

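First of all, thank you for taking the time to read my question :)

My question is the following: in Spark with Scala, I have a DataFrame containing a string column with dates in the format dd/MM/yyyy HH:mm, for example df: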

+----------------+
|date            |
+----------------+
|8/11/2017 15:00 |
|9/11/2017 10:00 |
+----------------+

I would like to get the difference between the current date and the dates in the DataFrame, for example:

df.withColumn("difference", currentDate - unix_timestamp(col("date")))

+----------------+------------+
|date            | difference |
+----------------+------------+
|8/11/2017 15:00 | xxxxxxxxxx |
|9/11/2017 10:00 | xxxxxxxxxx |
+----------------+------------+

I tried:

val current = current_timestamp()
df.withColumn("difference", current - unix_timestamp(col("date")))

But I get this error:

org.apache.spark.sql.AnalysisException: cannot resolve '(current_timestamp() - unix_timestamp(date, 'yyyy-MM-dd HH:mm:ss'))' due to data type mismatch: differing types in '(current_timestamp() - unix_timestamp(date, 'yyyy-MM-dd HH:mm:ss'))' (timestamp and bigint).;;

I also tried:

val current = BigInt(System.currentTimeMillis / 1000)
df.withColumn("difference", current - unix_timestamp(col("date")))

and

val current = unix_timestamp(current_timestamp())

but the "difference" column is null.

Thanks.

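You must pass the correct format to unix_timestamp. Without one it assumes yyyy-MM-dd HH:mm:ss (visible in your error message), so your strings do not parse and the result is null. Note that MM is the month and mm the minutes:

df.withColumn("difference", current_timestamp().cast("long") - unix_timestamp(col("date"), "dd/MM/yyyy HH:mm"))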
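For a quick check, here is a minimal self-contained sketch of the approach above. The object name DateDifference, the helper withDifference, and the local SparkSession are illustrative assumptions, not part of the question. It uses the pattern d/M/yyyy HH:mm rather than dd/MM/yyyy HH:mm because the sample rows have single-digit days, which newer Spark releases may refuse to parse with dd; check this against your Spark version.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, current_timestamp, unix_timestamp}

object DateDifference {
  // Hypothetical helper: adds the difference, in seconds, between now and the
  // string column "date", parsed with the given pattern.
  def withDifference(df: DataFrame, pattern: String = "d/M/yyyy HH:mm"): DataFrame =
    df.withColumn(
      "difference",
      // Both sides are epoch seconds (bigint), so the subtraction type-checks.
      current_timestamp().cast("long") - unix_timestamp(col("date"), pattern)
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("date-difference")
      .getOrCreate()
    import spark.implicits._

    // Same sample values as in the question.
    val df = Seq("8/11/2017 15:00", "9/11/2017 10:00").toDF("date")

    withDifference(df).show(false)
    spark.stop()
  }
}
```

The difference is a plain number of seconds, so dividing by 3600 or 86400 gives hours or days.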

Or, in recent versions:

current_timestamp() - to_timestamp(col("date"), "dd/MM/yyyy HH:mm")

to get an interval column.
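The interval variant can be sketched the same way. This assumes a Spark release in which subtracting two timestamp columns is supported (Spark 3.x, to my understanding; older releases may reject the expression). The names IntervalDifference and withInterval are hypothetical, and the d/M/yyyy HH:mm pattern is again chosen to accept the single-digit days in the sample data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, current_timestamp, to_timestamp}

object IntervalDifference {
  // Hypothetical helper: parses "date" into a timestamp and subtracts it from
  // the current timestamp, leaving an interval-typed column.
  def withInterval(df: DataFrame, pattern: String = "d/M/yyyy HH:mm"): DataFrame =
    df.withColumn(
      "difference",
      current_timestamp() - to_timestamp(col("date"), pattern)
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("interval-difference")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("8/11/2017 15:00", "9/11/2017 10:00").toDF("date")

    val result = withInterval(df)
    result.printSchema() // shows the type of the "difference" column
    result.show(false)
    spark.stop()
  }
}
```

If you need a number rather than an interval, the unix_timestamp version is the simpler choice.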