Subtract a date column from the current date in a Spark DataFrame (Scala)
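First of all, thank you for taking the time to read my question :)
My question is the following: in Spark with Scala, I have a DataFrame that contains a string column holding dates in the format dd/MM/yyyy HH:mm, for example df: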
+----------------+
|date |
+----------------+
|8/11/2017 15:00 |
|9/11/2017 10:00 |
+----------------+
I want to get the difference between currentDate and the date in the DataFrame, for example:
df.withColumn("difference", currentDate - unix_timestamp(col("date")))
+----------------+------------+
|date | difference |
+----------------+------------+
|8/11/2017 15:00 | xxxxxxxxxx |
|9/11/2017 10:00 | xxxxxxxxxx |
+----------------+------------+
I tried:
val current = current_timestamp()
df.withColumn("difference", current - unix_timestamp(col("date")))
but I get this error:
org.apache.spark.sql.AnalysisException: cannot resolve '(current_timestamp() - unix_timestamp(date, 'yyyy-MM-dd HH:mm:ss'))' due to data type mismatch: differing types in '(current_timestamp() - unix_timestamp(date, 'yyyy-MM-dd HH:mm:ss'))' (timestamp and bigint).;;
I also tried:
val current = BigInt(System.currentTimeMillis / 1000)
df.withColumn("difference", current - unix_timestamp(col("date")))
and
val current = unix_timestamp(current_timestamp())
but the "difference" column is null.
Thanks
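You have to pass the correct format to unix_timestamp. Without an explicit pattern it falls back to yyyy-MM-dd HH:mm:ss (as the error message shows), which does not match your strings, so the result is null. Note that MM is the month and mm is the minutes:
df.withColumn("difference", current_timestamp().cast("long") - unix_timestamp(col("date"), "dd/MM/yyyy HH:mm"))
or, in recent versions:
to_timestamp(col("date"), "dd/MM/yyyy HH:mm") - current_timestamp()
to get an Interval column.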
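For completeness, here is a minimal self-contained sketch of the first variant, assuming a local SparkSession and the two sample rows from the question. The object name DateDifference, the helper withDifference and the constant DatePattern are made up for illustration. The pattern d/M/yyyy HH:mm is also my assumption rather than something from the question: the sample contains a single-digit day, and newer Spark versions may reject such a value when the pattern is a strict dd.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, current_timestamp, unix_timestamp}

// Hypothetical wrapper object, only here to make the sketch runnable.
object DateDifference {
  // Assumed pattern: single-letter d and M accept one- or two-digit values.
  // MM/M is the month; mm is the minutes.
  val DatePattern = "d/M/yyyy HH:mm"

  // Adds a "difference" column holding the number of seconds between
  // the current time and the parsed "date" column (both as bigint).
  def withDifference(df: DataFrame): DataFrame =
    df.withColumn(
      "difference",
      current_timestamp().cast("long") - unix_timestamp(col("date"), DatePattern)
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("date-difference")
      .getOrCreate()
    import spark.implicits._

    // Sample rows taken from the question.
    val df = Seq("8/11/2017 15:00", "9/11/2017 10:00").toDF("date")
    withDifference(df).show(false)

    spark.stop()
  }
}
```

Both operands are cast to seconds since the epoch before subtracting, which avoids the timestamp/bigint mismatch from the question. Divide the result by 3600 or 86400 if you want hours or days instead of seconds.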