PySpark: obtain time attributes from a datetime column with a list comprehension
I have a PySpark DataFrame df:
+-------------------+
|      timestamplast|
+-------------------+
|2019-08-01 00:00:00|
|2019-08-01 00:01:09|
|2019-08-01 01:00:20|
|2019-08-03 00:00:27|
+-------------------+
I want to add the columns 'year', 'month', 'day' and 'hour' to the existing DataFrame using a list comprehension.
In Pandas, this would be done like so:
import pandas as pd
L = ['year', 'month', 'day', 'hour']
# one Series per attribute, each renamed to the attribute it holds
date_gen = (getattr(df['timestamplast'].dt, i).rename(i) for i in L)
df = df.join(pd.concat(date_gen, axis=1))  # concatenate results and join to original dataframe
How can this be done in PySpark?
Try the following:
df.selectExpr("*", *[ '{0}(timestamplast) as {0}'.format(c) for c in L]).show()
+-------------------+----+-----+---+----+
|      timestamplast|year|month|day|hour|
+-------------------+----+-----+---+----+
|2019-08-01 00:00:00|2019|    8|  1|   0|
|2019-08-03 00:00:27|2019|    8|  3|   0|
+-------------------+----+-----+---+----+
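This works because selectExpr accepts Spark SQL expression strings, and year, month, day and hour are all built-in Spark SQL functions, so the list comprehension only has to build one string per attribute. A minimal sketch of that string-building step on its own (the helper name time_attr_exprs is made up for illustration and is not part of PySpark):

```python
def time_attr_exprs(column, attrs):
    """Build one Spark SQL expression string per attribute,
    e.g. 'year(timestamplast) as year'. Hypothetical helper, not a PySpark API."""
    return ["{0}({1}) as {0}".format(attr, column) for attr in attrs]


L = ["year", "month", "day", "hour"]
exprs = time_attr_exprs("timestamplast", L)

# Show the generated expressions, one per line
for e in exprs:
    print(e)

# With a live SparkSession and the DataFrame `df` from the question,
# the strings are passed straight to selectExpr:
#   df.selectExpr("*", *exprs).show()
#
# A column-API variant looks each function up by name instead. Note that
# older PySpark releases expose dayofmonth rather than day in
# pyspark.sql.functions, so 'day' may need mapping there:
#   from pyspark.sql import functions as F
#   df.select("*", *[getattr(F, c)("timestamplast").alias(c) for c in L]).show()
```

Keeping "*" as the first argument preserves the original columns, which is what makes this the counterpart of the Pandas join.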