Parse RFC3339 timestamps in Spark

The Spark CSV reader documents its timestampFormat option as follows:

timestampFormat – sets the string that indicates a timestamp format. Custom date formats follow the formats at java.text.SimpleDateFormat. This applies to timestamp type. If None is set, it uses the default value, yyyy-MM-dd'T'HH:mm:ss.SSSXXX.

SimpleDateFormat does not seem to support RFC3339 timestamps, which look like 2017-11-27T07:10:07Z.

How should I configure Spark to parse this timestamp format when reading a CSV file?

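Spark does this for you automatically. With inferSchema enabled, the ts column is read as a timestamp without any extra configuration. Note that show() renders timestamps in the session time zone, so a value such as 2017-11-27T07:10:07Z appears as 2017-11-27 08:10:07 on a machine running at UTC+1: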

>>> df = spark.read.option("header", "true").option("inferSchema", "true").option("delimiter", ",").csv("file:///temp/1.csv")
>>> df.printSchema()
root
 |-- ts: timestamp (nullable = true)
 |-- val: integer (nullable = true)

>>> df.show()
+-------------------+---+
|                 ts|val|
+-------------------+---+
|2017-11-27 08:10:07|  1|
|2017-11-28 09:08:08|  1|
|2017-11-30 00:59:59|  1|
+-------------------+---+
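
If you would rather not rely on schema inference, the format can also be stated explicitly. The following is only a minimal sketch under a few assumptions: the helper name read_rfc3339_csv is hypothetical, the column names ts and val are taken from the schema printed above, the pattern yyyy-MM-dd'T'HH:mm:ssXXX is my assumption for timestamps without fractional seconds (XXX is the pattern letter that accepts a literal Z for a zero offset), and passing the schema as a DDL string assumes Spark 2.3 or later.

```python
# Assumed pattern for RFC3339 timestamps without fractional seconds,
# e.g. 2017-11-27T07:10:07Z or 2017-11-27T08:10:07+01:00.
RFC3339_FORMAT = "yyyy-MM-dd'T'HH:mm:ssXXX"


def read_rfc3339_csv(spark, path):
    """Read a CSV file with an explicit schema and timestamp format.

    `spark` is an existing SparkSession; `path` is the CSV location.
    This helper is hypothetical and only illustrates the reader options.
    """
    return (
        spark.read
        .option("header", "true")
        .option("timestampFormat", RFC3339_FORMAT)
        # DDL-string schema: avoids inferSchema's extra pass over the data.
        .schema("ts TIMESTAMP, val INT")
        .csv(path)
    )


# Usage, inside a PySpark shell where `spark` already exists:
#   df = read_rfc3339_csv(spark, "file:///temp/1.csv")
#   df.printSchema()
```

An explicit schema also means that rows which do not match the format surface as nulls in the ts column rather than silently turning the whole column into strings, which can make bad input easier to spot.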