Parse RFC3339 timestamp in Spark
Spark's CSV timestampFormat:
timestampFormat – sets the string that indicates a timestamp format. Custom date formats follow the formats at java.text.SimpleDateFormat. This applies to timestamp type. If None is set, it uses the default value, yyyy-MM-dd'T'HH:mm:ss.SSSXXX.
SimpleDateFormat does not seem to support RFC3339 timestamps, which look like 2017-11-27T07:10:07Z.
How should I configure Spark to parse this time format when reading a CSV file?
Spark does this for you automatically:
>>> df=spark.read.option("header","true").option("inferSchema","true").option("delimiter",",").csv("file:///temp/1.csv")
>>> df.printSchema()
root
|-- ts: timestamp (nullable = true)
|-- val: integer (nullable = true)
>>> df.show()
+-------------------+---+
|                 ts|val|
+-------------------+---+
|2017-11-27 08:10:07|  1|
|2017-11-28 09:08:08|  1|
|2017-11-30 00:59:59|  1|
+-------------------+---+
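Note that the ts column shows 08:10:07, while the sample value in the question is 2017-11-27T07:10:07Z. Spark displays timestamps in the session's local time zone, so a one-hour shift would be consistent with a machine running at UTC+1, assuming the sample value is the first row of the file. The sketch below reproduces that conversion in plain Python (not Spark). The UTC+1 offset and the helper name parse_rfc3339_utc are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3339_utc(text):
    """Parse an RFC3339 timestamp that ends in a literal 'Z' (UTC)."""
    # The trailing 'Z' is matched literally, then the UTC zone is attached.
    naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return naive.replace(tzinfo=timezone.utc)


# Assumption: the session time zone is a fixed UTC+1 offset.
assumed_zone = timezone(timedelta(hours=1))

parsed = parse_rfc3339_utc("2017-11-27T07:10:07Z")
local = parsed.astimezone(assumed_zone)

# Render the instant the way a timestamp column would be displayed locally.
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-11-27 08:10:07
```

The stored instant is unchanged. Only the rendering depends on the time zone, so the same file read on a UTC machine would presumably show 07:10:07.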