spark - 如何从时间戳中提取小时数?
spark - how to extract hour from timestamp?
请帮助理解为什么 date_format
不提取 08:15
8:15am?
spark.sql("select date_format(date '1994-11-05T08:15:30-05:00', 'hh:mm') AS hour_in_day_number").show()
+------------------+
|hour_in_day_number|
+------------------+
| 12:00|
+------------------+
spark.sql("select date_format(date '1994-11-05T08:15:30-05:00', 'HH:mm') AS hour_in_day_number").show()
+------------------+
|hour_in_day_number|
+------------------+
| 00:00|
+------------------+
spark.sql("select date_format(date '1994-11-05T08:15:30-05:00', 'kk:mm') AS hour_in_day_number").show()
+------------------+
|hour_in_day_number|
+------------------+
| 24:00|
+------------------+
Examples
1994-11-05T08:15:30-05:00 corresponds to November 5, 1994, 8:15:30 am, US Eastern Standard Time.
Datetime Patterns for Formatting and Parsing
Symbol
Meaning
Presentation
Examples
G
era
text
AD; Anno Domini
y
year
year
2020; 20
D
day-of-year
number(3)
189
M/L
month-of-year
month
7; 07; Jul; July
d
day-of-month
number(3)
28
Q/q
quarter-of-year
number/text
3; 03; Q3; 3rd quarter
E
day-of-week
text
Tue; Tuesday
F
aligned day of week in month
number(1)
3
a
am-pm-of-day
am-pm
PM
h
clock-hour-of-am-pm (1-12)
number(2)
12
K
hour-of-am-pm (0-11)
number(2)
0
k
clock-hour-of-day (1-24)
number(2)
0
H
hour-of-day (0-23)
number(2)
0
m
minute-of-hour
number(2)
30
s
second-of-minute
number(2)
55
S
fraction-of-second
fraction
978
V
time-zone ID
zone-id
America/Los_Angeles; Z; -08:30
z
time-zone name
zone-name
Pacific Standard Time; PST
O
localized zone-offset
offset-O
GMT+8; GMT+08:00; UTC-08:00;
X
zone-offset ‘Z’ for zero
offset-X
Z; -08; -0830; -08:30; -083015; -08:30:15;
x
zone-offset
offset-x
+0000; -08; -0830; -08:30; -083015; -08:30:15;
Z
zone-offset
offset-Z
+0000; -0800; -08:00;
环境
$ spark-submit --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.1.2
/_/
Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 1.8.0_312
Branch HEAD
Compiled by user centos on 2021-05-24T04:27:48Z
Revision de351e30a90dd988b133b3d00fa6218bfcaba8b8
Url https://github.com/apache/spark
Type --help for more information.
您使用了date
,只保留年月日
您可以尝试使用 tiemstamp
如下:
scala> spark.sql("select date_format(timestamp '1994-11-05T08:15:30-05:00', 'hh:mm') AS hour_in_day_number").show()
+------------------+
|hour_in_day_number|
+------------------+
| 05:15|
+------------------+
请帮助理解为什么 date_format
不提取 08:15
8:15am?
spark.sql("select date_format(date '1994-11-05T08:15:30-05:00', 'hh:mm') AS hour_in_day_number").show()
+------------------+
|hour_in_day_number|
+------------------+
| 12:00|
+------------------+
spark.sql("select date_format(date '1994-11-05T08:15:30-05:00', 'HH:mm') AS hour_in_day_number").show()
+------------------+
|hour_in_day_number|
+------------------+
| 00:00|
+------------------+
spark.sql("select date_format(date '1994-11-05T08:15:30-05:00', 'kk:mm') AS hour_in_day_number").show()
+------------------+
|hour_in_day_number|
+------------------+
| 24:00|
+------------------+
Examples
1994-11-05T08:15:30-05:00 corresponds to November 5, 1994, 8:15:30 am, US Eastern Standard Time.
Datetime Patterns for Formatting and Parsing
Symbol | Meaning | Presentation | Examples |
---|---|---|---|
G | era | text | AD; Anno Domini |
y | year | year | 2020; 20 |
D | day-of-year | number(3) | 189 |
M/L | month-of-year | month | 7; 07; Jul; July |
d | day-of-month | number(3) | 28 |
Q/q | quarter-of-year | number/text | 3; 03; Q3; 3rd quarter |
E | day-of-week | text | Tue; Tuesday |
F | aligned day of week in month | number(1) | 3 |
a | am-pm-of-day | am-pm | PM |
h | clock-hour-of-am-pm (1-12) | number(2) | 12 |
K | hour-of-am-pm (0-11) | number(2) | 0 |
k | clock-hour-of-day (1-24) | number(2) | 0 |
H | hour-of-day (0-23) | number(2) | 0 |
m | minute-of-hour | number(2) | 30 |
s | second-of-minute | number(2) | 55 |
S | fraction-of-second | fraction | 978 |
V | time-zone ID | zone-id | America/Los_Angeles; Z; -08:30 |
z | time-zone name | zone-name | Pacific Standard Time; PST |
O | localized zone-offset | offset-O | GMT+8; GMT+08:00; UTC-08:00; |
X | zone-offset ‘Z’ for zero | offset-X | Z; -08; -0830; -08:30; -083015; -08:30:15; |
x | zone-offset | offset-x | +0000; -08; -0830; -08:30; -083015; -08:30:15; |
Z | zone-offset | offset-Z | +0000; -0800; -08:00; |
环境
$ spark-submit --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.1.2
/_/
Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 1.8.0_312
Branch HEAD
Compiled by user centos on 2021-05-24T04:27:48Z
Revision de351e30a90dd988b133b3d00fa6218bfcaba8b8
Url https://github.com/apache/spark
Type --help for more information.
您使用了date
,只保留年月日
您可以尝试使用 tiemstamp
如下:
scala> spark.sql("select date_format(timestamp '1994-11-05T08:15:30-05:00', 'hh:mm') AS hour_in_day_number").show()
+------------------+
|hour_in_day_number|
+------------------+
| 05:15|
+------------------+