How to create new columns from a Pandas datetime column
I have a DataFrame df with a datetime column. The full DataFrame has 20 million rows; for convenience, I show only 3 of them here.
import numpy as np
import pandas as pd

df = pd.DataFrame({})
df['Date'] = pd.to_datetime(np.arange(0,3), unit='h', origin='2018-08-01 00:00:00')
Date
0 2018-08-01 00:00:00
1 2018-08-01 01:00:00
2 2018-08-01 02:00:00
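From the Date column, I want to create new columns "00_hrs", "01_hrs", "02_hrs" (and so on, up to "23_hrs") whose values are 0 or 1. A value is 1 when the hour of the row's datetime matches the hour in the column name, and 0 otherwise.
The result should look like this: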
Date 00_hrs 01_hrs 02_hrs ... 23_hrs
0 2018-08-01 00:00:00 1 0 0 0
1 2018-08-01 01:00:00 0 1 0 0
2 2018-08-01 02:00:00 0 0 1 0
Use get_dummies on the hour labels generated by Series.dt.strftime, then add the result to the original DataFrame with DataFrame.join:
# dtype=int keeps the 0/1 output; pandas 2.0+ returns booleans by default
df = df.join(pd.get_dummies(df['Date'].dt.strftime('%H_hrs'), dtype=int))
print(df)
Date 00_hrs 01_hrs 02_hrs
0 2018-08-01 00:00:00 1 0 0
1 2018-08-01 01:00:00 0 1 0
2 2018-08-01 02:00:00 0 0 1
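If some hours may be absent from the data, add the missing columns with DataFrame.reindex (run this on the original df rather than after the previous snippet, because joining the same column names a second time raises an error):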
hours = [f'{n:02}_hrs' for n in range(24)]
# reindex adds an all-zero column for every hour that does not occur in the data
df = df.join(pd.get_dummies(df['Date'].dt.strftime('%H_hrs'), dtype=int)
               .reindex(hours, axis=1, fill_value=0))
print(df)
Date 00_hrs 01_hrs 02_hrs 03_hrs 04_hrs 05_hrs 06_hrs \
0 2018-08-01 00:00:00 1 0 0 0 0 0 0
1 2018-08-01 01:00:00 0 1 0 0 0 0 0
2 2018-08-01 02:00:00 0 0 1 0 0 0 0
07_hrs 08_hrs 09_hrs 10_hrs 11_hrs 12_hrs 13_hrs 14_hrs 15_hrs \
0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
16_hrs 17_hrs 18_hrs 19_hrs 20_hrs 21_hrs 22_hrs 23_hrs
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
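Since the full DataFrame has 20 million rows, it may be worth avoiding the intermediate strings that strftime creates. The following is only a sketch of one possible alternative, not part of the answer above: it takes the integer hour from Series.dt.hour and builds the 0/1 matrix with a NumPy broadcast comparison. The names hour_of_row, one_hot, and result are illustrative, and int8 is an assumed choice to reduce memory; whether this is actually faster on your data is something to measure.

```python
import numpy as np
import pandas as pd

# Same three-row sample as in the question.
df = pd.DataFrame(
    {"Date": pd.to_datetime(np.arange(3), unit="h", origin="2018-08-01 00:00:00")}
)

# Column labels "00_hrs" .. "23_hrs", in hour order.
hours = [f"{n:02}_hrs" for n in range(24)]

# Compare each row's hour (as a column vector) against 0..23 (as a row vector).
# Broadcasting yields an (n_rows, 24) boolean matrix, stored here as int8.
hour_of_row = df["Date"].dt.hour.to_numpy()
one_hot = (hour_of_row[:, None] == np.arange(24)).astype(np.int8)

# Attach the indicator columns to the original frame, aligned on its index.
result = df.join(pd.DataFrame(one_hot, columns=hours, index=df.index))
print(result.iloc[:, :4])
```

All 24 columns are always present with this approach, so no reindex step is needed.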