Python: convert series object columns in pandas dataframe to int64 dtype
I have the following dataframe:
Day_Part Start_Time End_Time
Breakfast 9:00 11:00
Lunch 12:00 14:00
Dinner 19:00 23:00
The Start_Time and End_Time columns are currently 'Series Objects'. I want to convert the values in these columns to int64 dtype.
This is how I would like the dataframe to look:
Day_Part Start_Time End_Time
Breakfast 9 11
Lunch 12 14
Dinner 19 23
Any help is much appreciated.
You can first convert with to_timedelta and then extract the hours:
df['Start_Time'] = pd.to_timedelta(df['Start_Time']+ ':00').dt.components.hours
df['End_Time'] = pd.to_timedelta(df['End_Time']+ ':00').dt.components.hours
print (df)
Day_Part Start_Time End_Time
0 Breakfast 9 11
1 Lunch 12 14
2 Dinner 19 23
Another solution is to split on ':' and convert to int:
df['Start_Time'] = df['Start_Time'].str.split(':').str[0].astype(int)
df['End_Time'] = df['End_Time'].str.split(':').str[0].astype(int)
print (df)
Day_Part Start_Time End_Time
0 Breakfast 9 11
1 Lunch 12 14
2 Dinner 19 23
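Since the question asks for int64 specifically, here is a minimal self-contained sketch of the split approach that requests the dtype by name. The frame is rebuilt from the sample in the question, and the explicit "int64" is my assumption about what is wanted: on some platforms (for example Windows with older NumPy versions) astype(int) may produce int32 instead.

```python
import pandas as pd

# Rebuild the frame from the question; the time columns hold strings.
df = pd.DataFrame({
    "Day_Part": ["Breakfast", "Lunch", "Dinner"],
    "Start_Time": ["9:00", "12:00", "19:00"],
    "End_Time": ["11:00", "14:00", "23:00"],
})

for col in ["Start_Time", "End_Time"]:
    # Keep the part before the first colon and cast to an explicit 64-bit integer.
    df[col] = df[col].str.split(":").str[0].astype("int64")

print(df.dtypes)
```

Checking df.dtypes afterwards is a quick way to confirm that both columns ended up as int64.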
A solution using extract and converting to int:
df['Start_Time'] = df['Start_Time'].str.extract(r'(\d*):', expand=False).astype(int)
df['End_Time'] = df['End_Time'].str.extract(r'(\d*):', expand=False).astype(int)
print (df)
Day_Part Start_Time End_Time
0 Breakfast 9 11
1 Lunch 12 14
2 Dinner 19 23
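One thing to watch with extract: rows that do not match the pattern come back as missing values, and a plain astype(int) then fails. The sketch below shows one possible way to handle that, assuming a hypothetical input with a malformed entry (not part of the original data) and using pandas' nullable "Int64" dtype rather than NumPy's int64.

```python
import pandas as pd

# Hypothetical input: the last value does not follow the "H:MM" pattern.
times = pd.Series(["9:00", "12:00", "unknown"])

# A raw string avoids invalid-escape warnings; \d+ requires at least one digit.
hours = times.str.extract(r"(\d+):", expand=False)

# to_numeric keeps the non-match as a missing value;
# "Int64" (capital I) is the nullable integer dtype, which can hold it.
hours = pd.to_numeric(hours).astype("Int64")

print(hours)
```

If the data is known to be clean, as in the question, the plain astype(int) version above is simpler.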
A solution converting with to_datetime:
df['Start_Time'] = pd.to_datetime(df['Start_Time'], format='%H:%M').dt.hour
df['End_Time'] = pd.to_datetime(df['End_Time'], format='%H:%M').dt.hour
print (df)
Day_Part Start_Time End_Time
0 Breakfast 9 11
1 Lunch 12 14
2 Dinner 19 23
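Depending on the pandas version, .dt.hour may return int32 rather than int64, so if the exact dtype matters it seems safer to cast explicitly. A minimal sketch, with the frame rebuilt from the question and the final astype("int64") added as my own assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "Day_Part": ["Breakfast", "Lunch", "Dinner"],
    "Start_Time": ["9:00", "12:00", "19:00"],
    "End_Time": ["11:00", "14:00", "23:00"],
})

for col in ["Start_Time", "End_Time"]:
    # Parse the strings as times of day, then take the hour component.
    parsed = pd.to_datetime(df[col], format="%H:%M")
    # Cast explicitly so the result is int64 regardless of what .dt.hour returns.
    df[col] = parsed.dt.hour.astype("int64")

print(df.dtypes)
```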
Timings:
#[300000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
print (df)
In [158]: %timeit pd.to_timedelta(df['Start_Time']+ ':00').dt.components.hours
1 loop, best of 3: 7.12 s per loop
In [159]: %timeit df['Start_Time'].str.split(':').str[0].astype(int)
1 loop, best of 3: 415 ms per loop
In [160]: %timeit df['Start_Time'].str.extract('(\d*):', expand=False).astype(int)
1 loop, best of 3: 654 ms per loop
In [166]: %timeit pd.to_datetime(df['Start_Time'], format='%H:%M').dt.hour
1 loop, best of 3: 1.26 s per loop
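The numbers above come from an IPython session and will differ by machine and pandas version. To re-run the comparison outside IPython, something like the following sketch with the standard timeit module should work. The smaller frame size (3000 rows) and the names base, big, candidates and results are my own choices for illustration.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    "Day_Part": ["Breakfast", "Lunch", "Dinner"],
    "Start_Time": ["9:00", "12:00", "19:00"],
    "End_Time": ["11:00", "14:00", "23:00"],
})

# Smaller than the 300000 rows above, so the sketch finishes quickly.
big = pd.concat([base] * 1000, ignore_index=True)

# One callable per approach from the answer.
candidates = {
    "to_timedelta": lambda: pd.to_timedelta(big["Start_Time"] + ":00").dt.components.hours,
    "split": lambda: big["Start_Time"].str.split(":").str[0].astype("int64"),
    "extract": lambda: big["Start_Time"].str.extract(r"(\d+):", expand=False).astype("int64"),
    "to_datetime": lambda: pd.to_datetime(big["Start_Time"], format="%H:%M").dt.hour,
}

results = {}
for name, func in candidates.items():
    # Best of 3 single runs, mirroring the "best of 3" reporting above.
    results[name] = min(timeit.repeat(func, number=1, repeat=3))

# Print fastest first.
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:>12}: {seconds * 1000:.1f} ms")
```

Increase the multiplier in pd.concat to approach the original 300000 rows if the relative ordering at scale is of interest.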