Pandas: calculating a new column under a condition
This is my df:
df = pd.DataFrame({'date':['2020-01-01 12:00:00','2020-01-01 15:00:00','2020-01-06 07:00:00','2020-01-15 13:00:00','2020-01-22 12:00:00'],
'numb_total':[8,25,11,14,8]})
df['date'] = pd.to_datetime(df['date'])
which gives me:
                 date  numb_total
0 2020-01-01 12:00:00           8
1 2020-01-01 15:00:00          25
2 2020-01-06 07:00:00          11
3 2020-01-15 13:00:00          14
4 2020-01-22 12:00:00           8
Now I want to add a new column that gives me numb_total * x (x=5) under a special condition, and numb_total * y (y=10) otherwise.
Condition:
If date is a Monday or Tuesday and the time of date is between 08:00 and 14:00:
df['numb_new'] = df['numb_total']*x
Else:
df['numb_new'] = df['numb_total']*y
To get the day_name and the time I did this:
df['day'] = df['date'].dt.day_name()
df['time'] = df['date'].dt.time
How can I create this new column df['numb_new'] efficiently?
IIUC, you could do this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'date': ['2020-01-01 12:00:00', '2020-01-01 15:00:00', '2020-01-06 07:00:00',
'2020-01-15 13:00:00', '2020-01-22 12:00:00'],
'numb_total': [8, 25, 11, 14, 8]})
df['date'] = pd.to_datetime(df['date'])
hour_mask = (8 <= df['date'].dt.hour) & (df['date'].dt.hour <= 14)
# for weekday Monday is 0 and Tuesday 1
day_mask = np.isin(df['date'].dt.weekday, [0, 1])
df['numb_new'] = df['numb_total'] * np.where(hour_mask & day_mask, 5, 10)
print(df)
Output:
                 date  numb_total  numb_new
0 2020-01-01 12:00:00           8        80
1 2020-01-01 15:00:00          25       250
2 2020-01-06 07:00:00          11       110
3 2020-01-15 13:00:00          14       140
4 2020-01-22 12:00:00           8        80
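Two things are worth noting about the output above. None of the sample rows falls on a Monday or Tuesday inside the window, so every row is multiplied by 10. Also, comparing only dt.hour treats 14:30 as inside 08:00 - 14:00. If the upper bound is meant to be exactly 14:00, one possible variant compares the time elapsed since midnight instead. This is only a sketch: the sample rows below are made up to exercise both branches, and the inclusive bounds are an assumption about what the question intends.

```python
import numpy as np
import pandas as pd

# Hypothetical rows chosen to hit both branches (2020-01-06 is a Monday).
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-06 07:00:00', '2020-01-06 08:00:00',
                            '2020-01-07 14:00:00', '2020-01-07 14:30:00',
                            '2020-01-08 12:00:00']),
    'numb_total': [11, 8, 25, 14, 8],
})

# Time elapsed since midnight as a Timedelta, so minutes and seconds count too.
since_midnight = df['date'] - df['date'].dt.normalize()
# Assumption: both 08:00:00 and 14:00:00 belong to the window.
time_mask = (since_midnight >= pd.Timedelta(hours=8)) & (since_midnight <= pd.Timedelta(hours=14))
# For weekday, Monday is 0 and Tuesday is 1.
day_mask = df['date'].dt.weekday.isin([0, 1])

df['numb_new'] = df['numb_total'] * np.where(time_mask & day_mask, 5, 10)
print(df)
```

Here the Monday 08:00 and Tuesday 14:00 rows are multiplied by 5, while Tuesday 14:30 falls back to 10.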
import numpy as np
import pandas as pd
import pandas.api.types as ptypes

def create_col(date_col, value_col, x, y, min_hour, max_hour, days):
    # assert that the date column has a datetime dtype
    assert ptypes.is_datetime64_any_dtype(date_col)
    # condition on the hour of day (both bounds inclusive)
    hour_cond = (min_hour <= date_col.dt.hour) & (date_col.dt.hour <= max_hour)
    # condition on the weekday (Monday is 0)
    day_mask = np.isin(date_col.dt.weekday, days)
    # multiply by x where both conditions hold, otherwise by y
    return value_col * np.where(hour_cond & day_mask, x, y)

df = pd.DataFrame({'date': ['2020-01-01 12:00:00', '2020-01-01 15:00:00',
                            '2020-01-06 07:00:00', '2020-01-15 13:00:00', '2020-01-22 12:00:00'],
                   'numb_total': [8, 25, 11, 14, 8]})
df['date'] = pd.to_datetime(df['date'])
df["numb_new"] = create_col(df['date'], df['numb_total'], 5, 10, 8, 14, [0, 1])
print(df)
This is a more general answer: you can change the output through the parameters you pass.
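To illustrate how the parameters change the result, here is a small sketch that repeats the function so it runs on its own and calls it with two different rule sets. The demo rows and the weekend rule (multipliers 2 and 3, hours 18 to 22) are hypothetical values chosen only for illustration; they are not part of the question.

```python
import numpy as np
import pandas as pd
import pandas.api.types as ptypes

def create_col(date_col, value_col, x, y, min_hour, max_hour, days):
    # Same logic as the function above, repeated so this example is self-contained.
    assert ptypes.is_datetime64_any_dtype(date_col)
    hour_cond = (min_hour <= date_col.dt.hour) & (date_col.dt.hour <= max_hour)
    day_mask = np.isin(date_col.dt.weekday, days)
    return value_col * np.where(hour_cond & day_mask, x, y)

# Hypothetical sample: a Monday morning, a Saturday evening, a Wednesday noon.
demo = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-06 09:00:00', '2020-01-11 19:00:00',
                            '2020-01-15 12:00:00']),
    'numb_total': [11, 25, 14],
})

# Rule from the question: Monday/Tuesday, hours 8 to 14, times 5, otherwise times 10.
demo['weekday_rule'] = create_col(demo['date'], demo['numb_total'], 5, 10, 8, 14, [0, 1])
# Assumed alternative: Saturday/Sunday (5, 6), hours 18 to 22, times 2, otherwise times 3.
demo['weekend_rule'] = create_col(demo['date'], demo['numb_total'], 2, 3, 18, 22, [5, 6])
print(demo)
```

Only the Monday row gets the reduced multiplier under the first rule, and only the Saturday row under the second.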