Pandas calculating a new column under condition

Question

This is my df:

import pandas as pd

df = pd.DataFrame({'date': ['2020-01-01 12:00:00', '2020-01-01 15:00:00', '2020-01-06 07:00:00',
                            '2020-01-15 13:00:00', '2020-01-22 12:00:00'],
                   'numb_total': [8, 25, 11, 14, 8]})
df['date'] = pd.to_datetime(df['date'])

which gives me:

                   date numb_total
0   2020-01-01 12:00:00          8
1   2020-01-01 15:00:00         25
2   2020-01-06 07:00:00         11
3   2020-01-15 13:00:00         14
4   2020-01-22 12:00:00          8

Now I want to add a new column that gives me numb_total * x (x = 5) under a special condition, and numb_total * y (y = 10) otherwise.

Condition: if the date falls on a Monday or Tuesday and the time of day is between 08:00 and 14:00:

df['numb_new'] = df['numb_total']*x

else:

df['numb_new'] = df['numb_total']*y

To get the day name and the time, I did this:

df['day'] = df['date'].dt.day_name()
df['time'] = df['date'].dt.time
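
For reference, a small sketch of what these two accessors produce, using two of the sample timestamps. Both results hold Python-level values (strings and datetime.time objects) rather than numbers, so comparisons on them are typically slower than on numeric accessors such as dt.weekday and dt.hour. The variable names day and time are only for illustration.

```python
import datetime

import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2020-01-01 12:00:00', '2020-01-06 07:00:00'])})

day = df['date'].dt.day_name()   # strings such as 'Wednesday' (English by default)
time = df['date'].dt.time        # datetime.time objects, stored with object dtype

print(day.tolist())
print(time.tolist())
```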

How can I create this new column df['numb_new'] efficiently?

Answer 1

IIUC, you could do:

import pandas as pd
import numpy as np

df = pd.DataFrame({'date': ['2020-01-01 12:00:00', '2020-01-01 15:00:00', '2020-01-06 07:00:00',
                            '2020-01-15 13:00:00', '2020-01-22 12:00:00'],
                   'numb_total': [8, 25, 11, 14, 8]})
df['date'] = pd.to_datetime(df['date'])

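# hours 8 through 14 inclusive; note this also matches 14:01 to 14:59
hour_mask = (8 <= df['date'].dt.hour) & (df['date'].dt.hour <= 14)

# dt.weekday numbers Monday as 0 and Tuesday as 1
day_mask = np.isin(df['date'].dt.weekday, [0, 1])

# multiply by 5 where both conditions hold, otherwise by 10
df['numb_new'] = df['numb_total'] * np.where(hour_mask & day_mask, 5, 10)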

print(df)

Output

                 date  numb_total  numb_new
0 2020-01-01 12:00:00           8        80
1 2020-01-01 15:00:00          25       250
2 2020-01-06 07:00:00          11       110
3 2020-01-15 13:00:00          14       140
4 2020-01-22 12:00:00           8        80
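
In the sample data no row meets both conditions (the only Monday row is at 07:00), so every value is multiplied by 10. Note also that the check dt.hour <= 14 accepts times such as 14:30. If "between 08:00 and 14:00" is meant to end exactly at 14:00, one possible variant compares the time elapsed since midnight instead. The sketch below is an assumption about the intended boundaries, not part of the original answer, and uses made-up rows (the demo frame) so that both branches are exercised.

```python
import numpy as np
import pandas as pd

# Made-up rows: Monday 07:00, Monday 14:00, Tuesday 14:30, Wednesday 12:00.
demo = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-06 07:00:00', '2020-01-06 14:00:00',
                            '2020-01-07 14:30:00', '2020-01-08 12:00:00']),
    'numb_total': [11, 8, 25, 14],
})

# Time elapsed since midnight, as one Timedelta per row.
since_midnight = demo['date'] - demo['date'].dt.normalize()

# between() is inclusive on both ends by default: 08:00:00 <= t <= 14:00:00.
time_mask = since_midnight.between(pd.Timedelta(hours=8), pd.Timedelta(hours=14))

# Monday is 0 and Tuesday is 1.
day_mask = demo['date'].dt.weekday.isin([0, 1])

demo['numb_new'] = demo['numb_total'] * np.where(time_mask & day_mask, 5, 10)
print(demo)
```

Only the Monday 14:00 row satisfies both masks here; the Tuesday 14:30 row is excluded by the time check, whereas the hour-based mask would have included it.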

Answer 2

import numpy as np
import pandas as pd
import pandas.api.types as ptypes

def create_col(date_col, value_col, x, y, min_hour, max_hour, days):
    # check that date_col has a datetime dtype
    assert ptypes.is_datetime64_any_dtype(date_col)
    # rows whose hour lies in [min_hour, max_hour]
    hour_cond = (min_hour <= date_col.dt.hour) & (date_col.dt.hour <= max_hour)
    # rows whose weekday (Monday = 0) is in days
    day_mask = np.isin(date_col.dt.weekday, days)
    # multiply by x where both conditions hold, otherwise by y
    return value_col * np.where(hour_cond & day_mask, x, y)

df = pd.DataFrame({'date': ['2020-01-01 12:00:00', '2020-01-01 15:00:00', '2020-01-06 07:00:00',
                            '2020-01-15 13:00:00', '2020-01-22 12:00:00'],
                   'numb_total': [8, 25, 11, 14, 8]})
df['date'] = pd.to_datetime(df['date'])
df['numb_new'] = create_col(df['date'], df['numb_total'], 5, 10, 8, 14, [0, 1])
print(df)

This is a more general answer: you can change the output through the parameters you pass.
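
To illustrate how the parameters change the result, here is a minimal sketch that repeats the function so it runs on its own and calls it twice on made-up rows. The demo frame, the column names weekday_rule and weekend_rule, and the weekend multipliers (2 and 3) are hypothetical and chosen only for the example.

```python
import numpy as np
import pandas as pd
import pandas.api.types as ptypes


def create_col(date_col, value_col, x, y, min_hour, max_hour, days):
    # Same logic as the function above, repeated so the sketch is self-contained.
    assert ptypes.is_datetime64_any_dtype(date_col)
    hour_cond = (min_hour <= date_col.dt.hour) & (date_col.dt.hour <= max_hour)
    day_mask = np.isin(date_col.dt.weekday, days)
    return value_col * np.where(hour_cond & day_mask, x, y)


# Made-up rows: a Monday morning, a Saturday morning, a Saturday evening.
demo = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-06 09:00:00', '2020-01-11 09:00:00',
                            '2020-01-11 20:00:00']),
    'numb_total': [10, 10, 10],
})

# Rule from the question: Monday/Tuesday, hours 8 to 14, times 5, otherwise times 10.
demo['weekday_rule'] = create_col(demo['date'], demo['numb_total'], 5, 10, 8, 14, [0, 1])

# Hypothetical variant: Saturday/Sunday (5 and 6), times 2, otherwise times 3.
demo['weekend_rule'] = create_col(demo['date'], demo['numb_total'], 2, 3, 8, 14, [5, 6])

print(demo)
```

The same three rows land in different branches depending on the days list, which is the flexibility the parameterized function is meant to provide.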
