Pandas: convert for loop with if/else conditions into apply method (lambda function)
I have the following function with a for loop:
def add_CQI_iterrows(df):
    previous_row = None  # None never equals a date, so the first row opens group 1
    CQI_index = 0
    series = []
    for _, row in df.iterrows():
        if row['Date'] != previous_row:
            # A date different from the previous row's starts a new CQI group
            CQI_index += 1
            previous_row = row['Date']
        series.append(CQI_index)
    df['CQI'] = series
    return df
I'd like to convert this for loop into an apply call. Something like this (which doesn't work):
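def add_CQI_apply(df):
    previous_row = df['Date'].astype(str)[0]
    CQI_index = 1
    series = []
    # SyntaxError: a lambda body must be a single expression, so assignments
    # such as `previous_row = ...` and `CQI_index += 1` are not allowed in it
    df['CQI'] = df.apply(lambda row: previous_row = row['Date'] if row['Date'] == previous_row else CQI_index += 1 and previous_row = row['Date'], axis=1)
    return df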
I want to make this conversion because I'd like to see how fast apply is, and whether the apply method of a pandas Series can be vectorized.
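A lambda cannot hold the running state the loop needs, but a nested function can. The sketch below is one possible apply-based version, assuming the `Date` column is already parsed; the helper name `next_cqi` and the inline sample frame are illustrative, not from the original post. It reproduces the loop's numbering but is still a row-by-row Python call, so it is not vectorized.

```python
import pandas as pd


def add_CQI_apply(df):
    # A lambda cannot contain assignments, so use a nested function and
    # rebind the running state (previous date, current group) via nonlocal
    previous_row = None  # None never equals a date, so row 0 opens group 1
    CQI_index = 0

    def next_cqi(row):  # hypothetical helper name
        nonlocal previous_row, CQI_index
        if row['Date'] != previous_row:
            # A date different from the row above starts the next CQI group
            CQI_index += 1
            previous_row = row['Date']
        return CQI_index

    # apply(axis=1) still calls next_cqi from Python row by row,
    # so this is a loop in disguise, not a vectorized operation
    df['CQI'] = df.apply(next_cqi, axis=1)
    return df


# Usage on the twelve rows of data.json, built inline instead of read from file
dates = (["9/20/2020 8:50"] * 3 + ["9/20/2020 8:57"] * 3
         + ["9/20/2020 9:12"] * 3 + ["9/20/2020 9:20"] * 3)
df = pd.DataFrame({
    "Date": pd.to_datetime(dates, format="%m/%d/%Y %H:%M"),
    "UE": [1, 2, 3, 1, 8, 2, 1, 5, 3, 1, 4, 3],
})
result = add_CQI_apply(df)
print(result["CQI"].tolist())
```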
Here is my data (data.json):
[
{
"Date": "9/20/2020 8:50",
"UE": 1
},
{
"Date": "9/20/2020 8:50",
"UE": 2
},
{
"Date": "9/20/2020 8:50",
"UE": 3
},
{
"Date": "9/20/2020 8:57",
"UE": 1
},
{
"Date": "9/20/2020 8:57",
"UE": 8
},
{
"Date": "9/20/2020 8:57",
"UE": 2
},
{
"Date": "9/20/2020 9:12",
"UE": 1
},
{
"Date": "9/20/2020 9:12",
"UE": 5
},
{
"Date": "9/20/2020 9:12",
"UE": 3
},
{
"Date": "9/20/2020 9:20",
"UE": 1
},
{
"Date": "9/20/2020 9:20",
"UE": 4
},
{
"Date": "9/20/2020 9:20",
"UE": 3
}
]
Finally, here is the function that loads this data:
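import numpy as np
import pandas as pd

def upload_data(file):
    df = pd.read_json(file)
    # The dates look like "9/20/2020 8:50": month/day/year hour:minute, no seconds
    df['Date'] = pd.to_datetime(df['Date'], format="%m/%d/%Y %H:%M")
    df['CQI'] = np.nan
    return df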
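The date strings in data.json do not match a `"%Y-%d-%m %H:%M:%S"` pattern, so parsing them with that format raises an error. A small check, assuming the file's dates all have the `9/20/2020 8:50` shape shown above:

```python
import pandas as pd

# Two date strings in the same shape as data.json
dates = pd.Series(["9/20/2020 8:50", "9/20/2020 9:12"])

# "%Y-%d-%m %H:%M:%S" expects strings like "2020-20-09 08:50:00",
# so these month/day/year strings are rejected
try:
    pd.to_datetime(dates, format="%Y-%d-%m %H:%M:%S")
    original_format_failed = False
except ValueError as exc:
    original_format_failed = True
    print("original format rejected:", exc)

# month/day/year hour:minute; %m and %H also accept unpadded "9" and "8"
parsed = pd.to_datetime(dates, format="%m/%d/%Y %H:%M")
print(parsed)
```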
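You don't need iterrows or apply at all. Flag each row whose Date differs from the previous row's, then take the cumulative sum of those flags; this is fully vectorized:
df['CQI'] = (df['Date'] != df['Date'].shift()).cumsum()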
In [120]: (df['Date'] != df['Date'].shift()).cumsum()
Out[120]:
0 1
1 1
2 1
3 2
4 2
5 2
6 3
7 3
8 3
9 4
10 4
11 4
Name: Date, dtype: int64
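The one-liner can be unpacked into its intermediate steps to see why it yields 1, 1, 1, 2, ... The sketch below rebuilds the sample dates inline rather than reading data.json:

```python
import pandas as pd

dates = (["9/20/2020 8:50"] * 3 + ["9/20/2020 8:57"] * 3
         + ["9/20/2020 9:12"] * 3 + ["9/20/2020 9:20"] * 3)
df = pd.DataFrame({"Date": pd.to_datetime(dates, format="%m/%d/%Y %H:%M")})

# shift() moves every value down one row; row 0 receives NaT
previous = df['Date'].shift()
# True wherever the date differs from the row above (a date != NaT is True)
is_new_group = df['Date'] != previous
# Cumulatively summing the True flags numbers each run of equal dates 1, 2, 3, ...
df['CQI'] = is_new_group.cumsum()

print(pd.DataFrame({'Date': df['Date'], 'prev': previous,
                    'new': is_new_group, 'CQI': df['CQI']}))
```

Because only adjacent rows are compared, a date that reappears later after a different one starts a new group, which matches what the original loop does.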