Python Dataframe - Trouble understanding and decoding the error
I am completely new to Python. I want to create a new column called Arrival Delay based on the actual and estimated arrival dates and times. I am trying to do this with a pandas DataFrame. The code I tried is below.
for i in range(0, df_new.shape[0]):
    if df_new["ACT_ARRIVAL_DATE"][i] == df_new["ARRIVAL_ETA_DATE"][i]:
        if df_new["ACT_ARRIVAL_TIME"][i] > df_new["ARRIVAL_ETA_TIME"][i]:
            df_new['Arrival Delay'][i] = df_new["ACT_ARRIVAL_TIME"][i] - df_new["ARRIVAL_ETA_TIME"][i]
        else:
            df_new['Arrival Delay'][i] = 0
    elif df_new["ACT_ARRIVAL_DATE"][i] > df_new["ARRIVAL_ETA_DATE"][i]:
        if df_new["ACT_ARRIVAL_TIME"][i] > df_new["ARRIVAL_ETA_TIME"][i]:
            df_new['Arrival Delay'][i] = 24 + (df_new["ACT_ARRIVAL_TIME"][i] - df_new["ARRIVAL_ETA_TIME"][i])
        else:
            df_new['Arrival Delay'][i] = 24
But I get the following error.
ValueError                                Traceback (most recent call last)
<ipython-input-60-8dfb865ac5c2> in <module>()
      1 for i in range(0,df_new.shape[0]):
----> 2     if df_new["ACT_ARRIVAL_DATE"][i] == df_new["ARRIVAL_ETA_DATE"][i]:
      3         if df_new[ACT_ARRIVAL_TIME[i]] > df_new[ARRIVAL_ETA_TIME[i]]:
      4             df_new['Arrival Delay'] = df_new[ACT_ARRIVAL_TIME[i]] - df_new[ARRIVAL_ETA_TIME[i]]
      5         else:

C:\Users16205\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    951         raise ValueError("The truth value of a {0} is ambiguous. "
    952                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 953                          .format(self.__class__.__name__))
    954
    955     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Please help me.
Note: the variables are of dtype datetime64[ns].
A line like this
df_new["ACT_ARRIVAL_DATE"][i]
needs to be written like this
df_new.loc[i,"ACT_ARRIVAL_DATE"]
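To illustrate what the error message is complaining about, here is a minimal sketch. The data and the repeated index label are assumptions made only to reproduce the message; the question does not say what the real index looks like. With a non-unique index, `[i]` selects several rows, the comparison yields a Series, and `if` cannot reduce it to a single True or False. On a unique index, `.loc[row, column]` returns a scalar.

```python
import pandas as pd

# Hypothetical data; the duplicated index label 0 is an assumption used
# to reproduce the error, not something the question confirms.
df_new = pd.DataFrame(
    {
        "ACT_ARRIVAL_DATE": pd.to_datetime(["2018-01-01", "2018-01-02"]),
        "ARRIVAL_ETA_DATE": pd.to_datetime(["2018-01-01", "2018-01-01"]),
    },
    index=[0, 0],
)

# The label 0 occurs twice, so [0] selects two rows and == returns a Series.
comparison = df_new["ACT_ARRIVAL_DATE"][0] == df_new["ARRIVAL_ETA_DATE"][0]

message = ""
try:
    if comparison:  # a Series has no single truth value
        pass
except ValueError as exc:
    message = str(exc)

# With a unique index, .loc[row, column] returns a single Timestamp,
# so the comparison is a plain boolean.
clean = df_new.reset_index(drop=True)
same_day = clean.loc[0, "ACT_ARRIVAL_DATE"] == clean.loc[0, "ARRIVAL_ETA_DATE"]
```

If `df_new.index.is_unique` is False on the real data, that may be worth checking before rewriting the loop.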
You don't need a for loop at all, but a pandas for loop would look like this
for index, row in df_new.iterrows():
    if row["ACT_ARRIVAL_DATE"] == row["ARRIVAL_ETA_DATE"]:
        if row["ACT_ARRIVAL_TIME"] > row["ARRIVAL_ETA_TIME"]:
            df_new.loc[index, 'Arrival Delay'] = row["ACT_ARRIVAL_TIME"] - row["ARRIVAL_ETA_TIME"]
        else:
            # same value as the else branch in the question
            df_new.loc[index, 'Arrival Delay'] = 0
To avoid the for loop, you can use boolean indexing
df_new.loc[(df_new.ACT_ARRIVAL_DATE == df_new.ARRIVAL_ETA_DATE) & (df_new.ACT_ARRIVAL_TIME > df_new.ARRIVAL_ETA_TIME),'Arrival Delay'] = df_new.ACT_ARRIVAL_TIME - df_new.ARRIVAL_ETA_TIME
and build it out the same way for the remaining cases.
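Building this out for the remaining cases might look like the sketch below. The sample rows are invented, and it assumes the time columns are datetime64 values sharing a dummy date, so their difference is a timedelta. The `24` from the question is written as `pd.Timedelta(hours=24)` so the whole column keeps one dtype. Rows where the actual date is earlier than the estimated date are not covered by the question and stay at zero here.

```python
import pandas as pd

# Hypothetical sample data; the dummy date 1900-01-01 in the time columns
# is an assumption about how the times were parsed.
df_new = pd.DataFrame({
    "ACT_ARRIVAL_DATE": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"]),
    "ARRIVAL_ETA_DATE": pd.to_datetime(["2018-01-01"] * 4),
    "ACT_ARRIVAL_TIME": pd.to_datetime(
        ["1900-01-01 10:30", "1900-01-01 09:00",
         "1900-01-01 11:00", "1900-01-01 08:00"]),
    "ARRIVAL_ETA_TIME": pd.to_datetime(["1900-01-01 10:00"] * 4),
})

# Name each condition once so the assignments stay readable.
same_day = df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]
later_day = df_new["ACT_ARRIVAL_DATE"] > df_new["ARRIVAL_ETA_DATE"]
later_time = df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]
diff = df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"]
one_day = pd.Timedelta(hours=24)

# Start from zero delay, then overwrite the rows matching each case.
df_new["Arrival Delay"] = pd.Series(pd.Timedelta(0), index=df_new.index)
df_new.loc[same_day & later_time, "Arrival Delay"] = diff
df_new.loc[later_day & later_time, "Arrival Delay"] = one_day + diff
df_new.loc[later_day & ~later_time, "Arrival Delay"] = one_day
```

Assigning a Series on the right-hand side works because pandas aligns it on the index and only writes the rows selected by the mask.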
Consider a nested np.where(), which works much like R's ifelse()
df_new["Arrival Delay"] = np.where((df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]),
                                   df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"],
                                   np.where((df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] <= df_new["ARRIVAL_ETA_TIME"]), 0,
                                            np.where((df_new["ACT_ARRIVAL_DATE"] > df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]),
                                                     24 + df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"], 24)))
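Since the question says the columns are datetime64[ns], mixing the plain integers 0 and 24 with a datetime difference may not behave as intended. One possible variation, sketched here on invented sample data, converts the difference to hours as floats first and uses `np.select` to flatten the nesting. This is an alternative to the nested `np.where()`, not the answer's own method. Unlike the nested version, rows matching none of the four conditions get NaN rather than 24.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the dummy date in the time columns is an assumption.
df_new = pd.DataFrame({
    "ACT_ARRIVAL_DATE": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"]),
    "ARRIVAL_ETA_DATE": pd.to_datetime(["2018-01-01"] * 4),
    "ACT_ARRIVAL_TIME": pd.to_datetime(
        ["1900-01-01 10:30", "1900-01-01 09:00",
         "1900-01-01 11:00", "1900-01-01 08:00"]),
    "ARRIVAL_ETA_TIME": pd.to_datetime(["1900-01-01 10:00"] * 4),
})

# Express the time difference in hours so it can be combined with 0 and 24.
diff_hours = (
    df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"]
).dt.total_seconds() / 3600

same_day = df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]
later_day = df_new["ACT_ARRIVAL_DATE"] > df_new["ARRIVAL_ETA_DATE"]
later_time = df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]

# Each condition is paired with the value at the same position in choices.
conditions = [
    same_day & later_time,
    same_day & ~later_time,
    later_day & later_time,
    later_day & ~later_time,
]
choices = [diff_hours, 0.0, 24 + diff_hours, 24.0]

df_new["Arrival Delay"] = np.select(conditions, choices, default=np.nan)
```

The resulting column holds the delay in hours as floats, which keeps the question's `24 + ...` arithmetic valid.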