Python Dataframe - Trouble understanding and decoding the error

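I am completely new to Python. I want to create a new column called Arrival Delay based on the actual and estimated arrival dates and times. I am trying to do this with a Pandas DataFrame. The code I tried is below.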

for i in range(0,df_new.shape[0]):
    if df_new["ACT_ARRIVAL_DATE"][i] == df_new["ARRIVAL_ETA_DATE"][i]:
        if df_new["ACT_ARRIVAL_TIME"][i] > df_new["ARRIVAL_ETA_TIME"][i]:
            df_new['Arrival Delay'][i] = df_new["ACT_ARRIVAL_TIME"][i] - df_new["ARRIVAL_ETA_TIME"][i]
        else:
            df_new['Arrival Delay'][i] = 0
    elif df_new["ACT_ARRIVAL_DATE"][i] > df_new["ARRIVAL_ETA_DATE"][i]:
        if df_new["ACT_ARRIVAL_TIME"][i] > df_new["ARRIVAL_ETA_TIME"][i]:
            df_new['Arrival Delay'][i] = 24 + (df_new["ACT_ARRIVAL_TIME"][i] - df_new["ARRIVAL_ETA_TIME"][i])
    else:
        df_new['Arrival Delay'][i] = 24

But I get the following error.

ValueError                                Traceback (most recent call last)
<ipython-input-60-8dfb865ac5c2> in <module>()
  1 for i in range(0,df_new.shape[0]):
----> 2     if df_new["ACT_ARRIVAL_DATE"][i] == df_new["ARRIVAL_ETA_DATE"][i]:
  3         if df_new[ACT_ARRIVAL_TIME[i]] > df_new[ARRIVAL_ETA_TIME[i]]:
  4             df_new['Arrival Delay'] = df_new[ACT_ARRIVAL_TIME[i]] - df_new[ARRIVAL_ETA_TIME[i]]
  5         else:

C:\Users16205\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
951         raise ValueError("The truth value of a {0} is ambiguous. "
952                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 953                          .format(self.__class__.__name__))
954 
955     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Please help me.

Note: the variables are of type datetime64[ns].

A line like this

df_new["ACT_ARRIVAL_DATE"][i]

needs to be written like this

df_new.loc[i,"ACT_ARRIVAL_DATE"]
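
The error message says that an if statement received a whole Series instead of a single True/False value. The sketch below reproduces that with made-up data. The frame df_demo and its repeated index label are assumptions for illustration only; a non-unique index is just one way that df[col][i] can return a Series, and it may not be what happened in the question.

```python
import pandas as pd

# Hypothetical data; the repeated index label 0 is an assumption for illustration.
df_demo = pd.DataFrame(
    {
        "ACT_ARRIVAL_DATE": pd.to_datetime(["2018-01-01", "2018-01-02"]),
        "ARRIVAL_ETA_DATE": pd.to_datetime(["2018-01-01", "2018-01-01"]),
    },
    index=[0, 0],
)

# With a repeated label, the lookup returns a Series, so == gives a Series too.
comparison = df_demo["ACT_ARRIVAL_DATE"][0] == df_demo["ARRIVAL_ETA_DATE"][0]

try:
    if comparison:  # bool() of a multi-element Series is ambiguous
        pass
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# With unique labels, .loc[row, column] returns one scalar per lookup.
df_fixed = df_demo.reset_index(drop=True)
scalar_result = df_fixed.loc[0, "ACT_ARRIVAL_DATE"] == df_fixed.loc[0, "ARRIVAL_ETA_DATE"]
```

Checking df_new.index.is_unique and df_new.columns.is_unique is a quick way to see whether this applies to your data.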

You don't need a for loop here, but a pandas for loop would look like this

for index,row in df_new.iterrows():
    if row["ACT_ARRIVAL_DATE"] == row["ARRIVAL_ETA_DATE"]:
        if row["ACT_ARRIVAL_TIME"] > row["ARRIVAL_ETA_TIME"]:
            df_new.loc[index,'Arrival Delay'] = row["ACT_ARRIVAL_TIME"] - row["ARRIVAL_ETA_TIME"]
        else:
            df_new.loc[index,'Arrival Delay'] = 0  # assumed: copied from the question's else branch

To avoid the for loop, you can use boolean indexing

df_new.loc[(df_new.ACT_ARRIVAL_DATE == df_new.ARRIVAL_ETA_DATE) & (df_new.ACT_ARRIVAL_TIME > df_new.ARRIVAL_ETA_TIME),'Arrival Delay'] = df_new.ACT_ARRIVAL_TIME - df_new.ARRIVAL_ETA_TIME

and build it out the same way for the remaining possibilities
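
A rough sketch of what covering every branch with boolean masks might look like, following the branches of the loop in the question. The sample rows are made up, the TIME columns are assumed to share one dummy date, and the delay is assumed to be wanted in hours so that it can be combined with the plain numbers 0 and 24.

```python
import pandas as pd

# Hypothetical sample data; the TIME columns are assumed to share one dummy date.
df_new = pd.DataFrame({
    "ACT_ARRIVAL_DATE": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02"]),
    "ARRIVAL_ETA_DATE": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-01"]),
    "ACT_ARRIVAL_TIME": pd.to_datetime(
        ["1900-01-01 10:30", "1900-01-01 09:00", "1900-01-01 11:00"]),
    "ARRIVAL_ETA_TIME": pd.to_datetime(
        ["1900-01-01 10:00", "1900-01-01 09:30", "1900-01-01 10:00"]),
})

# Time-of-day difference as float hours (assumption: hours are the wanted unit).
diff_hours = (
    df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"]
).dt.total_seconds() / 3600

# One named mask per condition keeps each branch readable.
same_day = df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]
later_day = df_new["ACT_ARRIVAL_DATE"] > df_new["ARRIVAL_ETA_DATE"]
later_time = df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]

df_new["Arrival Delay"] = 24.0  # fallback branch
df_new.loc[same_day & ~later_time, "Arrival Delay"] = 0.0
df_new.loc[same_day & later_time, "Arrival Delay"] = diff_hours  # aligned on the index
df_new.loc[later_day & later_time, "Arrival Delay"] = 24 + diff_hours
```

Assigning a Series through .loc aligns on the index, so only the rows selected by each mask are overwritten.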

Consider a nested np.where(), which works like R's ifelse()

import numpy as np

# Assumes the TIME difference is numeric (e.g. hours); a Timedelta would need converting before adding 24.
df_new["Arrival Delay"] = np.where((df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]), 
                                    df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"], 
                                    np.where((df_new["ACT_ARRIVAL_DATE"] == df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] <= df_new["ARRIVAL_ETA_TIME"]), 0, 
                                             np.where((df_new["ACT_ARRIVAL_DATE"] > df_new["ARRIVAL_ETA_DATE"]) & (df_new["ACT_ARRIVAL_TIME"] > df_new["ARRIVAL_ETA_TIME"]), 
                                                      24 + df_new["ACT_ARRIVAL_TIME"] - df_new["ARRIVAL_ETA_TIME"], 24)))
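
To see how the nesting behaves, here is a small self-contained sketch on made-up data. The frame df_demo and the column names ACT_DATE, ETA_DATE, ACT_HOUR and ETA_HOUR are hypothetical, and the time of day is assumed to be already reduced to float hours. It also shows np.select as a flat alternative that should give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical data: dates as datetime64, time of day already reduced to hours.
df_demo = pd.DataFrame({
    "ACT_DATE": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"]),
    "ETA_DATE": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01"]),
    "ACT_HOUR": [10.5, 9.0, 11.0, 8.0],
    "ETA_HOUR": [10.0, 9.5, 10.0, 9.0],
})

same_day = df_demo["ACT_DATE"] == df_demo["ETA_DATE"]
later_day = df_demo["ACT_DATE"] > df_demo["ETA_DATE"]
later_time = df_demo["ACT_HOUR"] > df_demo["ETA_HOUR"]
diff = df_demo["ACT_HOUR"] - df_demo["ETA_HOUR"]

# Nested form: each inner np.where acts as the "else" of the outer one.
nested = np.where(same_day & later_time, diff,
                  np.where(same_day, 0.0,
                           np.where(later_day & later_time, 24 + diff, 24.0)))

# Flat form: np.select uses the first condition that is True in each row.
flat = np.select(
    [same_day & later_time, same_day, later_day & later_time],
    [diff, 0.0, 24 + diff],
    default=24.0,
)

df_demo["Arrival Delay"] = nested
```

With more than two or three branches, the flat np.select form tends to be easier to read than deeply nested np.where calls.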