Updating a DataFrame during traversal

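I'm working with DataFrames and need to delete some rows while traversing them.

Brief overview: I read a row (N), compare it with the next 20 rows (up to N+20), and delete some of the rows between N and N+20 based on that comparison. Then I move on to N+1 and compare that row with the next 20 rows, up to N+1+20. I don't want to compare N+1 against rows I have already deleted.

However, when I delete rows, the deletion is not reflected in the DataFrame I am looping over, because I am iterating over its original copy. Is there a solution for this?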

df = pd.read_csv(r"C:\snip\test.csv")
index_to_delete = []

for index, row in df.iterrows():
    snip

    for i in range(20):
        if (index + i + 1) < len(df.index):
            if condition:
                index_to_delete.append(index + i + 1) #storing indices of rows to delete between N and N+20

    df.loc[index, ['snip1', 'snip2']] = [snip, snip] #updating values in row N
    df = df.drop(index_to_delete)
    index_to_delete.clear()

From the documentation for pandas.DataFrame.iterrows():

You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
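
To make the copy-versus-view point concrete, here is a minimal sketch. The column names and values are made up for illustration. Because the frame mixes dtypes, each row that iterrows() yields is a separate object-dtype Series, so assigning to it leaves the DataFrame untouched.

```python
import pandas as pd

# Hypothetical data; any frame with mixed dtypes behaves the same way.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

for index, row in df.iterrows():
    row["a"] = 0  # changes only the temporary Series, not df

unchanged = df["a"].tolist()  # the original values are still in df
```

The same applies to reassigning df inside the loop: the iterator keeps walking the object it was created from, which is why dropped rows still show up.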

There are several ways to work around the problem:

1: Iterate over the length of df instead of over df itself.
for inx in range(len(df)):
    try:
        row = df.loc[inx]
    except KeyError:
        # this label was already dropped, so skip it
        continue
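
A fuller sketch of option 1 might look like the following. It assumes the default integer index (labels 0..n-1), a hypothetical column named "value", and a stand-in condition (drop later rows in the window that have the same value as row N), since the real condition is snipped in the question. The function name drop_in_window is also invented for this example.

```python
import pandas as pd

def drop_in_window(df, window=20):
    """Drop rows while looping over positions rather than over df itself."""
    n = len(df)  # fixed up front, so the range does not shrink as rows are dropped
    for inx in range(n):
        if inx not in df.index:
            continue  # row was dropped by an earlier pass
        current = df.loc[inx, "value"]
        # Hypothetical condition: same value as the current row.
        to_delete = [
            j
            for j in range(inx + 1, min(inx + window + 1, n))
            if j in df.index and df.loc[j, "value"] == current
        ]
        df = df.drop(to_delete)  # later lookups see the reduced frame
    return df
```

For example, drop_in_window(pd.DataFrame({"value": [1, 1, 2, 1, 2, 3]}), window=2) keeps the rows labelled 0, 2, 3 and 5. Because every lookup goes through df.loc on the current df, a dropped row is never compared again.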

2: Keep track of the indices you have already marked for deletion and skip them.

df = pd.read_csv(r"C:\snip\test.csv")
all_index_to_delete = []
index_to_delete = []

for index, row in df.iterrows():
    if index in all_index_to_delete:
        continue
    snip

    for i in range(20):
        if (index + i + 1) < len(df.index):
            if condition:
                index_to_delete.append(index + i + 1) #rows between N and N+20 to drop after this pass
                all_index_to_delete.append(index + i + 1) #every row dropped so far, so later passes skip it

    df.loc[index, ['snip1', 'snip2']] = [snip, snip] #updating values in row N
    df = df.drop(index_to_delete)
    index_to_delete.clear()
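
A variation on option 2, offered as a sketch rather than as the original answer's method: collect the labels in a set and drop them once after the loop, so the DataFrame is never modified while it is being iterated. As before, the "value" column, the equality condition, and the function name mark_then_drop are assumptions made for illustration, and the code assumes the default integer index.

```python
import pandas as pd

def mark_then_drop(df, window=20):
    """Mark rows during iteration, then drop them all in a single call."""
    deleted = set()  # set membership tests are fast, unlike a list
    n = len(df)
    for index, row in df.iterrows():
        if index in deleted:
            continue  # never use an already-deleted row as row N
        for j in range(index + 1, min(index + window + 1, n)):
            # Hypothetical condition: same value as the current row.
            if j not in deleted and df.at[j, "value"] == row["value"]:
                deleted.add(j)
    return df.drop(index=sorted(deleted))
```

Since df stays intact until the final drop, the bounds check against n stays valid for the whole loop, and no label is ever passed to drop twice.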