Updating Dataframe during Traversal
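I'm working with DataFrames and need to delete several rows while iterating over them.
Brief overview: I read a row (N), compare it with the next 20 rows (up to N+20), and, based on the comparison, delete some of the rows between N and N+20. Then I go back to N+1 and compare that row with the next 20 rows, up to N+1+20. I don't want to compare N+1 against rows I have already deleted.
However, when I delete rows, the deletion is not reflected in the DataFrame I'm looping over, because I'm iterating over the original copy and the changes don't show up there.
Is there any solution to this?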
import pandas as pd

df = pd.read_csv(r"C:\snip\test.csv")
index_to_delete = []
for index, row in df.iterrows():
    # snip
    for i in range(20):
        if (index + i + 1) < len(df.index):
            if condition:
                index_to_delete.append(index + i + 1)  # storing indices of rows to delete between N and N+20
                df.loc[index, ['snip1', 'snip2']] = [snip, snip]  # updating values in row N
    df = df.drop(index_to_delete)
    index_to_delete.clear()
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
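There is also a second, more direct reason the deletions seem to disappear: `df.iterrows()` is a generator bound to the DataFrame object it was called on. `df = df.drop(...)` does not change that object. It creates a new DataFrame and rebinds the name `df` to it, so the running loop keeps walking the old frame, including the rows you "deleted". A minimal demonstration, using a made-up single-column frame:

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 11, 12, 13]})

visited = []
for index, row in df.iterrows():  # the generator is bound to the original frame
    visited.append(index)
    if index == 0:
        # Rebinds the *name* df to a new DataFrame; the running generator
        # keeps walking the old object, so labels 1 and 2 are still visited.
        df = df.drop([1, 2])

print(visited)            # [0, 1, 2, 3]
print(df.index.tolist())  # [0, 3]
```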
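There are several ways to work around this:
1: Instead of iterating over df itself, you can iterate over range(len(df)) and look each row up by label with df.loc, skipping labels that have already been dropped.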
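for inx in range(len(df)):  # len(df) is evaluated once, before any rows are dropped
    try:
        row = df.loc[inx]  # assumes the default 0..n-1 labels from read_csv
    except KeyError:  # this label was already dropped; skip it
        continue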
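Put together with the 20-row window from the question, approach 1 could look roughly like the sketch below. It assumes a default RangeIndex, and `is_match`, `drop_window_matches`, `WINDOW` and the `"value"` column are hypothetical stand-ins for the question's elided `condition` and data. The length is captured once in `n_rows`, and both the reference row N and each candidate row are checked against the current `df.index`, so deleted rows are never compared.

```python
import pandas as pd

WINDOW = 20  # compare row N with rows N+1 .. N+20, as in the question


def is_match(row_n, row_m):
    # Hypothetical stand-in for the question's elided `condition`:
    # treat two rows as matching when their "value" column is equal.
    return row_n["value"] == row_m["value"]


def drop_window_matches(df, window=WINDOW):
    df = df.reset_index(drop=True)  # assume labels 0 .. len(df) - 1
    n_rows = len(df)                # fixed before anything is dropped
    for inx in range(n_rows):
        try:
            row = df.loc[inx]
        except KeyError:
            continue                # row N was deleted earlier; skip it
        to_drop = []
        for j in range(inx + 1, min(inx + 1 + window, n_rows)):
            if j in df.index and is_match(row, df.loc[j]):
                to_drop.append(j)   # never compares against a deleted row
        df = df.drop(to_drop)       # rebinding is fine: we iterate over a range, not df
    return df


frame = pd.DataFrame({"value": [1, 2, 1, 3, 2, 1]})
result = drop_window_matches(frame)
print(result["value"].tolist())  # [1, 2, 3]
print(result.index.tolist())     # [0, 1, 3]
```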
2: Store the indices you have already deleted and skip them
import pandas as pd

df = pd.read_csv(r"C:\snip\test.csv")
n_rows = len(df.index)  # take the length once; len(df.index) shrinks as rows are dropped
all_index_to_delete = set()  # every index deleted so far, kept across iterations
index_to_delete = []
for index, row in df.iterrows():
    if index in all_index_to_delete:
        continue  # row N was already deleted; don't use it as a reference row
    # snip
    for i in range(20):
        if (index + i + 1) < n_rows and (index + i + 1) not in all_index_to_delete:
            if condition:
                index_to_delete.append(index + i + 1)  # storing indices of rows to delete between N and N+20
                all_index_to_delete.add(index + i + 1)  # remember them for later iterations
                df.loc[index, ['snip1', 'snip2']] = [snip, snip]  # updating values in row N
    df = df.drop(index_to_delete)
    index_to_delete.clear()
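A variant of approach 2 is to never drop inside the loop at all. Collect the deleted labels in a set, skip them both as reference rows and as comparison candidates, and call `drop` once after the traversal. The sketch below assumes a default RangeIndex. `is_match`, `drop_window_matches_deferred` and the `"value"` column are hypothetical placeholders for the question's elided `condition` and data.

```python
import pandas as pd


def is_match(row_n, row_m):
    # Hypothetical stand-in for the question's elided `condition`.
    return row_n["value"] == row_m["value"]


def drop_window_matches_deferred(df, window=20):
    # Assumes a default RangeIndex (labels 0 .. len(df) - 1), as read_csv produces.
    deleted = set()
    n_rows = len(df)
    for index, row in df.iterrows():  # df is not modified or rebound inside the loop
        if index in deleted:
            continue                  # skip rows that were already deleted
        for j in range(index + 1, min(index + 1 + window, n_rows)):
            if j not in deleted and is_match(row, df.loc[j]):
                deleted.add(j)
    return df.drop(sorted(deleted))   # a single drop after the traversal


frame = pd.DataFrame({"value": [1, 2, 1, 3, 2, 1]})
result = drop_window_matches_deferred(frame)
print(result["value"].tolist())  # [1, 2, 3]
```

Because nothing is removed until the end, the frame being iterated stays consistent with the labels you look up, and there is no shrinking `len(df.index)` to account for.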