Removing specific rows in pandas with Python

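I have the following sample in a CSV file, and I want to apply these rules:

1- First, find all parents whose name contains a dash ("-"). Then:

2- For each parent found by rule 1, search the whole CSV file for rows where df["Child"] equals that df["Parent"], and delete those Child rows. In this sample, Saher-1 is the Parent in row 1 and the Child in row 2, so row 2 should be deleted. Mori is likewise the Child in row 1 and the Parent in row 2.

I don't know how to use an if clause for this.

I have the following file: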

Parent        Child
Saher-1       Mori
Mori          Saher-1
John          Jake
Saher-2       Mary

What I expect:

Parent        Child
Saher-1       Mori
John          Jake
Saher-2       Mary

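This answer requires that parents are always marked with a "-" or some other marker; otherwise it becomes ambiguous who is the parent and who is the child. Someone else may be able to give a better answer.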

import pandas as pd

data = {"Parent": ["Saher-1", "Mori", "John"], "Child": ["Mori", "Saher-1", "Jake"]}

df = pd.DataFrame(data=data)

# Select the rows whose Child contains a dash, i.e. a marked parent listed as a child
df_temp = df[df["Child"].str.contains("-")]

# Concatenate both DataFrames and drop every row that occurs more than once.
# In effect, this removes the df_temp rows from df.
df = pd.concat([df, df_temp]).drop_duplicates(keep=False)

print(df)
#     Parent Child
# 0  Saher-1  Mori
# 2     John  Jake
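
One caveat with the snippet above: it drops every row whose Child contains a dash, whether or not that name ever appears as a Parent. A minimal sketch of a stricter variant, assuming the question's four-row sample and a plain boolean mask instead of concat/drop_duplicates (the names is_dashed_parent and result are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Parent": ["Saher-1", "Mori", "John", "Saher-2"],
    "Child": ["Mori", "Saher-1", "Jake", "Mary"],
})

# True for rows whose Child carries the dash marker AND also occurs in the Parent column
is_dashed_parent = (
    df["Child"].str.contains("-", regex=False)
    & df["Child"].isin(df["Parent"])
)

# Keep every row that is not flagged; the original index is preserved
result = df[~is_dashed_parent]
print(result)
```

With the sample data this should leave rows 0, 2 and 3, which matches the expected output in the question.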

import pandas as pd

df = pd.DataFrame({
    'Parent': ['Saher-1', 'Mori', 'John', 'Saher-2'],
    'Child': ['Mori', 'Saher-1', 'Jake', 'Mary']
})
print(df)
#     Parent    Child
# 0  Saher-1     Mori
# 1     Mori  Saher-1
# 2     John     Jake
# 3  Saher-2     Mary

# Find the rows whose Child also appears in the Parent column
df_dup = df[df['Child'].isin(df['Parent'].tolist())]
print(df_dup)
#     Parent    Child
# 0  Saher-1     Mori
# 1     Mori  Saher-1

# Remove them from the main DataFrame
df = pd.concat([df, df_dup]).drop_duplicates(keep=False)
print(df)
#     Parent Child
# 2     John  Jake
# 3  Saher-2  Mary

# Of the removed rows, keep only those whose Parent contains '-'
df_dup = df_dup[df_dup['Parent'].str.contains('-')]
print(df_dup)
#     Parent Child
# 0  Saher-1  Mori

# Build the final DataFrame
df = pd.concat([df, df_dup]).reset_index(drop=True)
print(df)
#     Parent Child
# 0     John  Jake
# 1  Saher-2  Mary
# 2  Saher-1  Mori
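
The same steps can probably be collapsed into a single boolean mask, which also keeps the original row order and index. This is only a sketch: the function name drop_reversed_children and its marker parameter are made up for illustration, and unlike drop_duplicates(keep=False) it leaves genuinely duplicated rows in place.

```python
import pandas as pd


def drop_reversed_children(df: pd.DataFrame, marker: str = "-") -> pd.DataFrame:
    """Hypothetical helper: drop rows whose Child is also a Parent,
    unless the row's own Parent carries the marker."""
    # Rows whose Child also appears somewhere in the Parent column
    child_is_parent = df["Child"].isin(df["Parent"])
    # Rows whose Parent is a marked parent
    parent_is_marked = df["Parent"].str.contains(marker, regex=False)
    # Keep rows that are unaffected, or that belong to a marked parent
    return df[~child_is_parent | parent_is_marked]


sample = pd.DataFrame({
    "Parent": ["Saher-1", "Mori", "John", "Saher-2"],
    "Child": ["Mori", "Saher-1", "Jake", "Mary"],
})
cleaned = drop_reversed_children(sample)
print(cleaned)
```

On the sample this keeps rows 0, 2 and 3 in their original order, so no reset_index is needed unless a fresh index is wanted.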

Let's try a left merge with the optional parameter indicator=True to identify the rows that satisfy the specified rule:

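# df is the four-row DataFrame from the question, with its default 0..n-1 index

# Marked parents: rows whose Parent contains a dash
m = df['Parent'].str.contains('-')
# Match each row against the marked-parent rows with Parent and Child swapped
d = df.merge(df[m], left_on=['Parent', 'Child'], right_on=['Child', 'Parent'], how='left', indicator=True)
# Keep only the rows that found no reversed match
d = df[d['_merge'].eq('left_only')]

print(d)
#     Parent Child
# 0  Saher-1  Mori
# 2     John  Jake
# 3  Saher-2  Mary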
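
To see what the merge is doing, it may help to look at the _merge column before filtering. A small sketch, assuming the question's four-row sample (the names dashed and merged are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "Parent": ["Saher-1", "Mori", "John", "Saher-2"],
    "Child": ["Mori", "Saher-1", "Jake", "Mary"],
})

# Rows whose Parent carries the dash marker
dashed = df[df["Parent"].str.contains("-", regex=False)]

# Left merge against the dashed rows with Parent and Child swapped;
# indicator=True adds a _merge column saying where each row was found
merged = df.merge(
    dashed,
    left_on=["Parent", "Child"],
    right_on=["Child", "Parent"],
    how="left",
    indicator=True,
)

print(merged["_merge"].tolist())
# ['left_only', 'both', 'left_only', 'left_only']
```

Only the second row (Mori, Saher-1) is tagged "both", because it is the reverse of the marked pair (Saher-1, Mori); filtering on "left_only" therefore removes exactly that row.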