Removing specific rows in pandas with Python
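I have the following sample in a CSV file. I want to apply these rules:
1- First, find all parents that have a "-" (dash) in their name, then:
2- Search the whole CSV file for those df["Parent"] values (from rule 1) appearing in df["Child"]:
In this sample, Saher-1 is a Parent in row 1 and a Child in row 2. When a df["Child"] value matches a df["Parent"] value like this, I want to delete that Child row. Here Mori is the Child in row 1 and also the Parent in row 2.
I don't know how to write the if clause for this.
I have the following file: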
Parent Child
Saher-1 Mori
Mori Saher-1
John Jake
Saher-2 Mary
Expected output:
Parent Child
Saher-1 Mori
John Jake
Saher-2 Mary
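For reference, here is one way to load the sample and express the two rules with plain pandas operations. This is a minimal sketch, assuming the file is whitespace-separated with a `Parent Child` header (for a comma-separated file, drop the `sep` argument), and assuming rule 2 means "drop any row that is the reverse of a dashed-parent row":

```python
import io

import pandas as pd

# Sample from the question; assumes a whitespace-separated file with a header row
csv_text = """Parent Child
Saher-1 Mori
Mori Saher-1
John Jake
Saher-2 Mary
"""
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")

# Rule 1: rows whose Parent contains a literal dash
dashed = df[df["Parent"].str.contains("-", regex=False)]

# Rule 2: a row is redundant when it is the reverse of a dashed-parent row,
# i.e. its Parent is that row's Child and its Child is that row's Parent
reverse_pairs = set(zip(dashed["Child"], dashed["Parent"]))
is_reverse = [(p, c) in reverse_pairs for p, c in zip(df["Parent"], df["Child"])]

# Keep every row that is not such a reverse pair
result = df[~pd.Series(is_reverse, index=df.index)]
print(result)
```

Comparing whole (Parent, Child) pairs rather than single columns avoids deleting an unrelated row that merely shares a name with a dashed parent.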
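This answer requires that parents are always marked with a "-" or some other marker; otherwise it becomes ambiguous which element is the parent and which is the child. Others may be able to give a better answer.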
import pandas as pd
data = {"Parent": ["Saher-1", "Mori", "John"], "Child": ["Mori", "Saher-1", "Jake"]}
df = pd.DataFrame(data=data)
# Select rows whose Child contains a dash, i.e. a dashed Parent that
# also appears in the Child column
df_temp = df[df["Child"].str.contains("-")]
# This uses concat to join both DataFrames and then drops every duplicated row;
# essentially, it removes the rows of df_temp from df
df = pd.concat([df, df_temp]).drop_duplicates(keep=False)
print(df)
#     Parent Child
# 0  Saher-1  Mori
# 2     John  Jake
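The concat/`drop_duplicates(keep=False)` step can also be written as a negated boolean mask. A minimal sketch on the same three-row data; one difference worth noting is that `drop_duplicates(keep=False)` would also remove any rows that were already duplicated in the original file, while the mask only removes the rows it matches:

```python
import pandas as pd

data = {"Parent": ["Saher-1", "Mori", "John"], "Child": ["Mori", "Saher-1", "Jake"]}
df = pd.DataFrame(data=data)

# True where the Child contains a literal dash (regex=False skips regex parsing)
has_dash_child = df["Child"].str.contains("-", regex=False)

# ~ negates the mask, so this keeps only rows whose Child has no dash
filtered = df[~has_dash_child]
print(filtered)
```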
import pandas as pd

df = pd.DataFrame({
    'Parent': ['Saher-1', 'Mori', 'John', 'Saher-2'],
    'Child': ['Mori', 'Saher-1', 'Jake', 'Mary']
})
print(df)
#     Parent    Child
# 0  Saher-1     Mori
# 1     Mori  Saher-1
# 2     John     Jake
# 3  Saher-2     Mary

# Find rows whose Child also appears in the Parent column
df_dup = df[df['Child'].isin(df['Parent'].tolist())]
print(df_dup)
#     Parent    Child
# 0  Saher-1     Mori
# 1     Mori  Saher-1

# Remove them from the main DataFrame
df = pd.concat([df, df_dup]).drop_duplicates(keep=False)
print(df)
#     Parent Child
# 2     John  Jake
# 3  Saher-2  Mary

# In the duplicated DataFrame, keep only the rows whose Parent contains '-'
df_dup = df_dup[df_dup['Parent'].str.contains('-')]
print(df_dup)
#     Parent Child
# 0  Saher-1  Mori

# Build the final DataFrame
df = pd.concat([df, df_dup]).reset_index(drop=True)
print(df)
#     Parent Child
# 0     John  Jake
# 1  Saher-2  Mary
# 2  Saher-1  Mori
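The three steps above (split off the matches, drop them, add back the dashed ones) collapse into a single boolean condition. A minimal sketch of the same logic; unlike the concat version, it keeps the rows in their original order:

```python
import pandas as pd

df = pd.DataFrame({
    "Parent": ["Saher-1", "Mori", "John", "Saher-2"],
    "Child": ["Mori", "Saher-1", "Jake", "Mary"],
})

# Rows whose Child also appears somewhere in the Parent column
child_is_parent = df["Child"].isin(df["Parent"])
# Rows whose Parent carries the dash marker
dashed_parent = df["Parent"].str.contains("-", regex=False)

# Keep a row unless its Child is also a Parent AND its own Parent has no dash
result = df[~child_is_parent | dashed_parent]
print(result)
```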
Let's try a left merge with the optional parameter indicator=True to identify the rows that satisfy the specified rules:
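import pandas as pd

# df is the four-row DataFrame from the question
# Parents with a '-' in their name
m = df['Parent'].str.contains('-')
# Match each row against the reversed (Child, Parent) pairs of the dashed-parent rows
d = df.merge(df[m], left_on=['Parent', 'Child'], right_on=['Child', 'Parent'], how='left', indicator=True)
# Keep rows without a reversed match; the mask aligns on the index,
# so this relies on df having the default 0..n-1 index
d = df[d['_merge'].eq('left_only')]
print(d)
#     Parent Child
# 0  Saher-1  Mori
# 2     John  Jake
# 3  Saher-2  Mary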
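To see what `indicator=True` contributes, it helps to look at the `_merge` column directly: it is `"both"` for the row that is the reverse of a dashed-parent row and `"left_only"` elsewhere. The sketch below also applies the mask positionally with `.to_numpy()`, so it does not depend on df having a 0..n-1 index; this assumes each row matches at most one dashed-parent row, otherwise the merge produces extra rows and the lengths no longer line up:

```python
import pandas as pd

df = pd.DataFrame({
    "Parent": ["Saher-1", "Mori", "John", "Saher-2"],
    "Child": ["Mori", "Saher-1", "Jake", "Mary"],
})

m = df["Parent"].str.contains("-", regex=False)
merged = df.merge(df[m], left_on=["Parent", "Child"], right_on=["Child", "Parent"],
                  how="left", indicator=True)

# "_merge" is "both" where a row is the reverse of a dashed-parent row
print(merged["_merge"].tolist())

# Positional mask: a left merge keeps the left rows in their original order
keep = (merged["_merge"] == "left_only").to_numpy()
print(df[keep])
```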