How to filter a pandas DataFrame using two fields in Python?
I have the following pandas DataFrame:
File_name River Confidance X Y W H T_Area Overlap_Area
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
7 test4.png BRIDGING 0.594574 147 1390 224 1456 0 0.0
8 test4.png BRIDGING 0.149411 150 1701 221 1732 0 0.0
9 test4.png BRIDGING 0.145715 1385 1245 1462 1279 0 0.0
10 test4.png BRIDGING 0.133226 1385 1049 1463 1084 100 1645.0
I want to use groupby in pandas to find the records where "T_Area" == 0 or "T_Area" / "Overlap_Area" > 0.5.
Item 10 should be dropped from the output because 100 / 1645 < 0.5.
df[df['T_Area'].eq(0) | df['T_Area'].div(df['Overlap_Area']).gt(0.5)]
Output:
File_name River Confidance X Y W H T_Area Overlap_Area
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
7 test4.png BRIDGING 0.594574 147 1390 224 1456 0 0.0
8 test4.png BRIDGING 0.149411 150 1701 221 1732 0 0.0
9 test4.png BRIDGING 0.145715 1385 1245 1462 1279 0 0.0
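The one-liner above can be checked end to end with a small self-contained sketch. It assumes the column names are spelled exactly as in the question's table (`File_name`, `T_Area`, `Overlap_Area`) and, to stay short, rebuilds only the three columns the filter needs; the names `keep` and `result` are illustrative.

```python
import pandas as pd

# Subset of the question's table: only the columns the filter needs.
df = pd.DataFrame({
    "File_name": ["test1.png"] * 6 + ["test2.png"] + ["test4.png"] * 4,
    "T_Area": [0, 0, 1406, 0, 0, 1640, 0, 0, 0, 0, 100],
    "Overlap_Area": [0.0, 0.0, 296.0, 0.0, 0.0, 1240.0,
                     0.0, 0.0, 0.0, 0.0, 1645.0],
})

# Keep rows where T_Area is 0 OR the ratio T_Area / Overlap_Area exceeds 0.5.
keep = df["T_Area"].eq(0) | df["T_Area"].div(df["Overlap_Area"]).gt(0.5)
result = df[keep]

print(result.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Only row 10 is dropped, which matches the expected output in the question.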
You can use:
# Boolean masks, using the column names from the table above
m1 = df['T_Area'] == 0
m2 = df['T_Area'] / df['Overlap_Area'] > 0.5
out = df[m1 | m2]
print(out)
# Output
File_name River Confidance X Y W H T_Area Overlap_Area
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
7 test4.png BRIDGING 0.594574 147 1390 224 1456 0 0.0
8 test4.png BRIDGING 0.149411 150 1701 221 1732 0 0.0
9 test4.png BRIDGING 0.145715 1385 1245 1462 1279 0 0.0
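It may help to see why rows with `T_Area` and `Overlap_Area` both equal to 0 survive even though the ratio is undefined for them. The sketch below uses rows 0, 2 and 10 from the question's table; `ratio` and `m2_alt` are illustrative names, and the division-free variant assumes the areas are never negative.

```python
import pandas as pd

# Rows 0, 2 and 10 of the question's table (only the two area columns).
df = pd.DataFrame({"T_Area": [0, 1406, 100],
                   "Overlap_Area": [0.0, 296.0, 1645.0]})

ratio = df["T_Area"] / df["Overlap_Area"]  # 0 / 0.0 gives NaN, not an error
m1 = df["T_Area"] == 0
m2 = ratio > 0.5                           # NaN > 0.5 evaluates to False

print(ratio.isna().tolist())   # [True, False, False]
print((m1 | m2).tolist())      # [True, True, False]

# Division-free form of m2; equivalent here assuming non-negative areas.
m2_alt = df["T_Area"] > 0.5 * df["Overlap_Area"]
print((m1 | m2_alt).tolist())  # [True, True, False]
```

The 0 / 0 rows fail the ratio test but are kept by the first mask, so the `|` between the two masks is what makes the filter behave as intended.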
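Update
If you want to drop all rows of any group (file name) in which at least one row violates the condition, use groupby(...).transform:
# 'all' is True for a group only when every row in that group passes
out = df[(m1 | m2).groupby(df['File_name']).transform('all')]
print(out)
# Output
File_name River Confidance X Y W H T_Area Overlap_Area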
0 test1.png BRIDGING 0.587851 739 821 769 894 0 0.0
1 test1.png BRIDGING 0.579243 980 286 1018 361 0 0.0
2 test1.png BRIDGING 0.534472 966 935 1038 973 1406 296.0
3 test1.png BRIDGING 0.530194 275 859 313 934 0 0.0
4 test1.png BRIDGING 0.368075 944 516 976 589 0 0.0
5 test1.png BRIDGING 0.132732 929 814 1000 856 1640 1240.0
6 test2.png BRIDGING 0.748589 886 1199 963 1248 0 0.0
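The group-level variant can be sketched on a reduced table. This assumes the underscore column names from the question and uses a hypothetical five-row subset (two rows of test1.png, one of test2.png, two of test4.png); `ok` and `group_ok` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "File_name": ["test1.png", "test1.png", "test2.png",
                  "test4.png", "test4.png"],
    "T_Area": [0, 1406, 0, 0, 100],
    "Overlap_Area": [0.0, 296.0, 0.0, 0.0, 1645.0],
})

# Row-level condition: T_Area is 0 or the ratio exceeds 0.5.
ok = df["T_Area"].eq(0) | df["T_Area"].div(df["Overlap_Area"]).gt(0.5)

# Broadcast "every row of this file passes" back onto each row of the file.
group_ok = ok.groupby(df["File_name"]).transform("all")
out = df[group_ok]

print(ok.tolist())        # [True, True, True, True, False]
print(group_ok.tolist())  # [True, True, True, False, False]
print(out["File_name"].unique().tolist())  # ['test1.png', 'test2.png']
```

A single failing row in test4.png removes the whole file, including its row that passes the row-level test on its own.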
Hope this helps:
df.groupby(['File_name']).apply(lambda x: x[(x['T_Area']/x['Overlap_Area']>0.5) | (x['T_Area']==0)])