Python: Drop rows of a dataframe that match two specific conditions and keep the rest
Suppose I have this dataframe:
import pandas as pd
Name = ['ID', 'Country', 'IBAN','ID_bal_amt', 'ID_bal_time','Dan_city','ID_bal_mod','Dan_country','ID_bal_type', 'ID_bal_amt', 'ID_bal_time','ID_bal_mod','ID_bal_type' ,'Dan_sex', 'Dan_Age', 'Dan_country','Dan_sex' , 'Dan_city','Dan_country','ID_bal_amt', 'ID_bal_time','ID_bal_mod','ID_bal_type' ]
Value = ['TAMARA_CO', 'GERMANY','FR56', '12','June','Berlin','OPBD', '55','CRDT','432', 'August', 'CLBD','DBT', 'M', '22', 'FRA', 'M', 'Madrid', 'ESP','432','March','FABD','CRDT']
Ccy = ['','','','EUR','EUR','','EUR','','','','EUR','EUR','USD','USD','USD','','CHF', '','DKN','','','USD','CHF']
Group = ['0','0','0','1','1','1','1','1','1','2','2','2','2','2','2','2','3','3','3','4','4','4','4']
df = pd.DataFrame({'Name':Name, 'Value' : Value, 'Ccy' : Ccy,'Group':Group})
print(df)
           Name      Value  Ccy  Group
0            ID  TAMARA_CO           0
1       Country    GERMANY           0
2          IBAN       FR56           0
3    ID_bal_amt         12  EUR      1
4   ID_bal_time       June  EUR      1
5      Dan_city     Berlin           1
6    ID_bal_mod       OPBD  EUR      1
7   Dan_country         55           1
8   ID_bal_type       CRDT           1
9    ID_bal_amt        432           2
10  ID_bal_time     August  EUR      2
11   ID_bal_mod       CLBD  EUR      2
12  ID_bal_type        DBT  USD      2
13      Dan_sex          M  USD      2
14      Dan_Age         22  USD      2
15  Dan_country        FRA           2
16      Dan_sex          M  CHF      3
17     Dan_city     Madrid           3
18  Dan_country        ESP  DKN      3
19   ID_bal_amt        432           4
20  ID_bal_time      March           4
21   ID_bal_mod       FABD  USD      4
22  ID_bal_type       CRDT  CHF      4
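I want to shrink this dataframe. Among the rows whose Name contains the string 'bal', I want to keep only the subgroup associated with the mode 'CLBD'. That is, I look in the 'Value' column for the string 'CLBD' on the row named 'ID_bal_mod', and then keep all the other names (ID_bal_amt, ID_bal_time, ID_bal_mod, ID_bal_type) that belong to the same group. In our example, these are the names in group 2.
In addition, I want to change the 'Group' value of those kept rows to 0, without changing the 'Group' of any other 'Name' that does not contain the string 'bal'. So in group 2 I want to keep Dan_sex, Dan_Age and Dan_country with their group unchanged.
So in the end I want to get this new dataframe, with the index reset as well: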
           Name      Value  Ccy  Group
0            ID  TAMARA_CO           0
1       Country    GERMANY           0
2          IBAN       FR56           0
3      Dan_city     Berlin           1
4   Dan_country         55           1
5    ID_bal_amt        432           0
6   ID_bal_time     August  EUR      0
7    ID_bal_mod       CLBD  EUR      0
8   ID_bal_type        DBT  USD      0
9       Dan_sex          M  USD      2
10      Dan_Age         22  USD      2
11  Dan_country        FRA           2
12      Dan_sex          M  CHF      3
13     Dan_city     Madrid           3
14  Dan_country        ESP  DKN      3
My attempt:
# keeps only the rows with the string 'bal'
di = df[df['Name'].str.contains('bal')]
# return true or false if they are in the group that contains the mode 'CLBD'
di=[di['Value'].eq('CLBD').groupby(di['Group']).transform('any')]
[3 False
4 False
6 False
8 False
9 True
10 True
11 True
12 True
19 False
20 False
21 False
22 False
Name: Value, dtype: bool]
Does anyone have a working idea? Sorry if this is poorly explained; English is not my first language.
Thanks
IIUC, try this:
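# m1: True for every row whose Group contains a 'CLBD' value
m1 = df['Value'].eq('CLBD').groupby(df['Group']).transform('any')
# m2: True for every row whose Name does not contain 'bal'
m2 = ~df['Name'].str.contains('bal')
# keep 'bal' rows only from the CLBD group, plus all non-'bal' rows
df_out = df[m1 | m2].copy()
# set Group to 0 for the remaining 'bal' rows
df_out['Group'] = df_out['Group'].mask(df_out['Name'].str.contains('bal'), 0)
df_out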
Output:
           Name      Value  Ccy  Group
0            ID  TAMARA_CO           0
1       Country    GERMANY           0
2          IBAN       FR56           0
5      Dan_city     Berlin           1
7   Dan_country         55           1
9    ID_bal_amt        432           0
10  ID_bal_time     August  EUR      0
11   ID_bal_mod       CLBD  EUR      0
12  ID_bal_type        DBT  USD      0
13      Dan_sex          M  USD      2
14      Dan_Age         22  USD      2
15  Dan_country        FRA           2
16      Dan_sex          M  CHF      3
17     Dan_city     Madrid           3
18  Dan_country        ESP  DKN      3
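The output above still carries the original index, while the question asks for a reset index. The following is only a sketch of one possible variant, not the answerer's code: the function name keep_mode_group and its parameters are made up for illustration, and the small demo frame is a shortened stand-in for the question's data. It assumes the mode should be matched only on the 'ID_bal_mod' row, as the question describes, and it writes the string '0' so that the Group column keeps a single type.

```python
import pandas as pd


def keep_mode_group(df, mode="CLBD", pattern="bal", new_group="0"):
    """Hypothetical helper: keep 'bal' rows only from groups whose
    ID_bal_mod row equals `mode`, and keep every non-'bal' row."""
    is_bal = df["Name"].str.contains(pattern, regex=False)
    # True for every row of a group whose 'ID_bal_mod' row carries `mode`.
    has_mode = (
        (df["Name"].eq("ID_bal_mod") & df["Value"].eq(mode))
        .groupby(df["Group"])
        .transform("any")
    )
    out = df[~is_bal | has_mode].copy()
    # Relabel the surviving 'bal' rows; Group holds strings, so write a string.
    out.loc[out["Name"].str.contains(pattern, regex=False), "Group"] = new_group
    # Renumber the rows from 0, as requested in the question.
    return out.reset_index(drop=True)


# Shortened demo data (an assumption, not the full frame from the question).
demo = pd.DataFrame({
    "Name": ["ID", "ID_bal_amt", "ID_bal_mod", "Dan_city",
             "ID_bal_amt", "ID_bal_mod", "Dan_sex"],
    "Value": ["TAMARA_CO", "12", "OPBD", "Berlin", "432", "CLBD", "M"],
    "Ccy": ["", "EUR", "EUR", "", "", "EUR", "USD"],
    "Group": ["0", "1", "1", "1", "2", "2", "2"],
})
result = keep_mode_group(demo)
print(result)
```

Calling keep_mode_group(df) on the full frame from the question should keep the same rows as the answer above, with the index renumbered from 0. If an integer Group column is preferred, the column could be converted with astype(int) afterwards.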