Pandas find and replace based on column items count
I have a dataframe that looks like this:
import pandas as pd
all_data_set = [
('A','Area1','AA','A B D E','A B','D E'),
('B','Area1','AA','A B D E','A B','D E'),
('C','Area2','BB','C','C','C'),
('E','Area1','CC','A B D E','A B','D E'),
('F','Area3','BB','F G','G','F')
]
all_df = pd.DataFrame(data = all_data_set, columns = ['Name','Area','Type','Group','AA members','CC members'])
  Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB      F G          G          F
The last row (row 4) is incorrect.
Anything of Type BB should contain only itself (here, F) in Group, AA members and CC members,
so it should look like this:
4    F  Area3   BB        F          F          F
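I tried to do this:
1. Check when Type is BB and the length of Group is 2 items:
df = (all_df.loc[all_df['Type'] == 'BB', 'Group'].str.split().str.len() == 2)
2. Then go through each row and find such cases
3. Create a new df with all the dropped rows and set Group, AA members, CC members = Name
4. Drop the affected rows from all_df
5. Merge 3. back into all_df
Is there a better pandas way of doing this?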
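For reference, the steps listed in the question can be sketched roughly as follows. This only illustrates that drop-and-merge approach; the names bad, fixed and result are made up here, and pd.concat is assumed to be what "merge" means in the last step.

```python
import pandas as pd

all_df = pd.DataFrame(
    [
        ('A', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
        ('B', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
        ('C', 'Area2', 'BB', 'C', 'C', 'C'),
        ('E', 'Area1', 'CC', 'A B D E', 'A B', 'D E'),
        ('F', 'Area3', 'BB', 'F G', 'G', 'F'),
    ],
    columns=['Name', 'Area', 'Type', 'Group', 'AA members', 'CC members'],
)

# Steps 1 and 2: flag BB rows whose Group holds exactly two space-separated items.
is_bb = all_df['Type'] == 'BB'
has_two_items = all_df['Group'].str.split().str.len() == 2
bad = is_bb & has_two_items  # hypothetical name for the combined mask

# Step 3: copy the flagged rows and overwrite the three columns with Name.
fixed = all_df[bad].copy()
for col in ['Group', 'AA members', 'CC members']:
    fixed[col] = fixed['Name']

# Steps 4 and 5: drop the flagged rows, then put the fixed copies back in order.
result = pd.concat([all_df[~bad], fixed]).sort_index()
print(result)
```

Row 2 (C) is not flagged because its Group already holds a single item, so only row 4 is rewritten.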
Try:
# identify rows where Type is BB
m = all_df['Type'] == 'BB'
# for Type BB rows, replace Group, AA members and CC members values by Name
all_df.loc[m, ['Group', 'AA members', 'CC members']] = all_df.loc[m, 'Name']
print(all_df)
  Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB        F          F          F
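If the original frame should stay untouched, the same boolean-mask idea can be written with Series.mask, which swaps in another value wherever the condition is True. This is a minimal sketch; cols and fixed_df are names assumed here for illustration.

```python
import pandas as pd

all_df = pd.DataFrame(
    [
        ('A', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
        ('B', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
        ('C', 'Area2', 'BB', 'C', 'C', 'C'),
        ('E', 'Area1', 'CC', 'A B D E', 'A B', 'D E'),
        ('F', 'Area3', 'BB', 'F G', 'G', 'F'),
    ],
    columns=['Name', 'Area', 'Type', 'Group', 'AA members', 'CC members'],
)

m = all_df['Type'] == 'BB'                    # True for Type BB rows
cols = ['Group', 'AA members', 'CC members']  # columns to overwrite

# Work on a copy so all_df keeps its original values.
fixed_df = all_df.copy()
for col in cols:
    # Where m is True take Name, otherwise keep the existing value.
    fixed_df[col] = fixed_df[col].mask(m, fixed_df['Name'])

print(fixed_df)
```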
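You can try iloc with a for loop.
# iloc is positional: column 0 is Name, column 2 is Type, columns 3 onward are the three to fix
for row in range(len(all_df)):
    if all_df.iloc[row, 2] == "BB":
        # overwrite Group, AA members and CC members with this row's Name
        all_df.iloc[row, 3:] = all_df.iloc[row, 0]
all_df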
  Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB        F          F          F
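When the index is not the default 0..n-1, row labels and row positions no longer coincide, so a label-based loop may be safer. The sketch below assumes a hypothetical index of 10..14 with unique labels, and uses .at and .loc instead of iloc.

```python
import pandas as pd

all_df = pd.DataFrame(
    [
        ('A', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
        ('B', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
        ('C', 'Area2', 'BB', 'C', 'C', 'C'),
        ('E', 'Area1', 'CC', 'A B D E', 'A B', 'D E'),
        ('F', 'Area3', 'BB', 'F G', 'G', 'F'),
    ],
    columns=['Name', 'Area', 'Type', 'Group', 'AA members', 'CC members'],
    index=[10, 11, 12, 13, 14],  # hypothetical non-default labels
)

cols = ['Group', 'AA members', 'CC members']
for label in all_df.index:
    # .at reads a single cell by row label and column name.
    if all_df.at[label, 'Type'] == 'BB':
        # .loc writes the row's Name into all three columns of that row.
        all_df.loc[label, cols] = all_df.at[label, 'Name']

print(all_df)
```

For larger frames the boolean-mask approach is usually preferable to any row-by-row loop.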