Python Pandas: adding conditions on cells with duplicate values
This is a follow-up to my previous question.
I have a DataFrame:
import pandas as pd
df = pd.DataFrame({'First': ['Sam', 'Greg', 'Steve', 'Sam',
'Jill', 'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
'Last': ['Stevens', 'Hamcunning', 'Strange', 'Stevens',
'Vargas', 'Simon', 'Purple', 'Green', 'Simon', 'Simon'],
'Address': ['112 Fake St',
'13 Crest St',
'14 Main St',
'112 Fake St',
'2 Morningwood',
'7 Cotton Dr',
'14 Main St',
'20 Main St',
'7 Cotton Dr',
'7 Cotton Dr'],
'Status': ['Infected', '', 'Infected', '', '', '', '','', '', 'Infected'],
'Level': [10, 2, 7, 5, 2, 10, 10, 20, 1, 1],
})
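Suppose this time I want to propagate the status value 'Infected' to everyone at the same address, with an additional condition, for example that they also share the same value in Last.
So the result would look like:
df2 = df.copy(deep=True)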
df2['Status'] = ['Infected', '', 'Infected', 'Infected', '', 'Infected', '', '', 'Infected', 'Infected']
What if I want a person to be marked as infected when they are at the same address but not at the same level? The result would be:
df3 = df.copy(deep=True)
df3['Status'] = ['Infected', '', 'Infected', '', '', 'Infected', '', '', '', 'Infected']
How would I do this? Is this a groupby problem?
"Same address" translates to a "groupby".
import pandas as pd
df=pd.DataFrame({'First': [ 'Sam', 'Greg', 'Steve', 'Sam',
'Jill', 'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
'Last': [ 'Stevens', 'Hamcunning', 'Strange', 'Stevens',
'Vargas', 'Simon', 'Purple', 'Green', 'Simon', 'Simon'],
'Address': ['112 Fake St','13 Crest St','14 Main St','112 Fake St','2 Morningwood','7 Cotton Dr','14 Main St','20 Main St','7 Cotton Dr','7 Cotton Dr'],
'Status': ['Infected','','Infected','','','','','','','Infected'],
'Level': [10,2,7,5,2,10,10,20,1,1],
})
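# Rule 1: same address and same last name.
# Keep the (Address, Last) groups that contain at least one infected row,
# then mark every row in those groups as infected.
df2_index = df.groupby(['Address', 'Last']).filter(lambda x: (x['Status'] == 'Infected').any()).index
df2 = df.copy()
df2.loc[df2_index, 'Status'] = 'Infected'
# Rule 2: same address but a different level.
# Within each address, a row is infected if it already is, or if some
# infected row at that address has a different Level.
df3_status = df.groupby('Address', as_index=False, group_keys=False).apply(
    lambda x: pd.Series(
        ['Infected'
         if row['Status'] == 'Infected'
         or ((x['Status'] == 'Infected') & (x['Level'] != row['Level'])).any()
         else ''
         for _, row in x.iterrows()],
        index=x.index))
df3 = df.copy()
df3['Status'] = df3_status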
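The same two rules can also be written without apply and iterrows. The following is a minimal sketch rather than the method used above. It assumes Status holds exactly the string 'Infected' or an empty string, and the names infected, same_group, pairs and hit are made up for illustration. Rule 1 uses a grouped transform('any'); rule 2 joins every row to the infected rows at its address and keeps the pairs whose levels differ.

```python
import pandas as pd

df = pd.DataFrame({
    'First': ['Sam', 'Greg', 'Steve', 'Sam', 'Jill',
              'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
    'Last': ['Stevens', 'Hamcunning', 'Strange', 'Stevens', 'Vargas',
             'Simon', 'Purple', 'Green', 'Simon', 'Simon'],
    'Address': ['112 Fake St', '13 Crest St', '14 Main St', '112 Fake St',
                '2 Morningwood', '7 Cotton Dr', '14 Main St', '20 Main St',
                '7 Cotton Dr', '7 Cotton Dr'],
    'Status': ['Infected', '', 'Infected', '', '', '', '', '', '', 'Infected'],
    'Level': [10, 2, 7, 5, 2, 10, 10, 20, 1, 1],
})

# Boolean mask of the rows that are infected to begin with.
infected = df['Status'].eq('Infected')

# Rule 1: same address and same last name.
# transform('any') broadcasts "does this group contain an infected row?"
# back to every row of the group.
same_group = infected.groupby([df['Address'], df['Last']]).transform('any')
df2 = df.copy()
df2.loc[same_group, 'Status'] = 'Infected'

# Rule 2: same address but a different level.
# Pair each row with every infected row at the same address, then keep
# the rows that have at least one infected partner with another Level.
pairs = df.reset_index().merge(
    df.loc[infected, ['Address', 'Level']],
    on='Address',
    suffixes=('', '_src'),
)
hit = pairs.loc[pairs['Level'] != pairs['Level_src'], 'index'].unique()
df3 = df.copy()
df3.loc[hit, 'Status'] = 'Infected'

print(df2['Status'].tolist())
print(df3['Status'].tolist())
```

On the sample data, rule 2 applied literally also marks rows 3 and 6, since each shares an address with an infected row at a different level. The apply version above gives the same result. The expected list in the question leaves those two rows blank, so the intended rule may be narrower than stated.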