How to append dataframe values to empty lists based on conditions
I have the following DataFrame:
import pandas as pd

dictionary = {'Monday': {'John': 5,
                         'Lisa': 1,
                         'Karyn': 'NaN',
                         'steve': 1,
                         'ryan': 4,
                         'chris': 5,
                         'jessie': 6},
              'Friday': {'John': 0,
                         'Lisa': 1,
                         'Karyn': 'NaN',
                         'steve': 4,
                         'ryan': 7,
                         'chris': 'NaN',
                         'jessie': 11},
              'Saturday': {'John': 0,
                           'Lisa': 1,
                           'Karyn': 2,
                           'steve': 4,
                           'ryan': 'NaN',
                           'chris': 'NaN',
                           'jessie': 1}}
tab = pd.DataFrame(dictionary)
Monday Friday Saturday
John 5 0 0
Lisa 1 1 1
Karyn NaN NaN 2
steve 1 4 4
ryan 4 7 NaN
chris 5 NaN NaN
jessie 6 11 1
I have these empty lists:
mon_only = []
fri_only = []
sat_only = []
mon_fri_only = []
mon_sat_only = []
fri_sat_only = []
mon_fri_sat = []
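I want to append the index names to these lists depending on which columns they fall in. A name counts as present in a column if its value there is greater than zero. If it appears only in the Monday column, it goes to the mon_only list; if it appears in all three columns, it goes to the mon_fri_sat list.
The result should basically look like this: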
mon_only = ['John','chris']
fri_only = []
sat_only = ['Karyn']
mon_fri_only = ['ryan']
mon_sat_only = []
fri_sat_only = []
mon_fri_sat = ['Lisa','steve','jessie']
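The "greater than zero" rule can be expressed directly as a boolean mask. A minimal sketch, assuming the 'NaN' entries are the strings shown in the dictionary: convert every column to numbers first, so the strings become real NaN values, and then compare with 0 (NaN > 0 evaluates to False, so missing values count as absent).

```python
import pandas as pd

# Same data as in the question; 'NaN' is stored as a string there.
tab = pd.DataFrame({
    'Monday':   {'John': 5, 'Lisa': 1, 'Karyn': 'NaN', 'steve': 1, 'ryan': 4, 'chris': 5, 'jessie': 6},
    'Friday':   {'John': 0, 'Lisa': 1, 'Karyn': 'NaN', 'steve': 4, 'ryan': 7, 'chris': 'NaN', 'jessie': 11},
    'Saturday': {'John': 0, 'Lisa': 1, 'Karyn': 2, 'steve': 4, 'ryan': 'NaN', 'chris': 'NaN', 'jessie': 1},
})

# Coerce every column to numbers; the string 'NaN' becomes a real NaN.
numeric = tab.apply(pd.to_numeric, errors='coerce')

# "Present" means value > 0; comparisons with NaN are False, so missing values drop out.
present = numeric.gt(0)
print(present)
```

Each row of present then holds one True/False flag per day, which is exactly what decides the target list.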
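You can use itertools.combinations to build all column combinations first, then use a condition together with df.dot to get, for each row, the names of the columns whose value is neither 'NaN' nor 0. Finally, fill the combinations that no row falls into with []: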
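To see what the df.dot step does on its own, here is a sketch on a tiny hand-made boolean frame (the frame is illustrative, not taken from the question). Because True * 'Monday,' is 'Monday,' and False * 'Monday,' is '', the matrix product concatenates the names of the True columns in each row.

```python
import pandas as pd

# Illustrative boolean frame: True marks a day on which the person appears.
c = pd.DataFrame(
    {'Monday': [True, True, False],
     'Friday': [False, True, False],
     'Saturday': [False, True, True]},
    index=['John', 'Lisa', 'Karyn'],
)

# Row-wise "sum" of bool * 'ColumnName,' strings, then drop the trailing comma.
labels = c.dot(c.columns + ',').str.rstrip(',')
print(labels.to_dict())
```

The resulting strings follow the column order, which is why they match the keys produced by combinations(tab.columns, i).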
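from itertools import combinations
from collections import defaultdict

delim = ","
# True where a value counts as present (neither the string 'NaN' nor 0)
c = ~(tab.eq("NaN") | tab.eq(0))
# Concatenate the names of the present columns per row, e.g. 'Monday,Friday'
d = c.dot(c.columns + delim).str.rstrip(delim)
# Every possible combination of the columns, in column order
ind = [delim.join(idx) for i in range(1, len(tab.columns) + 1)
       for idx in combinations(tab.columns, i)]
# Collect the row labels under their combination key
defd = defaultdict(list)
for k, v in d.items():
    defd[v].append(k)
# Add the combinations nobody falls into as empty lists, keeping the order of ind
out_d = {key: defd.get(key, []) for key in ind}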
Output:
print(out_d)
{'Monday': ['John', 'chris'],
'Friday': [],
'Saturday': ['Karyn'],
'Monday,Friday': ['ryan'],
'Monday,Saturday': [],
'Friday,Saturday': [],
'Monday,Friday,Saturday': ['Lisa', 'steve', 'jessie']}
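The dictionary is stored in out_d; index it by key to get each of the desired lists.
If you don't need the combinations that have no members, the code is the same but shorter: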
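A minimal sketch of that last step, assuming out_d holds the dictionary printed above (it is copied in literally here so the snippet runs on its own). Each of the question's lists is just one key lookup.

```python
# out_d as printed above, copied so this snippet is self-contained.
out_d = {'Monday': ['John', 'chris'],
         'Friday': [],
         'Saturday': ['Karyn'],
         'Monday,Friday': ['ryan'],
         'Monday,Saturday': [],
         'Friday,Saturday': [],
         'Monday,Friday,Saturday': ['Lisa', 'steve', 'jessie']}

# One key per list name from the question.
mon_only = out_d['Monday']
fri_only = out_d['Friday']
sat_only = out_d['Saturday']
mon_fri_only = out_d['Monday,Friday']
mon_sat_only = out_d['Monday,Saturday']
fri_sat_only = out_d['Friday,Saturday']
mon_fri_sat = out_d['Monday,Friday,Saturday']
print(mon_only, mon_fri_sat)
```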
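from collections import defaultdict

# True where a value counts as present (neither the string 'NaN' nor 0)
c = ~(tab.eq("NaN") | tab.eq(0))
# Names of the present columns per row, e.g. 'Monday,Friday'
d = c.dot(c.columns + ',').str.rstrip(",")
defd = defaultdict(list)
for k, v in d.items():
    defd[v].append(k)
print(defd)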
defaultdict(<class 'list'>,
{'Monday': ['John', 'chris'],
'Monday,Friday,Saturday': ['Lisa', 'steve', 'jessie'],
'Saturday': ['Karyn'],
'Monday,Friday': ['ryan']})
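The explicit loop can also be replaced by a pandas groupby on the combination strings. This is a swapped-in variant, not part of the answer above; it is a sketch assuming the same 'NaN'-string data, and sort=False keeps the groups in order of first appearance, like the loop does.

```python
import pandas as pd

tab = pd.DataFrame({
    'Monday':   {'John': 5, 'Lisa': 1, 'Karyn': 'NaN', 'steve': 1, 'ryan': 4, 'chris': 5, 'jessie': 6},
    'Friday':   {'John': 0, 'Lisa': 1, 'Karyn': 'NaN', 'steve': 4, 'ryan': 7, 'chris': 'NaN', 'jessie': 11},
    'Saturday': {'John': 0, 'Lisa': 1, 'Karyn': 2, 'steve': 4, 'ryan': 'NaN', 'chris': 'NaN', 'jessie': 1},
})

# Same presence test and combination strings as in the answer.
c = ~(tab.eq('NaN') | tab.eq(0))
d = c.dot(c.columns + ',').str.rstrip(',')

# Group the row labels by their combination string instead of looping.
groups = tab.index.to_series().groupby(d, sort=False).apply(list).to_dict()
print(groups)
```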
You could try something like this:
import numpy as np
import pandas as pd

dictionary = {'Monday': {'John': 5,
                         'Lisa': 1,
                         'Karyn': 'NaN',
                         'steve': 1,
                         'ryan': 4,
                         'chris': 5,
                         'jessie': 6},
              'Friday': {'John': 0,
                         'Lisa': 1,
                         'Karyn': 'NaN',
                         'steve': 4,
                         'ryan': 7,
                         'chris': 'NaN',
                         'jessie': 11},
              'Saturday': {'John': 0,
                           'Lisa': 1,
                           'Karyn': 2,
                           'steve': 4,
                           'ryan': 'NaN',
                           'chris': 'NaN',
                           'jessie': 1}}
tab = pd.DataFrame(dictionary)
# Turn the 'NaN' strings into real missing values
tab.replace('NaN', np.nan, inplace=True)
tab['Name'] = tab.index
days = ['Monday', 'Friday', 'Saturday']
for d in days:
    # True where the value is neither 0 nor NaN
    tab[d + '_bool'] = ~tab[d].isin([0, np.nan])
tab.groupby([d + '_bool' for d in days])['Name'].apply(list)
Output:
Monday_bool Friday_bool Saturday_bool
False False True [Karyn]
True False False [John, chris]
True False [ryan]
True [Lisa, steve, jessie]
Name: Name, dtype: object
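The groupby result is keyed by (Monday, Friday, Saturday) flag tuples. A sketch of turning it into the question's named lists, assuming a hypothetical mapping from flag tuples to list names; it builds the flags with pd.to_numeric instead of replace/isin, which should give the same True/False values for this data.

```python
import pandas as pd

tab = pd.DataFrame({
    'Monday':   {'John': 5, 'Lisa': 1, 'Karyn': 'NaN', 'steve': 1, 'ryan': 4, 'chris': 5, 'jessie': 6},
    'Friday':   {'John': 0, 'Lisa': 1, 'Karyn': 'NaN', 'steve': 4, 'ryan': 7, 'chris': 'NaN', 'jessie': 11},
    'Saturday': {'John': 0, 'Lisa': 1, 'Karyn': 2, 'steve': 4, 'ryan': 'NaN', 'chris': 'NaN', 'jessie': 1},
})
days = ['Monday', 'Friday', 'Saturday']

# Numeric copy: the 'NaN' strings become real NaN values.
numeric = tab[days].apply(pd.to_numeric, errors='coerce')
# Present = neither missing nor 0.
present = numeric.notna() & numeric.ne(0)

# Group the row labels by their (Monday, Friday, Saturday) flags.
grouped = tab.index.to_series().groupby([present[d] for d in days]).apply(list)

# Hypothetical mapping from flag tuples to the list names used in the question.
names = {
    (True, False, False): 'mon_only',
    (False, True, False): 'fri_only',
    (False, False, True): 'sat_only',
    (True, True, False): 'mon_fri_only',
    (True, False, True): 'mon_sat_only',
    (False, True, True): 'fri_sat_only',
    (True, True, True): 'mon_fri_sat',
}

# Start every list empty so combinations nobody falls into still appear.
result = {name: [] for name in names.values()}
for flags, people in grouped.items():
    key = tuple(bool(f) for f in flags)
    if key in names:  # rows present on no day are skipped
        result[names[key]] = people
print(result)
```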