How to pick values from multiple rows of a dataframe according to multiple cells?
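I have a dataframe (user_id, session_id, items1). Each user has multiple sessions, and I want to select each session separately for each user so that I can compare its items. I tried using a list of lists, but it returns 0. How can I get this?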
Dataframe
items1_list = list(items1_list)  # list of all items in each session_id for user_id
for i in data.user_id:  # user_id loop
    for j in data.session_id:  # session_id loop
        for l in range(3):  # number of sessions for each user, NO 3 for testing
            items1_list[l] = data.loc[i].loc[j].items1
print (items1_list)
Example dataframe:
user_id  session_id  items1
1        19          [214561790, 214561790, 214611457, 214611457]
         43          [214691587, 214587915]
         52          [214716982, 214716984]
2        42          [214819745, 214819745]
         58          [214515834, 214515830]
Target output (if the current user is user 1):
[[214561790, 214561790, 214611457, 214611457], [214691587, 214587915], [214716982, 214716984]]
Target output (if the current user is user 2):
[[214819745, 214819745],[214515834, 214515830]]
Here are the first 11 rows (sessions of user 1, user 2, and user 3):
{'items1': {(1, 19): [214561790, 214561790, 214611457, 214611457],
(1, 27): [214827028,214827017,214537796,214840762,214707930,214707930,
214585652,214536197,214536195,214646169],
(1, 43): [214691587, 214587915],
(1, 52): [214716982, 214716984],
(1, 54): [214819468, 214716977, 214716977, 214716977, 214716977, 214716939],
(2, 42): [214819745, 214819745],
(2, 58): [214515834, 214515830],
(2, 62): [214714794, 214601407],
(2, 87): [214652220,214840483,214840483,214717286,214558807,214821300,214826908,
214826908,214826908,214554637,214819430,214819430,214826837,214826837,
214820392,214820392,214586694,214819376,214553844,214601229,214555500,
214695127,214819760,214717850,214718385,214743369,214743369],
(3, 28): [214836789, 214836789, 214710804],
(3, 140837): [214586711,214821305,214821305,214821305,214612721,214586711,
214586711,214586711,214837442,214821339,214821339,214553735,214553735]},
'items2': {(1, 19): 0,
(1, 27): 0,
(1, 43): 0,
(1, 52): 0,
(1, 54): 0,
(2, 42): 0,
(2, 58): 0,
(2, 62): 0,
(2, 87):
[214652220,214840483,214743369,214826837,214820392,214826908,214819430],
(3, 28): 0,
(3, 140837): [214821339, 214586711, 214821339, 214586711]}}
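For efficiency, let's sort the dataframe by user first. sort_values returns a new dataframe, so the result is assigned back, and the index is reset so that row labels match row positions in the loops that follow:
# sort the rows by user and renumber them 0..n-1
df = df.sort_values(by=['user_id']).reset_index(drop=True)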
Then we use list comprehensions to get all the items associated with each session and user.
itPerSession = []  # output list
# loop to extract the info
for i in range(df.shape[0]):  # df.shape[0] is the number of rows
    for user in df['user_id']:
        vUser = df['user_id'][i]
        vSession = [session for session in df['session_id'] if user]
        vItems = [items for items in df['items1'] if vSession]
        varTextS = 'Session:'
        varTextU = 'by user:'
        chain = [varTextS, vSession[i], vItems[i], varTextU, vUser]
    itPerSession.append(chain)  # outside the user's loop to avoid repetition
print(itPerSession)
[['Session:', 19, [214561790, 214561790, 214611457, 214611457], 'by user:', 1],
['Session:', 43, [214691587, 214587915], 'by user:', 1],
['Session:', 52, [214716982, 214716984], 'by user:', 1],
['Session:', 43, [214819745, 214819745], 'by user:', 2],
['Session:', 58, [214515834, 214515830], 'by user:', 2]]
Hope this helps.
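For comparison, here is a minimal sketch of how the target output from the question (one list of item lists per user) could be built with groupby instead of nested loops. It assumes user_id, session_id and items1 are ordinary columns; the sample frame and the name items_by_user are illustrative and not part of the original post.

```python
import pandas as pd

# Sample rows copied from the question's example dataframe.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "session_id": [19, 43, 52, 42, 58],
    "items1": [
        [214561790, 214561790, 214611457, 214611457],
        [214691587, 214587915],
        [214716982, 214716984],
        [214819745, 214819745],
        [214515834, 214515830],
    ],
})

# One entry per user: the list of that user's per-session item lists,
# in the order the sessions appear in the frame.
items_by_user = {
    user: group["items1"].tolist()
    for user, group in df.groupby("user_id")
}

print(items_by_user[1])
print(items_by_user[2])
```

Each value in items_by_user is a list of lists, so items_by_user[1] can be compared session by session without indexing into the dataframe again.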
To print the number of sessions per user, use groupby. Its argument is the column to group by, in this case user_id:
df.groupby(['user_id'])['session_id'].count()
The result is:
user_id
1 2
2 2
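The dictionary dump in the question uses (user_id, session_id) tuples as keys, which suggests the two fields may be index levels rather than columns. Under that assumption, a sketch of the same count grouped by index level could look like this; the sample frame and the name session_counts are illustrative.

```python
import pandas as pd

# Sample rows from the question, with user_id and session_id moved into
# a two-level index (an assumption based on the tuple keys in the dump).
data = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "session_id": [19, 43, 52, 42, 58],
    "items1": [
        [214561790, 214561790, 214611457, 214611457],
        [214691587, 214587915],
        [214716982, 214716984],
        [214819745, 214819745],
        [214515834, 214515830],
    ],
}).set_index(["user_id", "session_id"])

# Group on the index level instead of a column and count rows per user.
session_counts = data.groupby(level="user_id").size()
print(session_counts)
```

size() counts rows per group, which equals the number of sessions as long as each (user_id, session_id) pair appears once.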
To get the information for a specific user, use the same code:
itPerSession = []  # output list
userId = 1  # user definition
# loop to extract the info
for i in range(df.shape[0]):  # df.shape[0] is the number of rows
    for user in df['user_id']:
        vUser = (df['user_id'][i] == userId)  # fix the user
        vSession = [session for session in df['session_id'] if user]
        vItems = [items for items in df['items1'] if vSession]
        varTextS = 'Session:'
        varTextU = 'by user:'
        chain = [varTextS, vSession[i], vItems[i], varTextU, userId]
    if vUser:  # it's a true condition, and not the user
        itPerSession.append(chain)
print(itPerSession)
[['Session:', 19, [214561790, 214561790, 214611457, 214611457], 'by user:', 1], ['Session:', 52, [214716982, 214716984], 'by user:', 1]]
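As an alternative sketch for a single user, the per-session lists can be selected directly when user_id and session_id form a two-level index, as the question's data.loc[i].loc[j] attempt suggests. The helper name items_for_user and the sample frame are hypothetical and only serve the illustration.

```python
import pandas as pd

# Sample rows from the question, indexed by (user_id, session_id).
# This index layout is an assumption, not confirmed by the original post.
data = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "session_id": [19, 43, 52, 42, 58],
    "items1": [
        [214561790, 214561790, 214611457, 214611457],
        [214691587, 214587915],
        [214716982, 214716984],
        [214819745, 214819745],
        [214515834, 214515830],
    ],
}).set_index(["user_id", "session_id"])


def items_for_user(frame, user_id):
    """Return the list of per-session item lists for one user."""
    # xs selects every row whose user_id level equals user_id and
    # leaves a Series indexed by session_id.
    return frame["items1"].xs(user_id, level="user_id").tolist()


user1_items = items_for_user(data, 1)
user2_items = items_for_user(data, 2)
print(user1_items)
print(user2_items)
```

If user_id is a regular column instead, a boolean mask such as df.loc[df['user_id'] == 1, 'items1'].tolist() gives the same list of lists.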
To understand what the code does and how to use it, I suggest printing the variables at each level of the loops.
If the code works for you, please click the check mark. It's a good way to reward the effort I put into helping you here.