Python: How to iterate through a range of columns in a dataframe, check for specific values and store column name in a list
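I'm trying to iterate through a range of columns in a dataframe and check each row for specific values. The values should match those in my list. If a row contains a value from my list, the column name of the first match should be appended to my new list. How can this be done? I tried the following for loop, but couldn't get it to work.
I looked at several examples, but couldn't find what I'm looking for: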
iterating through a column in dataframe and creating a list with name of the column + str
How to get the column name for a specific values in every row of a dataframe
import pandas as pd
random = {
'col1': ['45c','5v','27','k22','wh','u5','36'],
'col2': ['abc','bca','cab','bac','cab','aab','ccb'],
'col3': ['xyz','zxy','yxz','zzy','yyx','xyx','zzz'],
'col4': ['52','75c','k22','d2','3n','4b','cc'],
'col5': ['tuv','vut','tut','vtu','uvt','uut','vvt'],
'col6': ['la3','pl','5v','45c','3s','k22','9i']
}
df = pd.DataFrame(random)
"""
Only 1 value from this list should match with the values in each row of the df
i.e if '45c' is in row 3, then it's a match. place the name of column where '45c' is found in the df in the new list
"""
list = ['45c','5v','d2','3n','k22',]
"""
empty list that should be populated with df column names if there is a single match
"""
rand = []
for row in df.iloc[:,2:5]:
    for x in row:
        if df[x] in list:
            rand.append(df[row][x].columns)
            break
print(rand)
#this is what my df looks like when I print it
  col1 col2 col3 col4 col5 col6
0  45c  abc  xyz   52  tuv  la3
1   5v  bca  zxy  75c  vut   pl
2   27  cab  yxz  k22  tut   5v
3  k22  bac  zzy   d2  vtu  45c
4   wh  cab  yyx   3n  uvt   3s
5   u5  aab  xyx   4b  uut  k22
6   36  ccb  zzz   cc  vvt   9i
I'd like to get the following output:
rand = ['col1','col4','col1','col6']
First compare all values against the list with DataFrame.isin, then get the column of the first matched value in each row with DataFrame.idxmax. Because idxmax returns the first column when a row has no match at all, add a condition with DataFrame.any to test for that:
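import numpy as np

L = ['45c','5v','d2','3n','k22']
# Boolean mask: True where a cell's value is in L
m = df.isin(L)
# Column of the first True per row, or 'no match' for rows without any True
out = np.where(m.any(axis=1), m.idxmax(axis=1), 'no match').tolist()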
print (out)
['col1', 'col1', 'col4', 'col1', 'col4', 'col6', 'no match']
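To see why the any check is needed, here is a minimal sketch on a small boolean frame (the frame mask is made up for illustration): idxmax returns the label of the first True in each row, but for a row with no True at all it simply falls back to the first column label.

```python
import numpy as np
import pandas as pd

# Hypothetical toy mask: row 0 matches in 'b' and 'c', row 1 matches nowhere
mask = pd.DataFrame({'a': [False, False], 'b': [True, False], 'c': [True, False]})

# idxmax picks the first True per row; the all-False row falls back to 'a'
first_true = mask.idxmax(axis=1).tolist()
print(first_true)  # ['b', 'a']

# Guarding with any(axis=1) separates "first column matched" from "nothing matched"
guarded = np.where(mask.any(axis=1), mask.idxmax(axis=1), 'no match').tolist()
print(guarded)  # ['b', 'no match']
```

Without the guard, a row that matches nothing would be indistinguishable from a row whose first match really is in the first column.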
If you only need the matched values:
out1 = m.idxmax(axis=1)[m.any(axis=1)].tolist()
print (out1)
['col1', 'col1', 'col4', 'col1', 'col4', 'col6']
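If the search should be limited to a range of columns, as the df.iloc[:,2:5] in the question attempts, the same approach works on a positional slice of the frame. A sketch, assuming the question's data and that "range" means a positional column slice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['45c','5v','27','k22','wh','u5','36'],
    'col2': ['abc','bca','cab','bac','cab','aab','ccb'],
    'col3': ['xyz','zxy','yxz','zzy','yyx','xyx','zzz'],
    'col4': ['52','75c','k22','d2','3n','4b','cc'],
    'col5': ['tuv','vut','tut','vtu','uvt','uut','vvt'],
    'col6': ['la3','pl','5v','45c','3s','k22','9i'],
})
L = ['45c', '5v', 'd2', '3n', 'k22']

# Positions 2:5 select col3, col4 and col5 (the end position is exclusive)
m = df.iloc[:, 2:5].isin(L)

# idxmax only ever returns names from the sliced columns
out = np.where(m.any(axis=1), m.idxmax(axis=1), 'no match').tolist()
print(out)
# ['no match', 'no match', 'col4', 'col4', 'col4', 'no match', 'no match']
```

To select by name instead of position, df.loc[:, 'col3':'col5'] gives the same three columns; note that label slices include the end label.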
Details:
print (m)
   col1  col2  col3  col4  col5  col6
0  True False False False False False
1  True False False False False False
2 False False False  True False  True
3  True False False  True False  True
4 False False False  True False False
5 False False False False False  True
6 False False False False False False
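Because the mask is boolean, summing it per row counts the matches, which is one way to act on the docstring in the question that asks for the column name only "if there is a single match". A minimal sketch, assuming "single match" means exactly one matching cell per row (that reading is an assumption); the frame and list below are hypothetical:

```python
import pandas as pd

# Hypothetical frame and lookup list, just for illustration
df = pd.DataFrame({'p': ['a', 'x', 'a'], 'q': ['b', 'y', 'w'], 'r': ['v', 'a', 'u']})
L = ['a', 'b']
m = df.isin(L)

# Number of matching cells per row (True counts as 1): [2, 1, 1]
counts = m.sum(axis=1)

# Keep only rows with exactly one match, then take that match's column name
single = m.idxmax(axis=1)[counts.eq(1)].tolist()
print(single)  # ['r', 'p']
```

Row 0 is dropped because it matches in both p and q.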
A loop solution is possible, but not recommended:
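rand = []
for i, row in df.iterrows():
    # row.items() yields (column name, value) pairs for this row
    for col, x in row.items():
        if x in L:
            rand.append(col)
            break  # keep only the first match in each row
print(rand)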
['col1', 'col1', 'col4', 'col1', 'col4', 'col6']