Get dataframe columns from a list using isin

I have a dataframe df1, and I have a list that contains the names of several columns of df1.

df1:
User_id  month  day  Age   year    CVI    ZIP    sex  wgt
0           1    7   16    1977     2      NA    M    NaN
1           2    7   16    1977     3      NA    M    NaN
2           3    7   16    1977     2      DM    F    NaN
3           4    7   16    1977     7      DM    M    NaN
4           5    7   16    1977     3      DM    M    NaN
...        ...    ...  ...   ...   ...     ...  ...  ...
35544      35545   12   31  2002    15      AH  NaN  NaN
35545      35546   12   31  2002    15      AH  NaN  NaN
35546      35547   12   31  2002    10      RM    F   14
35547      35548   12   31  2002     7      DO    M   51
35548      35549   12   31  2002     5     NaN  NaN  NaN

 list= [u"User_id", u"day", u"ZIP", u"sex"]

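I want to create a new dataframe df2 that contains only the columns in the list, and a dataframe df3 that contains the columns that are not in the list.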

Here I found that I need to do:

df2=df1[df1[df1.columns[1]].isin(list)]

But as a result I get:

Empty DataFrame
Columns: []
Index: []
[0 rows x 9 columns]

What am I doing wrong, and how can I get the result I need? Why does it say "9 columns" if it should be 4?

You can try:

df2 = df1[list] # it does a projection on the columns contained in the list
df3 = df1[[col for col in df1.columns if col not in list]]
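
A minimal runnable sketch of the approach above. The sample values are made up for illustration (only the column names come from the question), and the list is called `cols` here, a hypothetical name chosen to avoid shadowing the built-in `list`:

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df1 = pd.DataFrame({
    "User_id": [0, 1, 2],
    "month": [1, 2, 3],
    "day": [7, 7, 7],
    "Age": [16, 16, 16],
    "year": [1977, 1977, 1977],
    "CVI": [2, 3, 2],
    "ZIP": [None, None, "DM"],
    "sex": ["M", "M", "F"],
    "wgt": [float("nan")] * 3,
})

cols = ["User_id", "day", "ZIP", "sex"]

# Columns named in cols, in the order given by cols.
df2 = df1[cols]

# Every other column, in the order they appear in df1.
df3 = df1[[c for c in df1.columns if c not in cols]]

print(list(df2.columns))
print(list(df3.columns))
```

Unlike a set-based difference, the list comprehension keeps the remaining columns in their original order.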

Solution with Index.difference:

L = [u"User_id", u"day", u"ZIP", u"sex"]

df2 = df1[L] 
df3 = df1[df1.columns.difference(df2.columns)]
print (df2)
   User_id  day  ZIP sex
0        0    7  NaN   M
1        1    7  NaN   M
2        2    7   DM   F
3        3    7   DM   M
4        4    7   DM   M

print (df3)
   Age  CVI  month  wgt  year
0   16    2      1  NaN  1977
1   16    3      2  NaN  1977
2   16    2      3  NaN  1977
3   16    7      4  NaN  1977
4   16    3      5  NaN  1977

Or:

df2 = df1[L] 
df3 = df1[df1.columns.difference(pd.Index(L))]
print (df2)
   User_id  day  ZIP sex
0        0    7  NaN   M
1        1    7  NaN   M
2        2    7   DM   F
3        3    7   DM   M
4        4    7   DM   M

print (df3)
   Age  CVI  month  wgt  year
0   16    2      1  NaN  1977
1   16    3      2  NaN  1977
2   16    2      3  NaN  1977
3   16    7      4  NaN  1977
4   16    3      5  NaN  1977
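
Note that the df3 output above lists its columns alphabetically, because `Index.difference` sorts its result by default. If the original column order matters, passing `sort=False` should preserve it. A small sketch, assuming a reasonably recent pandas version and a made-up one-row frame:

```python
import pandas as pd

# Hypothetical one-row frame; only the column names come from the question.
df1 = pd.DataFrame(
    [[0, 1, 7, 16, 1977, 2, "DM", "M", 14.0]],
    columns=["User_id", "month", "day", "Age", "year", "CVI", "ZIP", "sex", "wgt"],
)
L = ["User_id", "day", "ZIP", "sex"]

# Default behaviour: the remaining labels come back sorted.
sorted_rest = df1.columns.difference(L)

# sort=False keeps the labels in the order they appear in df1.
ordered_rest = df1.columns.difference(L, sort=False)

df3 = df1[ordered_rest]

print(list(sorted_rest))
print(list(ordered_rest))
```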

Never name a list "list", since that shadows Python's built-in list type:

my_list= [u"User_id", u"day", u"ZIP", u"sex"]
df2 = df1[df1.keys()[df1.keys().isin(my_list)]]

df2 = df1[df1.columns[df1.columns.isin(my_list)]]
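
The same boolean mask can be negated with `~` to build the complementary frame df3. The sketch below (made-up sample values, not the asker's data) also illustrates what most likely went wrong in the question: `df1[df1.columns[1]].isin(...)` compares the values of the second column against the column names, so it produces a row mask that is False everywhere. That would explain a result with 0 rows that still reports all 9 columns.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df1 = pd.DataFrame({
    "User_id": [0, 1, 2],
    "month": [1, 2, 3],
    "day": [7, 7, 7],
    "Age": [16, 16, 16],
    "year": [1977, 1977, 1977],
    "CVI": [2, 3, 2],
    "ZIP": [None, None, "DM"],
    "sex": ["M", "M", "F"],
    "wgt": [float("nan")] * 3,
})
my_list = ["User_id", "day", "ZIP", "sex"]

# One boolean per column: True where the column name is in my_list.
mask = df1.columns.isin(my_list)
df2 = df1.loc[:, mask]     # columns in my_list
df3 = df1.loc[:, ~mask]    # columns not in my_list

# The attempt from the question: one boolean per ROW, testing the
# values of the second column ("month") against the column names.
row_mask = df1[df1.columns[1]].isin(my_list)
empty = df1[row_mask]      # no row matches, but every column is kept

print(df2.columns.tolist(), df3.columns.tolist(), empty.shape)
```

Both df2 and df3 built this way keep the column order of df1, regardless of the order of names in my_list.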