Pandas: How to remove non-alphanumeric columns in Series

Question

A Pandas Series can contain invalid values:

a     b     c     d      e      f     g 
1    ""   "a3"  np.nan  "\n"   "6"   " "

df = pd.DataFrame([{"a":1, "b":"", "c":"a3", "d":np.nan, "e":"\n", "f":"6", "g":" "}])
row = df.iloc[0]

I want to produce a clean Series that keeps only the columns containing numeric values or non-empty, non-whitespace alphanumeric strings:

b should be dropped because it is an empty string;
d because it is np.nan;
e and g because they are whitespace-only strings.

Expected result:

a      c     f
1    "a3"   "6"

How can I filter for the columns that contain numbers or valid alphanumeric strings?

row.str.isalnum() returns NaN for a, instead of the True I expected.
row.astype(str).str.isalnum() turns the np.nan in d into the string "nan", which is then treated as a valid string.
row.dropna() of course only drops d (np.nan).
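The three attempts above can be reproduced with a short sketch. The variable names (attempt_str, attempt_cast, attempt_dropna) are made up for illustration, and the behavior of astype(str) on missing values is the one reported in the question; it may differ in other pandas versions, so no output is asserted for that step.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": "", "c": "a3", "d": np.nan, "e": "\n", "f": "6", "g": " "}])
row = df.iloc[0]

# Attempt 1: the .str accessor only understands strings, so the integer
# in "a" does not come back as True (the question reports NaN).
attempt_str = row.str.isalnum()
print(attempt_str)

# Attempt 2: cast everything to str first. The question reports that the
# missing value in "d" becomes the literal string "nan" and passes the check.
attempt_cast = row.astype(str).str.isalnum()
print(attempt_cast)

# Attempt 3: dropna() removes only the missing value in "d";
# the empty and whitespace-only strings survive.
attempt_dropna = row.dropna()
print(list(attempt_dropna.index))  # ['a', 'b', 'c', 'e', 'f', 'g']
```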

I don't see any other possibilities listed at https://pandas.pydata.org/pandas-docs/stable/reference/series.html

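As a workaround, I can loop over items(), check each value's type and content, and build a new Series from the values I want to keep, but this approach is inefficient (and ugly):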

for index, value in row.items():
    print (index, value, type(value))


# a 1 <class 'numpy.int64'>
# b  <class 'str'>
# c a3 <class 'str'>
# d nan <class 'numpy.float64'>
# e 
#  <class 'str'>
# f 6 <class 'str'>
# g   <class 'str'>
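For reference, the loop-based workaround described above might look roughly like the sketch below. The helper name is_valid and the rule that any non-missing number counts as valid are assumptions chosen to match the expected result; they are not part of the original post.

```python
import numbers

import numpy as np
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": "", "c": "a3", "d": np.nan, "e": "\n", "f": "6", "g": " "}])
row = df.iloc[0]


def is_valid(value):
    """Hypothetical helper: keep alphanumeric strings and non-missing numbers."""
    if isinstance(value, str):
        # "", "\n" and " " all fail str.isalnum()
        return value.isalnum()
    if isinstance(value, numbers.Number):
        # np.nan is a float, so it has to be excluded explicitly
        return not pd.isna(value)
    return False


# Build a new Series from the entries that pass the check.
clean = pd.Series({key: value for key, value in row.items() if is_valid(value)}, name=row.name)
print(list(clean.index))  # ['a', 'c', 'f']
```

This runs one Python-level check per element, which is the inefficiency the question wants to avoid.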

Is there a boolean filter that can help me pick out the good columns?

Answer 1

Convert the values to strings, then chain another mask built with Series.notna using bitwise AND (&):

row = row[row.astype(str).str.isalnum() & row.notna()]
print (row)
a     1
c    a3
f     6
Name: 0, dtype: object
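To make the one-liner easier to follow, here is the same filter split into named steps. The names is_alnum, not_missing and keep are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": "", "c": "a3", "d": np.nan, "e": "\n", "f": "6", "g": " "}])
row = df.iloc[0]

# Mask 1: True where the string form of the value is non-empty and purely
# alphanumeric. This rejects "", "\n" and " ".
is_alnum = row.astype(str).str.isalnum()

# Mask 2: True where the original value is not missing. This is what
# rejects d, whatever its string form looks like after the cast.
not_missing = row.notna()

# Combine both conditions element-wise and select the surviving entries.
keep = is_alnum & not_missing
clean = row[keep]
print(list(clean.index))  # ['a', 'c', 'f']
```

One caveat: because the test runs on the string form, numeric values such as 1.5 or -2 would also be dropped, since "." and "-" are not alphanumeric characters.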

Answer 2

You can use a regular expression:

row[row.notna() & row.astype(str).str.match('[a-zA-Z0-9]+')]
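Note that str.match only anchors the pattern at the start of each string, so a value that merely begins with alphanumeric characters would also pass. If a strictly alphanumeric check is wanted, Series.str.fullmatch (available since pandas 1.1) is one option. The sketch below adds a hypothetical entry h = "a3!" that is not in the original data, purely to show the difference.

```python
import numpy as np
import pandas as pd

# Same data as in the question, plus a hypothetical "h" entry for illustration.
df = pd.DataFrame([{"a": 1, "b": "", "c": "a3", "d": np.nan, "e": "\n",
                    "f": "6", "g": " ", "h": "a3!"}])
row = df.iloc[0]

as_text = row.astype(str)
pattern = r"[a-zA-Z0-9]+"

# match: the pattern only has to match at the beginning of the string.
loose = row.notna() & as_text.str.match(pattern)
print(list(row[loose].index))   # ['a', 'c', 'f', 'h']

# fullmatch: the whole string has to consist of alphanumeric characters.
strict = row.notna() & as_text.str.fullmatch(pattern)
print(list(row[strict].index))  # ['a', 'c', 'f']
```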

