pandas str extract as integer
Consider the pd.Series s:
import pandas as pd
s = pd.Series(['A1', 'B2', '3C'])
I want to extract the numeric part of each element.
I know I can use extract like this:
s.str.extract(r'(\d)', expand=False)
0 1
1 2
2 3
dtype: object
Note the dtype: object.
If I check the type of each element:
s.str.extract(r'(\d)', expand=False).apply(type)
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
dtype: object
Question
How can I extract the digits directly as integers, i.e. get this result?
0 1
1 2
2 3
dtype: int64
Answer
I don't think it is possible directly.
From the str.extract documentation:
Returns:
DataFrame with one row for each subject string, and one column for each group. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used. The dtype of each result column is always object, even when no match is found. If expand=False and pat has only one capture group, then return a Series (if subject is a Series) or Index (if subject is an Index).
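A small check of the documented behaviour, assuming object-dtype input as in the question: with a single capture group, expand=False gives a Series and expand=True gives a one-column DataFrame, and the result stays object dtype even when nothing matches. The dtype=object argument and the all-letters input are illustrative additions, not part of the original post.

```python
import pandas as pd

# Explicit object dtype, matching the "dtype: object" output in the question,
# even on pandas versions configured to infer a dedicated string dtype
s = pd.Series(['A1', 'B2', '3C'], dtype=object)

# expand=False with a single capture group -> a Series
as_series = s.str.extract(r'(\d)', expand=False)
print(type(as_series).__name__, as_series.dtype)

# expand=True -> a DataFrame with one column per capture group
as_frame = s.str.extract(r'(\d)', expand=True)
print(type(as_frame).__name__, as_frame.dtypes.tolist())

# Illustrative input with no digits: every value is NaN, dtype is still object
no_match = pd.Series(['A', 'B'], dtype=object).str.extract(r'(\d)', expand=False)
print(no_match.isna().all(), no_match.dtype)
```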
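So you need astype(int), or, if there are NaN values in the output, pd.to_numeric (which then returns float64, because NaN is a float):
s.str.extract(r'(\d)', expand=False).astype(int)
pd.to_numeric(s.str.extract(r'(\d)', expand=False))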
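A sketch of both conversions mentioned above. The second input, where 'BB' has no digit, is a hypothetical example chosen to force a NaN into the extracted values; the final cast to pandas' nullable Int64 dtype is an optional alternative that is not part of the original answer.

```python
import pandas as pd

# Object dtype, matching the question's output
s = pd.Series(['A1', 'B2', '3C'], dtype=object)

# Every element matches, so the extracted strings can be cast with astype(int)
ints = s.str.extract(r'(\d)', expand=False).astype(int)
print(ints)

# Hypothetical input: 'BB' has no digit, so extract returns NaN for it
s_nan = pd.Series(['A1', 'BB', '3C'], dtype=object)
extracted = s_nan.str.extract(r'(\d)', expand=False)

# astype(int) would raise here because NaN cannot be converted to int;
# pd.to_numeric parses the strings and keeps NaN, so the result is float64
floats = pd.to_numeric(extracted)
print(floats)

# Optional alternative (not in the answer): the nullable Int64 dtype keeps
# whole-number values as integers and shows the missing entry as <NA>
nullable = floats.astype('Int64')
print(nullable)
```

Casting through pd.to_numeric first means the nullable cast starts from numbers rather than strings, which avoids relying on string-to-Int64 parsing.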