How to remove specific things from data in Python

Question

I have data like this:

draft_round
    0   1st round
    1   3rd round
    2   1st round
    3   16th round
    4   2nd round
    ... ...
    4680    1st round
    4681    NaN
    4682    2nd round
    4683    2nd round
    4684    1947 BAA Draf

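As you can see, each row holds a mix of text and numbers. What matters to me is the number in each row. For example, I want to get the number "1" from the row "1st round" and the number "16" from the row "16th round". In other words, I want the output to look like this: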

      draft_round
        0   1
        1   3
        2   1
        3   16
        4   2
        ... ...
        4680    1
        4681    NaN
        4682    2
        4683    2
        4684    1947 BAA Draf

I hope I have explained my problem clearly. Thanks in advance.

Answer 1

You can try .str.replace, which swaps each matching value for the digits captured by the first group:

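# Replace "<digits>... round" with just the captured digits (\1);
# values that do not match the pattern are left unchanged.
df["draft_round"] = df["draft_round"].str.replace(
    r"(\d+).*round", r"\1", regex=True
)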
print(df)

Prints:

        draft_round
0                 1
1                 3
2                 1
3                16
4                 2
4680              1
4681            NaN
4682              2
4683              2
4684  1947 BAA Draf
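
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is an assumption: it is rebuilt by hand from a few of the rows shown in the question, not taken from the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, copied from a few rows shown in the question.
df = pd.DataFrame(
    {
        "draft_round": [
            "1st round",
            "3rd round",
            "16th round",
            np.nan,
            "2nd round",
            "1947 BAA Draf",
        ]
    }
)

# Keep only the captured digits for values of the form "<digits>... round".
# Missing values and non-matching strings pass through untouched.
df["draft_round"] = df["draft_round"].str.replace(
    r"(\d+).*round", r"\1", regex=True
)

result = df["draft_round"].tolist()
print(result)
```

Note that the column still holds strings after this step, because the non-matching value "1947 BAA Draf" is kept as-is.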

Answer 2

Try str.split:

df['draft_round'] = df['draft_round'].str.split(pat='[a-z]', expand=True)[0]
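
A small sketch of what this split does on a few values from the question (the sample Series is an assumption, built by hand). It splits at every lowercase letter and keeps the first piece, so it also shortens values that are not of the form "Nth round":

```python
import numpy as np
import pandas as pd

# Hypothetical sample, copied from a few rows shown in the question.
s = pd.Series(
    ["1st round", "16th round", np.nan, "1947 BAA Draf"],
    name="draft_round",
)

# Split at each lowercase letter and keep the text before the first one.
first_piece = s.str.split(pat="[a-z]", expand=True)[0]

pieces = first_piece.tolist()
print(pieces)
```

Here "1st round" becomes "1" and "16th round" becomes "16", but "1947 BAA Draf" is cut to "1947 BAA D" because the split happens at the first lowercase letter. If such rows must stay unchanged, the .str.replace pattern from Answer 1 may be the safer choice.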

Tags: python, dataframe, python-3.x, pandas, data-science