Creating new pandas columns from substrings in a list

Question

I have a csv with a column named 'Features', holding data in the following format:

0      [Shops: Close by, Passing trade: Yes]
1      [Lift: Yes, No of Bedrooms: 1, Bedroom 1 Dims:...
2      [Lift: Yes, No of Bedrooms: 2, Bedroom 1 Dims:...
3      [No of Bedrooms: 4, Bedroom 1 Dims: 4.80 x 5.0...
4      [Finish: Excellent, Airconditioning: Yes, Shop...
...

and I want to create a new pandas column for the number of bedrooms:

0      [N/A]
1      [1]
2      [2]
3      [4]
4      [N/A]
...

I tried something like this in Python:

csvname['No of Bedrooms'] = [s for s in csvname['Features'] if 'No of Bedrooms' in s]

This didn't work. Is there a simple way to do this? Any help would be much appreciated.

Answer 1

You can try .str.extract, which pulls the first regex match out of each row and leaves NaN where nothing matches:

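# Raw string keeps \d intact; expand=False returns a Series rather than a one-column DataFrame
csvname['No of Bedrooms'] = csvname['Features'].astype(str).str.extract(r'No of Bedrooms: (\d+)', expand=False)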

print(csvname)

                                            Features No of Bedrooms
0              [Shops: Close by, Passing trade: Yes]            NaN
1  [Lift: Yes, No of Bedrooms: 1, Bedroom 1 Dims:...              1
2  [Lift: Yes, No of Bedrooms: 2, Bedroom 1 Dims:...              2
3  [No of Bedrooms: 4, Bedroom 1 Dims: 4.80 x 5.0...              4
4  [Finish: Excellent, Airconditioning: Yes, Shop...            NaN
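
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample rows are hypothetical stand-ins modelled on the question (the real values are truncated there), and replacing NaN with 'N/A' is an optional extra step to match the output the question asked for. It may also help to see why the original attempt fails: the list comprehension filters rows out instead of producing one value per row, so the result is generally shorter than the DataFrame and cannot be assigned as a column.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's 'Features' column.
csvname = pd.DataFrame({
    "Features": [
        "[Shops: Close by, Passing trade: Yes]",
        "[Lift: Yes, No of Bedrooms: 1, Bedroom 1 Dims: 3.0 x 4.0]",
        "[No of Bedrooms: 4, Bedroom 1 Dims: 3.0 x 4.0]",
    ]
})

# One capture group: the digits that follow the 'No of Bedrooms: ' label.
pattern = r"No of Bedrooms: (\d+)"

# astype(str) also covers the case where the cells hold real lists;
# expand=False makes str.extract return a Series with one value per row.
bedrooms = csvname["Features"].astype(str).str.extract(pattern, expand=False)

# Optional: show 'N/A' instead of NaN for rows without a bedroom count.
csvname["No of Bedrooms"] = bedrooms.fillna("N/A")

print(csvname)
```

Note that the extracted values are strings. If numbers are needed, pd.to_numeric(bedrooms) converts them and keeps NaN for the rows without a match.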

python

dataframe

python-3.x

pandas

data-wrangling