Regex:: 'pandas._libs.interval.Interval' object has no attribute 'replace'

Question

I have a DataFrame with one column:

    id       bins
    1      (2, 3]
    2      (4, 5]
    3      (6, 7]
    4      (8, 9]
    5      (10, 11]

I am trying to get something like this:

    id       bins                  
    1      2 -  3        
    2      4 -  5       
    3      6 -  7        
    4      8 -  9       
    5      10 -  11

My goal is to do this with a regex, but I'm afraid I'm no regex expert. Here is what I tried, without success:

   df['bins'].astype(str).str.replace(']', ' ')
   df['bins'].astype(str).str.replace(',', ' - ')
   df['bins'] = df['bins'].apply(lambda x: x.replace('[','').replace(']',''))

Any help would be greatly appreciated!

Thanks in advance.

Answer 1

You can use

df['bins'] = df['bins'].astype(str).str.replace(r'[][()]+', '', regex=True).str.replace(',', ' - ')

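Notes:

.str.replace(r'[][()]+', '', regex=True) - removes one or more ], [, ( and ) characters
.str.replace(',', ' - ') - replaces each comma with space + - + space. The space that followed the comma is kept, so the result has two spaces after the hyphen.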

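Another way:

df['bins'].astype(str).str.replace(r'\((\d+)\s*,\s*(\d+)]', r'\1 - \2', regex=True)

Here, \((\d+)\s*,\s*(\d+)] matches

\( - a ( character
(\d+) - Group 1 (\1): one or more digits
\s*,\s* - a comma surrounded by zero or more whitespace characters
(\d+) - Group 2 (\2): one or more digits
] - a ] character.

In the replacement string, \1 and \2 insert the text captured by Group 1 and Group 2.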
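The group-by-group breakdown above can be checked outside pandas with Python's built-in re module. This is only a minimal sketch: the sample string and the variable names are assumptions chosen for illustration.

```python
import re

# Same pattern as above: "(" + digits + comma with optional spaces + digits + "]"
pattern = re.compile(r'\((\d+)\s*,\s*(\d+)]')

sample = '(10, 11]'  # hypothetical sample value
match = pattern.fullmatch(sample)
print(match.group(1), match.group(2))  # the two captured numbers

# \1 and \2 in the replacement refer back to Group 1 and Group 2
result = pattern.sub(r'\1 - \2', sample)
print(result)
```

Because the whitespace around the comma is consumed by \s*,\s*, this variant leaves exactly one space on each side of the hyphen.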

Pandas test:

>>> import pandas as pd
>>> df = pd.DataFrame({'bins':['(2, 3]']})
>>> df['bins'].astype(str).str.replace(r'\((\d+)\s*,\s*(\d+)]', r'\1 - \2', regex=True)
0    2 - 3
Name: bins, dtype: object
>>> df['bins'].astype(str).str.replace(r'[][()]+', '', regex=True).str.replace(',', ' - ')
0    2 -  3
Name: bins, dtype: object

Answer 2

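I would do it a bit differently, with re: find the numbers and join them into a string. Since the cells are Interval objects rather than strings, each one is passed through str() first:

import re
df['bins'] = df['bins'].apply(lambda x: " - ".join(re.findall(r"\d+", str(x))))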

df
   id     bins 
0   1    2 - 3
1   2    4 - 5
2   3    6 - 7
3   4    8 - 9 
4   5  10 - 11
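Note that \d+ only picks up unsigned integers. If the bin edges might be negative or fractional (an assumption, since the question only shows whole numbers), a slightly wider pattern could be used. The helper name format_bin and the NUMBER constant below are hypothetical.

```python
import re

# Optional minus sign, digits, optional fractional part
NUMBER = re.compile(r'-?\d+(?:\.\d+)?')

def format_bin(value):
    """Join the numbers found in str(value) with ' - '."""
    return ' - '.join(NUMBER.findall(str(value)))

print(format_bin('(2, 3]'))
print(format_bin('(-1.5, 0.25]'))
```

It could then be applied in the same way as the lambda above, for example df['bins'].apply(format_bin).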

Answer 3

You wrote

   df['bins'].astype(str).str.replace(']', ' ')
   df['bins'].astype(str).str.replace(',', ' - ')

However, .str.replace does not work in place: you need to assign what it returns, otherwise your pandas.DataFrame is left unchanged. A simple example:

import pandas as pd
df = pd.DataFrame({'col1':[100,200,300]})
df['col1'].astype(str).str.replace('100','1000')
print(df)  # there is still 100
df['col1'] = df['col1'].astype(str).str.replace('100','1000')
print(df)  # now there is 1000 rather than 100
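Separately from the assignment issue, the AttributeError in the title most likely comes from calling x.replace on the cells themselves, which are Interval objects rather than strings. The sketch below assumes the column was produced by pd.cut (the question does not say how it was built); the sample values and bin edges are made up for illustration.

```python
import pandas as pd

# Hypothetical data: pd.cut is one common way to end up with Interval values
values = pd.Series([2.5, 4.5, 6.5])
df = pd.DataFrame({'bins': pd.cut(values, bins=[2, 3, 4, 5, 6, 7])})

# Each cell is a pandas Interval, which has no .replace method
first = df['bins'].iloc[0]
print(type(first).__name__, hasattr(first, 'replace'))

# Converting to str first makes the .str accessor usable,
# and the result is assigned back instead of being discarded
as_text = df['bins'].astype(str)
cleaned = (
    as_text
    .str.replace(r'[\[\]()]', '', regex=True)   # drop the brackets
    .str.replace(', ', ' - ', regex=False)      # comma + space -> ' - '
)
df['bins'] = cleaned
print(df['bins'].tolist())
```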

python

regex

data-wrangling