Python + dataframe : AttributeError: 'float' object has no attribute 'replace'
I am trying to write a function that does some text processing on specific columns (description, event_name) of a Pandas dataframe.
I wrote this code:
#removal of unreadable chars, unwanted spaces, words of at most length two from 'description' column and lowercase the 'description' column
def data_preprocessing(source):
    return source.replace('[^A-Za-z]',' ')
    #data['description'] = data['description'].str.replace('\W+',' ')
    return source.lower()
    return source.replace("\s\s+" , " ")
    return source.replace('\s+[a-z]{1,2}(?!\S)',' ')
    return source.replace("\s\s+" , " ")
data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
I get the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-94-cb5ec147833f> in <module>()
----> 1 data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
2 data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
3
4 #df['words']=df['words'].apply(lambda row: eliminate_space(row))
5
~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
2549 else:
2550 values = self.asobject
-> 2551 mapped = lib.map_infer(values, f, convert=convert_dtype)
2552
2553 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()
<ipython-input-94-cb5ec147833f> in <lambda>(row)
----> 1 data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
2 data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
data['description'] = data['description'].str.replace('\W+',' ')
<ipython-input-93-fdfec5f52a06> in data_preprocessing(source)
3 def data_preprocessing(source):
4
----> 5 return source.replace('[^A-Za-z]',' ')
6 #data['description'] = data['description'].str.replace('\W+',' ')
7 source = source.lower()
AttributeError: 'float' object has no attribute 'replace'
If I write the code as shown below, without the function, it works perfectly:
data['description'] = data['description'].str.replace('[^A-Za-z]',' ')
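Two things need to be fixed:
First, when you apply a lambda function to a pandas Series, the lambda is called once for each element of the Series, so your function receives individual values rather than the column. Any element that is not a string, such as a missing value (NaN, which is a float), has no replace method, which is the likely cause of the error. I think what you need is to apply your function to the whole Series in a vectorized way.
Second, your function has multiple return statements. As a result, only the first one, return source.replace('[^A-Za-z]',' '), ever runs. What you need to do is make the preprocessing changes to the variable source inside the function and return the modified source (or an intermediate variable) once, at the end.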
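If you would rather keep the element-wise apply, the second point can be sketched as follows. This is only an illustration under a few assumptions: clean_text is a hypothetical name, non-string values (for example the float NaN that pandas uses for missing entries) are returned unchanged, and re.sub is used because the built-in str.replace treats its first argument as literal text rather than as a regular expression.

```python
import re

def clean_text(value):
    """Clean a single cell; anything that is not a string is returned as-is."""
    if not isinstance(value, str):
        return value
    text = re.sub(r"[^A-Za-z]", " ", value)            # replace non-letters with spaces
    text = text.lower()                                # lowercase
    text = re.sub(r"\s\s+", " ", text)                 # collapse runs of whitespace
    text = re.sub(r"\s+[a-z]{1,2}(?!\S)", " ", text)   # drop words of one or two letters
    text = re.sub(r"\s\s+", " ", text)                 # collapse whitespace again
    return text                                        # single return, at the end
```

It could then be used as data['description'] = data['description'].apply(clean_text).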
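To rewrite your function so that it operates on an entire pandas Series, replace every occurrence of source. with source.str. in the function body. The patterns are regular expressions, so also pass regex=True; pandas 2.0 and later treat the pattern as literal text by default. New function definition:
def data_preprocessing(source):
    # replace every non-letter character with a space
    source = source.str.replace(r'[^A-Za-z]', ' ', regex=True)
    #data['description'] = data['description'].str.replace(r'\W+', ' ', regex=True)
    source = source.str.lower()
    # collapse runs of whitespace into a single space
    source = source.str.replace(r"\s\s+", " ", regex=True)
    # drop words of one or two letters
    source = source.str.replace(r'\s+[a-z]{1,2}(?!\S)', ' ', regex=True)
    source = source.str.replace(r"\s\s+", " ", regex=True)
    return source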
Then, instead of this:
data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
try this:
data['description'] = data_preprocessing(data['description'])
data['event_name'] = data_preprocessing(data['event_name'])
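To check the vectorized approach end to end, here is a small self-contained sketch. The sample dataframe is made up for illustration and includes a missing value, on the assumption that NaN entries are what triggered the original error; the regex=True argument is passed explicitly so the patterns are treated as regular expressions regardless of the pandas version.

```python
import numpy as np
import pandas as pd

def data_preprocessing(source):
    """Vectorized cleaning of a Series of text; missing values stay missing."""
    source = source.str.replace(r"[^A-Za-z]", " ", regex=True)
    source = source.str.lower()
    source = source.str.replace(r"\s\s+", " ", regex=True)
    source = source.str.replace(r"\s+[a-z]{1,2}(?!\S)", " ", regex=True)
    source = source.str.replace(r"\s\s+", " ", regex=True)
    return source

# Hypothetical sample data, with one missing description
data = pd.DataFrame({
    "description": ["Live Music!! at the Park", np.nan],
    "event_name": ["Jazz Night 2018", "5K Fun Run"],
})

for column in ["description", "event_name"]:
    data[column] = data_preprocessing(data[column])
```

The .str accessor skips the missing entry instead of raising, so no AttributeError occurs. Note that the patterns can leave a leading or trailing space; adding .str.strip() at the end would remove it if that matters for your data.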