Pandas apply function throws NotImplementedError
I have a very basic df, and I want to create 2 new columns from a regex applied to one of its columns. I wrote a function to do this; it returns 2 values.
import re

def get_value(s):
    # Each match is a tuple: (digits before the dot, digits after the dot)
    result = re.findall(r'(?<=Value":")(\d+)\.(\d+)?(?=")', s)
    if len(result) != 2:
        return -1, -1
    else:
        matches = []
        for match in result:
            matches.append(match[0] + '.' + match[1])
        return float(matches[0]), float(matches[1])
When I try this:
data['Test1'], data['Test2'] = zip(*data['mod_data'].apply(get_value))
it throws the error "NotImplementedError: isna is not defined for MultiIndex",
but it works if I split it into 2 different functions.
def get_value1(s):
    result = re.findall(r'(?<=Value":")(\d+)\.(\d+)?(?=")', s)
    if len(result) != 2:
        return -1
    else:
        matches = []
        for match in result:
            matches.append(match[0] + '.' + match[1])
        return float(matches[0])

def get_value2(s):
    result = re.findall(r'(?<=Value":")(\d+)\.(\d+)?(?=")', s)
    if len(result) != 2:
        return -1
    else:
        matches = []
        for match in result:
            matches.append(match[0] + '.' + match[1])
        return float(matches[1])

data['From'] = data['mod_data'].apply(get_value1)
data['To'] = data['mod_data'].apply(get_value2)
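One more thing to note: the NotImplementedError is thrown at the very end. I added print statements to my get_value function, and the error is raised only after the last row has been computed.
Edit: added a sample of the df I am working with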
test = pd.DataFrame([['A', 'A1', 'Top', '[{"Value":"37.29","ID":"S1234.1","Time":"","EXPTIME_Name":"","Value":"37.01"}]'],
                     ['B', 'B1', 'Bottom', '[{"EXPO=T10;PID=.ABCDE149;"Value":"45.29";RETICLEID=S14G1490Y2;SEQ=5A423002",Value":"56.98"}]']],
                    columns=['Module', 'line', 'area', 'mod_data'])
Expected result:
  Module line  ...   From     To
0      A   A1  ...  37.29  37.01
1      B   B1  ...  45.29  56.98
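First, your regex is slightly off. Change '(?<=Value":")(\d+)\.(\d+)?(?=")'
to '(?<=Value":")(\d+\.\d+)?(?=")'
so that the whole float lands in a single capture group. Your original pattern puts the part before the decimal point in one group and the part after it in another.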
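To make the difference between the two patterns concrete, here is a minimal sketch that runs both through re.findall on the first sample row. The variable names (sample, two_groups, one_group, split_matches, whole_matches) are mine, chosen only for illustration.

```python
import re

# First sample row from the question
sample = '[{"Value":"37.29","ID":"S1234.1","Time":"","EXPTIME_Name":"","Value":"37.01"}]'

# Original pattern: integer and fractional parts sit in separate groups
two_groups = r'(?<=Value":")(\d+)\.(\d+)?(?=")'
# Adjusted pattern: the whole number sits in one group
one_group = r'(?<=Value":")(\d+\.\d+)?(?=")'

# With several groups, findall returns one tuple per match
split_matches = re.findall(two_groups, sample)
# With a single group, findall returns one string per match
whole_matches = re.findall(one_group, sample)

print(split_matches)  # [('37', '29'), ('37', '01')]
print(whole_matches)  # ['37.29', '37.01']
```

With the single-group pattern there is no need to glue the two halves back together before converting to float.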
Then you can use str.findall:
test = pd.DataFrame([['A', 'A1', 'Top', '[{"Value":"37.29","ID":"S1234.1","Time":"","EXPTIME_Name":"","Value":"37.01"}]'],
                     ['B', 'B1', 'Bottom', '[{"EXPO=T10;PID=.ABCDE149;"Value":"45.29";RETICLEID=S14G1490Y2;SEQ=5A423002",Value":"56.98"}]']],
                    columns=['Module', 'line', 'area', 'mod_data'])
test[['From', 'To']] = test['mod_data'].str.findall(r'(?<=Value":")(\d+\.\d+)?(?=")')
test
Out[1]:
  Module line    area                                           mod_data  \
0      A   A1     Top  [{"Value":"37.29","ID":"S1234.1","Time":"","EX...
1      B   B1  Bottom  [{"EXPO=T10;PID=.ABCDE149;"Value":"45.29";RETI...

    From     To
0  37.29  37.01
1  45.29  56.98
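Since str.findall returns one list per row, assigning its result straight to two columns may behave differently depending on your pandas version. If it does not work for you, the following is a rough sketch of a more explicit variant, not the only way to do it. Two things here are my own assumptions: I dropped the trailing ? from the capture group so an empty value cannot produce an empty match, and rows without exactly two matches fall back to -1, as in the get_value function from the question. The names pattern, matches and pairs are illustrative.

```python
import pandas as pd

test = pd.DataFrame(
    [['A', 'A1', 'Top', '[{"Value":"37.29","ID":"S1234.1","Time":"","EXPTIME_Name":"","Value":"37.01"}]'],
     ['B', 'B1', 'Bottom', '[{"EXPO=T10;PID=.ABCDE149;"Value":"45.29";RETICLEID=S14G1490Y2;SEQ=5A423002",Value":"56.98"}]']],
    columns=['Module', 'line', 'area', 'mod_data'])

# Assumption: the group is made mandatory (no trailing '?') to avoid empty matches
pattern = r'(?<=Value":")(\d+\.\d+)(?=")'

# One list of matched strings per row
matches = test['mod_data'].str.findall(pattern)

# Convert each row to a (from, to) pair of floats; rows without exactly
# two matches get (-1.0, -1.0), mirroring get_value in the question
pairs = [(float(m[0]), float(m[1])) if len(m) == 2 else (-1.0, -1.0)
         for m in matches]

# Assign each column from a plain Python list, one value per row
test['From'] = [p[0] for p in pairs]
test['To'] = [p[1] for p in pairs]

print(test[['Module', 'line', 'From', 'To']])
```

Building plain lists first keeps the column assignment independent of how pandas handles a Series of lists or tuples.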