How to handle errors from IBM Watson when iterating over rows
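I'm a student working on a project that uses IBM Watson's NLU to parse various news articles and return a sentiment score. I have the articles in a table, and I've set up a loop that goes through each cell in the first column, analyzes it, normalizes the result, and appends the new data to the table.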
masterdf = pd.DataFrame()
for index, row in df.iterrows():
    text2 = row['CONTENT']
    response = natural_language_understanding.analyze(
        text = text2,
        features=Features(sentiment=SentimentOptions(targets=["Irish",]))).get_result()
    json_tbl = pd.json_normalize(response['sentiment'],
                                 record_path='targets',
                                 meta=[['document','score'], ['document','label']])
    json_tbl = json_tbl.set_index([pd.Index([index])])
    print(json_tbl.head())
    masterdf = masterdf.append(json_tbl)
masterdf = pd.concat([df, masterdf], axis=1)
masterdf.head()
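The problem I'm running into is that sometimes the entity I'm targeting isn't in the article being analyzed, so IBM throws an error. This breaks my code completely. What I'd like is that whenever IBM returns an error, my code just fills that row with "N/A" and moves on to the next cell below. I really am a beginner, so any help would be greatly appreciated.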
I suggest creating a separate function that encapsulates all of the sentiment analysis logic. In the end, you would call it like this:
df['SENTIMENT_SCORE'] = df['CONTENT'].apply(safe_complex_function)
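safe_complex_function will be your new, safe function. Name it whatever you like (the version that follows is called sentiment_scores). It would look something like this: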
def sentiment_scores(content):
    try:
        response = natural_language_understanding.analyze(
            text=content,
            features=Features(
                sentiment=SentimentOptions(targets=["Irish",])
            )
        ).get_result()
        json_tbl = pd.json_normalize(
            response['sentiment'],
            record_path='targets',
            meta=[['document','score'], ['document','label']]
        )
        # `index` from the original loop is not defined inside this function,
        # and apply() keeps results aligned with the original rows anyway,
        # so return the target's score directly.
        return json_tbl.loc[0, 'score']
    except <The specific Exception you want to deal with>:  # please don't put Exception. It is too general
        return None
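To make the pattern concrete without calling the real service, here is a minimal sketch that runs on its own. Everything in it is a stand-in: `fake_analyze` and `TargetNotFoundError` are hypothetical names invented for illustration, not part of the Watson SDK. With the real client you would catch whatever exception the SDK actually raises; I believe that is `ApiException`, importable from `ibm_watson`, but check the SDK documentation for your version.

```python
import pandas as pd


class TargetNotFoundError(Exception):
    """Hypothetical stand-in for the error the real service raises."""


def fake_analyze(text, target="Irish"):
    # Stand-in for natural_language_understanding.analyze(...).get_result();
    # it fails when the target word is absent, mimicking the reported behaviour.
    if target not in text:
        raise TargetNotFoundError(f"target {target!r} not found")
    return {"score": 0.5, "label": "positive"}


def analyze_or_na(text):
    # Catch only the specific, expected error; anything else still surfaces.
    try:
        result = fake_analyze(text)
    except TargetNotFoundError:
        return pd.Series({"score": "N/A", "label": "N/A"})
    return pd.Series(result)


articles = pd.DataFrame(
    {"CONTENT": ["Irish exports rose again", "No mention of the target here"]}
)
# When the applied function returns a Series, apply() produces one column per
# key, already aligned with the original index, so no manual set_index is needed.
scores = articles["CONTENT"].apply(analyze_or_na)
result_df = pd.concat([articles, scores], axis=1)
print(result_df)
```

Returning a Series from the function is one way to get both the score and the label as separate columns; the failing row is filled with "N/A" and processing continues with the next row.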
Here is a worked example:
Create a test DataFrame
import pandas as pd
data = [
    (1, 'I am happy'),
    (2, 'I am sad'),
    (3, 'I am neutral'),
    (4, 'Exception generator')
]
df = pd.DataFrame(data, columns=['USER_ID', 'CONTENT'])
| | USER_ID | CONTENT |
|---|---|---|
| 0 | 1 | I am happy |
| 1 | 2 | I am sad |
| 2 | 3 | I am neutral |
| 3 | 4 | Exception generator |
Create a mock sentiment analysis function
This function is for simulation only.
def fake_sentiment_analysis(content):
    sentiment_scores = {
        'sad': -1,
        'happy': 1,
        'neutral': 0
    }
    for sentiment, score in sentiment_scores.items():
        if sentiment in content:
            return score
    ## raises KeyError only for demonstration purposes
    return sentiment_scores['BROKEN']

def complex_function(element):
    sentiment_score = fake_sentiment_analysis(element)
    return sentiment_score
Apply the unsafe function to the DataFrame
You will get a KeyError.
Calling the function:
df['CONTENT'].apply(complex_function)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-22-d418b58879b8> in <module>()
----> 1 df['CONTENT'].apply(complex_function)

2 frames
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-12-538de52b436a> in fake_sentiment_analysis(content)
      9             return score
     10     ## raises KeyError only for demonstration purposes
---> 11     return sentiment_scores['BROKEN']

KeyError: 'BROKEN'
Add an exception handler
You can add exception handling to make it safer:
def safe_complex_function(element):
    try:
        sentiment_score = fake_sentiment_analysis(element)
    except KeyError:
        sentiment_score = None
    return sentiment_score
| | USER_ID | CONTENT | SENTIMENT_SCORE |
|---|---|---|---|
| 0 | 1 | I am happy | 1 |
| 1 | 2 | I am sad | -1 |
| 2 | 3 | I am neutral | 0 |
| 3 | 4 | Exception generator | nan |
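Since the question asked for "N/A" rather than a missing value, one possible follow-up is sketched here. It repeats the mock function so the block runs on its own; `failed` and `display_df` are names chosen just for this illustration. The idea is to keep the missing value in the working column, so it can still be used for numeric analysis, and only substitute "N/A" in a copy meant for display.

```python
import pandas as pd

df = pd.DataFrame(
    [(1, "I am happy"), (2, "I am sad"), (3, "I am neutral"), (4, "Exception generator")],
    columns=["USER_ID", "CONTENT"],
)


def fake_sentiment_analysis(content):
    sentiment_scores = {"sad": -1, "happy": 1, "neutral": 0}
    for sentiment, score in sentiment_scores.items():
        if sentiment in content:
            return score
    return sentiment_scores["BROKEN"]  # raises KeyError on purpose


def safe_complex_function(element):
    try:
        return fake_sentiment_analysis(element)
    except KeyError:
        return None


df["SENTIMENT_SCORE"] = df["CONTENT"].apply(safe_complex_function)

# Rows where the analysis failed hold a missing value (None/NaN).
failed = df["SENTIMENT_SCORE"].isna()

# Display copy: cast to object first so the string "N/A" can sit next to numbers.
display_df = df.assign(
    SENTIMENT_SCORE=df["SENTIMENT_SCORE"].astype(object).where(~failed, "N/A")
)
print(display_df)
```

Writing "N/A" straight into the working column would also work, but it turns the column into mixed text and numbers, which makes later calculations such as a mean awkward.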