The pandas dataframe does not get updated based on a condition
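I have a dataframe in which I need to update a column based on a condition (I am trying to label text with the Microsoft Azure API and then save the labels back to the original dataframe, so that I can compute accuracy later). Strangely, the dataframe does not get updated!
Here is the sample code: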
import pandas as pd
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

key = "key"
endpoint = "https://endpoint"
text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

df = pd.DataFrame({'id':[1,2,3], 'text': ['im ok', 'you arent ok', 'its fine'],
                   'Sentiment':['positive', 'negative', 'neutral']})
n = 10  # number of rows sent per request
for i in range(0, df.shape[0], n):
    result = text_analytics_client.analyze_sentiment(df.iloc[i:i + n].to_dict('records'))
    ###### in case you do not have Azure credentials to run this code, the output of result looks like this:
    ######[AnalyzeSentimentResult(id=2, sentiment=negative, warnings= [], statistics=None, confidence_scores=SentimentConfidenceScores(positive=0.01, neutral=0.16, negative=0.83), sentences=[SentenceSentiment(text=you arent ok, sentiment=negative, confidence_scores=SentimentConfidenceScores(positive=0.01, neutral=0.16, negative=0.83), length=12, offset=0, mined_opinions=[])], is_error=False), AnalyzeSentimentResult(id=3, sentiment=positive, warnings=[], statistics=None, confidence_scores=SentimentConfidenceScores(positive=0.98, neutral=0.01, negative=0.01), sentences=[SentenceSentiment(text=its fine, sentiment=positive, confidence_scores=SentimentConfidenceScores(positive=0.98, neutral=0.01, negative=0.01), length=8, offset=0, mined_opinions=[])], is_error=False)]
    for idx, doc in enumerate(result):
        print(doc.sentiment)  ## this will print out a value
        id_res = result[idx]['id']
        # print(id_res) this will print out the correct id
        df.loc[df.id == id_res, 'label'] = doc.sentiment
print(df)  ### but here when the dataframe is printed the label column is NaN
I searched and found several related questions. In all three examples they do the same thing I am doing, but my dataframe is not updated. This is the result I get:
   id          text Sentiment label
0   1         im ok  positive   NaN
1   2  you arent ok  negative   NaN
2   3      its fine   neutral   NaN
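Details
I am adding some details in case they help. As I commented in the code, id_res holds the correct id. When I replace df.loc[df.id == id_res, 'label'] with df.loc[df.id == 1, 'label'], it successfully updates those rows; otherwise it does not update anything!
I would appreciate any input on how to fix this.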
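The problem is in this line:
df.loc[df.id == id_res, 'label'] = doc.sentiment
df.id is of type int, while id_res is of type string, so the comparison never matches any row. If you convert id_res to int, the comparison becomes valid and you get the output you are looking for:
df.loc[df.id == int(id_res), 'label'] = doc.sentiment
Output: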
   id          text Sentiment     label
0   1         im ok  positive   neutral
1   2  you arent ok  negative  negative
2   3      its fine   neutral  positive
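If you want to check this without Azure credentials, the mismatch can be reproduced with a small sketch like the one below. The result objects here are hypothetical stand-ins built with SimpleNamespace, not the real AnalyzeSentimentResult class; the only assumption carried over from the real service is that the id comes back as a string.

```python
from types import SimpleNamespace

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["im ok", "you arent ok", "its fine"],
    "Sentiment": ["positive", "negative", "neutral"],
})

# Hypothetical stand-ins for the service results: note the ids are strings.
result = [
    SimpleNamespace(id="1", sentiment="neutral"),
    SimpleNamespace(id="2", sentiment="negative"),
    SimpleNamespace(id="3", sentiment="positive"),
]

# Comparing the integer column with a string matches no row at all,
# so an assignment through this mask would leave every label as NaN.
broken_mask = df["id"] == result[0].id
print(broken_mask.any())  # False

# Casting each id to int makes the comparison select the intended row.
for doc in result:
    df.loc[df["id"] == int(doc.id), "label"] = doc.sentiment

print(df)
```

Printing df.dtypes and type(id_res) side by side is usually the quickest way to spot this kind of silent mismatch, since the comparison itself does not raise an error.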