How do I traverse a dataframe and get the polarity score of existing text (transcripts) so that I have one row per ID in Python?
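I can loop over the files in a directory with my script, but I can't apply the same logic when all the transcripts are in a single table/dataframe. My earlier script: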
import os
from glob import glob
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
files = glob('C:/Users/jj/Desktop/Bulk_Wav_Completed_CancelsvsSaves/*.csv')
sid = SentimentIntensityAnalyzer()
# use a dict comprehension to score the joined transcript of each file
data = {os.path.basename(file): sid.polarity_scores(' '.join(pd.read_csv(file, encoding="utf-8")['transcript'])) for file in files}
# create a data frame from the dictionary above
df = pd.DataFrame.from_dict(data, orient='index')
df.to_csv("sentimentcancelvssaves.csv")
How do I apply the above to the table below?
dfo
Out[52]:
       InteractionId       Agent           Transcript
0      100392327420210105  David Michel    hi how are you
1      100392327420210105  David Michel    yes i am not fine
2      100390719220210104  Mindy Campbell  .,xyz..
3      100390719220210104  Mindy Campbell  no
4      100390719220210104  Mindy Campbell  maybe
...    ...                 ...             ...
93407  300390890320200915  Sandra Yacklin  ...
93408  300390890320200915  Sandra Yacklin  ...
93409  300390890320200915  Sandra Yacklin  ...
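As you can see, I have an InteractionId column that identifies each interaction. My final dataset should have one row per ID, and I need the polarity scores for the text attached to that ID.
Desired output for 100390719220210104: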
   InteractionId       Agent           Transcript    Positive  Compound
2  100390719220210104  Mindy Campbell  xyz no maybe  0.190     0.5457
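How can I do this for all interaction IDs? I was able to do it when I had to apply my script to all the CSV files in a directory and loop over them. But how do I apply it to a dataframe where all the data is in one place rather than in separate CSVs?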
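So instead of looping over files, you loop over the unique InteractionId values. You can use: for interaction_id in dfo['InteractionId'].unique()
Then you join the values of the Transcript column for that ID, which you can get with: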
' '.join(dfo[dfo['InteractionId'] == interaction_id]['Transcript'])
Putting it all together, you have:
import pandas as pd
# VADER needs its lexicon; if it is missing, run nltk.download('vader_lexicon') once.
from nltk.sentiment.vader import SentimentIntensityAnalyzer
dfo = pd.DataFrame(
data={
'InteractionId': [
100392327420210105,
100390719220210104,
100390719220210104,
100390719220210104,
],
'Transcript': ['hi how are you', '.,xyz..', 'no', 'maybe'],
}
)
sid = SentimentIntensityAnalyzer()
# use a dict comprehension to score the joined transcript of each interaction
data = {
interaction_id: sid.polarity_scores(
' '.join(dfo[dfo['InteractionId'] == interaction_id]['Transcript'])
)
for interaction_id in dfo['InteractionId'].unique()
}
# create a data frame from the dictionary above
df = pd.DataFrame.from_dict(data, orient='index')
df.to_csv("sentimentcancelvssaves.csv")
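If you also want the Agent and the joined Transcript in the result, as in the desired output from the question, a groupby may be a tidier alternative to filtering the frame once per ID. The following is only a sketch under a few assumptions: the helper name score_by_interaction is made up for illustration, the Agent is taken to be the same on every row of an interaction (so the first one is kept), and the Transcript column is assumed to contain only strings. The scorer is passed in as a plain callable so the grouping logic does not depend on NLTK itself.

```python
import pandas as pd


def score_by_interaction(df, scorer):
    """Return one row per InteractionId with the joined transcript and its scores.

    `scorer` is any callable that maps a string to a dict of scores
    (hypothetical helper; not part of pandas or NLTK).
    """
    # Join all transcript lines of an interaction into one string and
    # keep the first Agent name seen for that interaction.
    grouped = (
        df.groupby('InteractionId', sort=False)
        .agg(Agent=('Agent', 'first'), Transcript=('Transcript', ' '.join))
        .reset_index()
    )
    # Score each joined transcript; every key of the score dict becomes a column.
    scores = pd.DataFrame(
        grouped['Transcript'].map(scorer).tolist(), index=grouped.index
    )
    return pd.concat([grouped, scores], axis=1)


# Usage with VADER (assumes the 'vader_lexicon' resource is already downloaded):
# from nltk.sentiment.vader import SentimentIntensityAnalyzer
# result = score_by_interaction(dfo, SentimentIntensityAnalyzer().polarity_scores)
# result.to_csv("sentimentcancelvssaves.csv", index=False)
```

With VADER's polarity_scores as the scorer, the extra columns would be neg, neu, pos and compound, which you can rename to Positive and Compound if you want the headers from the question.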