Loading Pandas Dataframe with skipped sentiment

I have this dataset for sentiment analysis, and I load the data with the following code:

import pandas as pd

url = 'https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/amazon_cells_labelled.tsv'
df = pd.read_csv(url, sep='\t', names=["Sentence", "Feeling"])

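The problem is that some rows of the DataFrame have NaN in the Feeling column, but each of those rows is only part of a whole sentence that continues in the following row.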

The output currently looks like this:

sentence                      feeling
I do not like it.             NaN
I give it a bad score.        0

The output should look like this:

sentence                                    feeling
I do not like it. I give it a bad score.    0

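Can you help me concatenate the rows (or load the dataset) so that each sentence ends up on the row that carries its score?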
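A minimal offline reproduction of the symptom, assuming (this is a guess, not checked against the real file) that a review is broken over two physical lines and only the last line carries the tab-separated label. The in-memory `tsv` string is hypothetical; it stands in for the URL above.

```python
import io

import pandas as pd

# Hypothetical TSV excerpt: the review is split over two lines and only the
# second line has a tab followed by the label.
tsv = "I do not like it.\nI give it a bad score.\t0\n"

df = pd.read_csv(io.StringIO(tsv), sep="\t", names=["Sentence", "Feeling"])
print(df)
# The first line has no tab, so read_csv fills the missing Feeling field with NaN.
```

With that layout, `read_csv` produces exactly the two-row shape shown above: the first fragment with NaN, the second fragment with the label.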

Create virtual groups first, then use `groupby` and `agg` to merge the rows:

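# Count labeled rows seen so far, shifted by one, so each labeled row shares a group with the unlabeled rows before it
grp = df['Feeling'].notna().cumsum().shift(fill_value=0)
# Join each group's sentence fragments and keep its last non-null label
out = df.groupby(grp).agg({'Sentence': ' '.join, 'Feeling': 'last'})
print(out)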

# Output:
                                                  Sentence  Feeling
Feeling                                                            
0        I try not to adjust the volume setting to avoi...      0.0
1                              Good case, Excellent value.      1.0
2        I thought Motorola made reliable products!. Ba...      1.0
3        When I got this item it was larger than I thou...      0.0
4                                        The mic is great.      1.0
...                                                    ...      ...
996      But, it was cheap so not worth the expense or ...      0.0
997      Unfortunately, I needed them soon so i had to ...      0.0
998      The only thing that disappoint me is the infra...      0.0
999      No money back on this one. You can not answer ...      0.0
1000     It's rugged. Well this one is perfect, at the ...      NaN

[1001 rows x 2 columns]
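The grouping key is easier to follow on a tiny example. The sketch below uses a hypothetical in-memory TSV (not the real dataset) and prints the intermediate group ids. It then adds an optional clean-up step that is not part of the answer above: it drops the trailing group that never received a label (like row 1000 in the output) and casts `Feeling` back to `int`, since the NaN forces it to float.

```python
import io

import pandas as pd

# Hypothetical TSV: one review split over two lines, one complete review,
# and trailing text with no label at all.
tsv = (
    "I do not like it.\n"
    "I give it a bad score.\t0\n"
    "Good case, Excellent value.\t1\n"
    "It's rugged.\n"
)
df = pd.read_csv(io.StringIO(tsv), sep="\t", names=["Sentence", "Feeling"])

# notna -> [F, T, T, F]; cumsum -> [0, 1, 2, 2]; shift -> [0, 0, 1, 2]
grp = df["Feeling"].notna().cumsum().shift(fill_value=0).rename("group")
print(grp.tolist())  # → [0, 0, 1, 2]

out = (
    df.groupby(grp)
    .agg({"Sentence": " ".join, "Feeling": "last"})  # 'last' skips NaN
    .dropna(subset=["Feeling"])  # optional: drop trailing text with no label
    .astype({"Feeling": int})  # labels are whole numbers once NaN is gone
    .reset_index(drop=True)
)
print(out)
```

Renaming the key to `group` is only cosmetic: it keeps the index from also being called `Feeling`, as it is in the output above.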