Loading Pandas Dataframe with skipped sentiment
I have this dataset for sentiment analysis, and I load the data with the following code:
import pandas as pd

url = 'https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/amazon_cells_labelled.tsv'
df = pd.read_csv(url, sep='\t', names=["Sentence", "Feeling"])
The problem is that the DataFrame contains rows with NaN in the Feeling column, but those rows are only fragments of a full sentence.
The output currently looks like this:
sentence feeling
I do not like it. NaN
I give it a bad score. 0
The output should look like this:
sentence feeling
I do not like it. I give it a bad score 0
Can you help me concatenate the rows, or load the dataset, so that each sentence is grouped with its score?
Create dummy groups, then use groupby and agg:
# Each non-NaN Feeling marks the end of a record; cumsum numbers those
# boundaries, and shift moves the label down one row so each fragment
# joins the record it belongs to.
grp = df['Feeling'].notna().cumsum().shift(fill_value=0)
out = df.groupby(grp).agg({'Sentence': ' '.join, 'Feeling': 'last'})
print(out)
# Output:
Sentence Feeling
Feeling
0 I try not to adjust the volume setting to avoi... 0.0
1 Good case, Excellent value. 1.0
2 I thought Motorola made reliable products!. Ba... 1.0
3 When I got this item it was larger than I thou... 0.0
4 The mic is great. 1.0
... ... ...
996 But, it was cheap so not worth the expense or ... 0.0
997 Unfortunately, I needed them soon so i had to ... 0.0
998 The only thing that disappoint me is the infra... 0.0
999 No money back on this one. You can not answer ... 0.0
1000 It's rugged. Well this one is perfect, at the ... NaN
[1001 rows x 2 columns]
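To see why the grouping key works, here is a minimal sketch on a tiny made-up DataFrame (the four sentences below are invented for illustration, not taken from the real dataset):

```python
import numpy as np
import pandas as pd

# Two logical records, each split across two rows; only the last
# fragment of a record carries the Feeling score.
df = pd.DataFrame({
    "Sentence": ["I do not like it.", "I give it a bad score.",
                 "Good case,", "Excellent value."],
    "Feeling": [np.nan, 0, np.nan, 1],
})

# notna() -> [False, True, False, True]
# cumsum() -> [0, 1, 1, 2]  (increments at each record boundary)
# shift(fill_value=0) -> [0, 0, 1, 1]  (fragments share their record's id)
grp = df["Feeling"].notna().cumsum().shift(fill_value=0)

out = df.groupby(grp).agg({"Sentence": " ".join, "Feeling": "last"})
print(out)
```

Each group's sentence fragments are joined with a space, and 'last' keeps the one non-NaN score per group, producing one complete row per record.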