Python read text file with newline and paragraph separated elements

Question

I'm trying to read a text file into a nested list in Python. That is, I'd like the output to be:

[['.79', 'Breyers Ice Cream', 'Homemade Vanilla', '48 oz'], ['.39', 'Haagen-dazs', 'Vanilla Bean Ice Cream', '1 pt'], ...]

The end goal is to load the information into a pandas DataFrame for some exploratory analysis.

Data (in a .txt file)

.79  
Breyers Ice Cream  
Homemade Vanilla  
48 oz

.39  
Haagen-dazs  
Vanilla Bean Ice Cream  
1 pt

.89  
So Delicious  
Dairy Free Coconutmilk No Sugar Added Dipped Vanilla Bars  
4 x 2.3 oz

.79  
Popsicle Fruit Pops Mango  
12 ct

What I've tried

with open('sample.txt') as f:
    creams = f.read()

# Split the file contents into records on blank lines
creams = creams.split("\n\n")

However, this returns:

['.79\nBreyers Ice Cream\nHomemade Vanilla\n48 oz', '.39\nHaagen-dazs\nVanilla Bean Ice Cream\n1 pt',

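I've also tried list comprehensions, which look cleaner than the code above, but they split on single line breaks rather than on paragraphs (blank lines). For example: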

[x for x in open('<file_name>.txt').read().splitlines()]  
# Gives:
['.79', 'Breyers Ice Cream', 'Homemade Vanilla', '48 oz', '', '.39', 'Haagen-dazs', 'Vanilla Bean Ice Cream', '1 pt', '', '

I know I need to nest a list inside the list comprehension, but I'm not sure how to do the split.
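A sketch of one way to finish the splitlines() approach (not taken from either answer): because blank lines separate the records, itertools.groupby with key=bool can collect each run of non-empty lines into its own list. The sample text is inlined here as an assumption so the snippet runs without sample.txt; with the real file, text = open('sample.txt').read() would replace it.

```python
from itertools import groupby

# Assumed inline copy of the first two records; normally read from the file.
text = """.79
Breyers Ice Cream
Homemade Vanilla
48 oz

.39
Haagen-dazs
Vanilla Bean Ice Cream
1 pt
"""

# One entry per line, with surrounding whitespace removed.
lines = [line.strip() for line in text.splitlines()]

# groupby(lines, key=bool) yields runs of non-empty lines (key True)
# separated by runs of empty lines (key False); keep only the non-empty runs.
creams = [list(group) for is_text, group in groupby(lines, key=bool) if is_text]

print(creams)
```

Unlike split("\n\n"), this also tolerates several blank lines in a row between records.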

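Note: this is the first question I've posted, so apologies if it's too long or not concise enough. I'm asking because there are similar questions, but none of them give the result I'm after.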

Answer 1

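Once you've split the text into four-line groups, you're almost done. All that's left is to split each group again on a single newline.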

with open('creams.txt','r') as f:
    creams = f.read()

creams = creams.split("\n\n")
creams = [lines.split('\n') for lines in creams]
print(creams)
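Splitting each group with split('\n') keeps the trailing spaces on lines such as '.79  ' (visible in the output of the next answer), and if the file ends with a newline the last group also gets an extra empty string. A slightly more defensive sketch, assuming records are separated by blank lines that may contain stray spaces, strips the text and every field; read_blocks is a hypothetical helper name.

```python
import re

def read_blocks(text):
    """Hypothetical helper: split text into records on blank lines,
    then split each record into whitespace-stripped fields."""
    # Treat a newline, optional whitespace, then another newline as the
    # record separator, so "blank" lines containing spaces also count.
    blocks = re.split(r"\n\s*\n", text.strip())
    # splitlines() drops the line endings; strip() removes trailing spaces.
    return [[line.strip() for line in block.splitlines()] for block in blocks]

sample = ".79  \nBreyers Ice Cream  \nHomemade Vanilla  \n48 oz\n\n.79  \nPopsicle Fruit Pops Mango  \n12 ct\n"
creams = read_blocks(sample)
print(creams)
```

With the real file this would be called as read_blocks(open('creams.txt').read()).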

Answer 2

You just need to split one more time.

import pandas as pd

with open('sample.txt','r') as file:
    creams = file.read()

creams = creams.split("\n\n")
creams = [lines.split('\n') for lines in creams]

print(creams)
#[['.79  ', 'Breyers Ice Cream  ', 'Homemade Vanilla  ', '48 oz'], ['.39  ', 'Haagen-dazs  ', 'Vanilla Bean Ice Cream  ', '1 pt'], ['.89  ', 'So Delicious  ', 'Dairy Free Coconutmilk No Sugar Added Dipped Vanilla Bars  ', '4 x 2.3 oz'], ['.79  ', 'Popsicle Fruit Pops Mango', '-', '12 ct']]

# Convert to a DataFrame
df = pd.DataFrame(creams, columns=['Amnt', 'Brand', 'Flavor', 'Qty'])

      Amnt                      Brand  \
0  .79          Breyers Ice Cream     
1  .39                Haagen-dazs     
2  .89               So Delicious     
3  .79    Popsicle Fruit Pops Mango   

                                              Flavor         Qty  
0                                 Homemade Vanilla         48 oz  
1                           Vanilla Bean Ice Cream          1 pt  
2  Dairy Free Coconutmilk No Sugar Added Dipped V...  4 x 2.3 oz  
3                                                  -       12 ct

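Note: I added - to the last row of the Flavor column because it was empty. With the original dataset, you would have to account for this before doing any analysis.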
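Instead of editing the file by hand, the short records could be padded in code. This is a minimal sketch under one loud assumption: a 3-line record is always price, brand, quantity with the flavor line missing, as in the Popsicle entry. pad_record and COLUMNS are hypothetical names, and None is used as the placeholder instead of '-'.

```python
COLUMNS = ["Amnt", "Brand", "Flavor", "Qty"]

def pad_record(fields):
    """Hypothetical helper: return a 4-field record, inserting None for a
    missing Flavor. Assumes a 3-field record is (price, brand, quantity)."""
    if len(fields) == len(COLUMNS):
        return list(fields)
    if len(fields) == len(COLUMNS) - 1:
        amnt, brand, qty = fields
        return [amnt, brand, None, qty]
    # Anything else does not match the assumed layout, so fail loudly.
    raise ValueError(f"unexpected record with {len(fields)} fields: {fields!r}")

creams = [
    [".79", "Breyers Ice Cream", "Homemade Vanilla", "48 oz"],
    [".79", "Popsicle Fruit Pops Mango", "12 ct"],
]
rows = [pad_record(r) for r in creams]
print(rows)
```

The padded rows can then go to pd.DataFrame(rows, columns=COLUMNS), where the None is treated as a missing value (for example by df['Flavor'].isna()) rather than as a real flavor called '-'.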


Tags: python, text-files, readfile, pandas
