pandas read files with blank line as separator

Question

I have a file that uses blank lines as separators. My file looks like this:

A B C

D  F

A K F
G H

123 AB 34
34 GE PQ 56

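In the format above, the row separator is a blank line. How can I read such a file with pandas? At first I thought I would use the usual read_csv function and then combine all the rows up to each blank line into a single row. But that does not look straightforward, because detecting the blank lines and combining the non-indexed rows seems to be impossible.

Is there any workaround for my problem? I do not want to change the format of the file explicitly, because the file comes from an external provider and is processed online.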

Answer 1

Use this solution, joining the lines of each section into one string and passing the resulting list to the DataFrame constructor:

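import pandas as pd

def per_section(it, is_delimiter=lambda x: x.isspace()):
    """Yield each blank-line-delimited section of `it` as one joined string."""
    ret = []
    for line in it:
        if is_delimiter(line):
            # A whitespace-only line closes the current section, if any.
            if ret:
                yield ''.join(ret)
                ret = []
        else:
            ret.append(line.rstrip())
    # Emit the last section when the file does not end with a blank line.
    if ret:
        yield ''.join(ret)

with open("data.txt") as f:
    s = list(per_section(f))

df = pd.DataFrame({'data': s})
print(df)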
                   data
0                 A B C
1                  D  F
2              A K FG H
3  123 AB 3434 GE PQ 56
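
Note that the lines of a section are joined without any separator, so "A K F" and "G H" become "A K FG H". If the fields should stay distinct, a possible variant is to join with a space instead. The following is only a sketch under that assumption: the function name read_blank_line_records and the joiner parameter are made up for illustration, and the sample text is fed through io.StringIO in place of the real file.

```python
import io

import pandas as pd


def read_blank_line_records(lines, joiner=" "):
    """Yield one string per record; records are separated by blank lines."""
    chunk = []
    for line in lines:
        if line.strip():
            # Non-blank line: it belongs to the current record.
            chunk.append(line.strip())
        elif chunk:
            # Blank line: close the current record, if there is one.
            yield joiner.join(chunk)
            chunk = []
    if chunk:
        # Last record when the input does not end with a blank line.
        yield joiner.join(chunk)


# Stand-in for the real file (assumption: same layout as in the question).
sample = "A B C\n\nD  F\n\nA K F\nG H\n\n123 AB 34\n34 GE PQ 56\n"

records = list(read_blank_line_records(io.StringIO(sample)))
df = pd.DataFrame({"data": records})

# Optional: split each record on whitespace into separate columns.
# Shorter records are padded with missing values.
fields = df["data"].str.split(expand=True)

print(df)
print(fields)
```

With a real file, the io.StringIO(sample) argument would be replaced by the open file object, as in the with open("data.txt") block above.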

file-io

python-3.x

pandas