How to load raw data from a text file into a pandas DataFrame?

My data is in a text file, in the following format:

heading1:blah

heading2:blah

heading3:blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah (the text wraps onto a new line, for heading3 only)


heading1:blah

heading2:blah

heading3:blah blah blah blah blah blah blah blah blah blah

and so on...


Thanks for posting the link to the data. If it's public, it helps to do that from the start. I ran this on the full dataset; it took a few seconds on a decent laptop.

import pandas as pd

with open('rfa_all.NL-SEPARATED.txt', 'r') as f:
    data = f.readlines()

# Create a dictionary mapping each field name to an empty list.
# The values must be lists, because .append() is called on them below.
d = {'SRC': [], 'TGT': [], 'VOT': [], 'RES': [], 'YEA': [], 'DAT': [], 'TXT': []}

for line in data:  # go through the file line by line
    if line != '\n':  # skip blank lines
        line = line.replace('\n', '')  # strip the trailing newline
        key, val = line.split(':', 1)  # split on the first colon only, so colons in the value are kept
        d[key].append(val)

df = pd.DataFrame(d)
df
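One thing to watch with the dictionary-of-lists approach: `pd.DataFrame(d)` needs every list to be the same length, so a record with a missing field will make it fail. Here is a rough sketch of a sanity check you could run before building the frame. It uses `collections.defaultdict` so the keys don't have to be declared up front; the sample lines are made up for illustration and are not from the real file.

```python
from collections import defaultdict

# Made-up sample lines for illustration; the second record has no TXT field.
data = ['SRC:alice\n', 'TGT:bob\n', 'TXT:hello\n', '\n',
        'SRC:carol\n', 'TGT:dave\n']

d = defaultdict(list)  # unseen keys start out as empty lists
for line in data:
    if line.strip():  # skip blank lines
        key, val = line.rstrip('\n').split(':', 1)
        d[key].append(val)

# Count the values collected per field; they should all match.
lengths = {key: len(vals) for key, vals in d.items()}
consistent = len(set(lengths.values())) == 1
print(lengths, consistent)
```

If `consistent` comes out `False`, the per-field counts in `lengths` show which field is short.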

Extensive help from this post:

I'm sure there is a faster way to set this up, but I think this will work.
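The question mentions that the text for the last field can wrap onto a new line. If a wrapped line has no colon, `line.split(':', 1)` returns a single item and the unpacking fails. Below is a possible sketch that parses record by record and appends such continuation lines to the previous field. It assumes the field names from the code above, that every record starts with the first key (`SRC`), and that any non-blank line not starting with a known key is a continuation. `parse_records` is a hypothetical helper and the sample text is invented for the example.

```python
import io
import pandas as pd

# Field names taken from the code above; SRC is assumed to start each record.
KEYS = ['SRC', 'TGT', 'VOT', 'RES', 'YEA', 'DAT', 'TXT']

def parse_records(lines, keys=KEYS):
    """Yield one dict per record; a new record starts when keys[0] reappears."""
    record, last_key = {}, None
    for raw in lines:
        line = raw.rstrip('\n')
        if not line.strip():
            continue  # skip blank lines
        key, sep, val = line.partition(':')  # split on the first colon only
        if sep and key in keys:
            if key == keys[0] and record:
                yield record  # previous record is complete
                record = {}
            record[key] = val
            last_key = key
        elif last_key is not None:
            # No known key: treat as a continuation of the previous field.
            record[last_key] += ' ' + line.strip()
    if record:
        yield record  # emit the final record

# Invented sample data, standing in for the real file.
sample = io.StringIO(
    "SRC:alice\nTGT:bob\nVOT:1\nRES:1\nYEA:2013\nDAT:19:53, 1 January 2013\n"
    "TXT:first line of text\nsecond line of text\n\n"
    "SRC:carol\nTGT:dave\nVOT:-1\nRES:0\nYEA:2012\nDAT:10:00, 2 May 2012\nTXT:short\n"
)

df = pd.DataFrame(list(parse_records(sample)), columns=KEYS)
print(df)
```

For the real data you would pass the open file object in place of `sample`. Building the frame from a list of dicts also means a record with a missing field ends up with `NaN` in that column rather than raising an error.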