Convert Text File into Pandas Dataframe

Question

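I want to create a dataframe from a text file. I scraped some data from a website and wrote it to a .txt file. As the first 10 lines of the text file show, there are 10 'columns'. Can anyone help me separate the lines into their respective columns in a pandas dataframe? Thanks a lot!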

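Below is a sample of the text file. I want the first 10 lines to be the column names, and the following lines to go under their respective columns.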

NFT Collection
Volume (ETH)
Market Cap (ETH)
Max price (ETH)
Avg price (ETH)
Min price (ETH)
% Opensea+Rarible
#Transactions
#Wallets
Contract date
Axies | Axie Infinity
4,884
480,695
5.24
.0563
.0024
0
86,807
2,389,981
189d ago
Sandbox's LANDs
578
112,989
6
1.11
.108
100%
394
12,879
700d ago

Answer 1

Update

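Populating the dataframe directly inside the loop should be the most memory-efficient approach. It also avoids loading the whole text file at once: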

import pandas as pd

txt_file = "path/to/your/file"

COL_COUNT = 10

with open(txt_file, "r") as f:
    # The first COL_COUNT lines are the column names
    col = [next(f).strip() for i in range(COL_COUNT)]
    df = pd.DataFrame(columns=col)
    i = COL_COUNT
    while line := f.readline():  # walrus operator, requires Python 3.8+
        if i % COL_COUNT == 0:
            row = []
        row.append(line.strip())
        if i % COL_COUNT == COL_COUNT - 1:
            # DataFrame.append was removed in pandas 2.0; add the row by label instead
            df.loc[len(df)] = row
        i += 1

    df.set_index(col[0], inplace=True)  # use the first column as the index
    print(df)

Output:

                      Volume (ETH) Market Cap (ETH) Max price (ETH) Avg price (ETH) Min price (ETH) % Opensea+Rarible #Transactions   #Wallets Contract date
NFT Collection
Axies | Axie Infinity        4,884          480,695            5.24           .0563           .0024                 0        86,807  2,389,981      189d ago
Sandbox's LANDs                578          112,989               6            1.11            .108              100%           394     12,879      700d ago

Update 2

The list approach is still faster, but it may use more memory for large files:

import pandas as pd

txt_file = "path/to/your/file"

COL_COUNT = 10

table = []
with open(txt_file, "r") as f:
    col = [next(f).strip() for i in range(COL_COUNT)]
    i = COL_COUNT
    while line:=f.readline():
        if i % COL_COUNT == 0:
            row = []
        row.append(line.strip())
        if i % COL_COUNT == COL_COUNT - 1:
            table.append(row)
        i += 1

    df = pd.DataFrame(table, columns=col)
    df.set_index(col[0], inplace=True) # get rid of row index
    print(df)
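
The list approach can also be wrapped in a small reusable function. The following is only a sketch under a few assumptions: the input is any iterable of lines (an open file works too), the function name lines_to_frame is made up for illustration, and an incomplete trailing record is silently dropped. The three-column sample is a shortened version of the data in the question.

```python
from itertools import islice

import pandas as pd


def lines_to_frame(lines, col_count=10):
    """Build a DataFrame from an iterable of lines, col_count lines per record.

    Hypothetical helper, not part of pandas.
    """
    it = (line.strip() for line in lines)
    header = list(islice(it, col_count))  # first col_count lines are column names
    rows = []
    while True:
        row = list(islice(it, col_count))
        if len(row) < col_count:  # end of input, or an incomplete trailing record
            break
        rows.append(row)
    return pd.DataFrame(rows, columns=header)


# Shortened sample (3 columns instead of 10), values taken from the question
sample = [
    "NFT Collection", "Volume (ETH)", "Contract date",
    "Axies | Axie Infinity", "4,884", "189d ago",
    "Sandbox's LANDs", "578", "700d ago",
]
df = lines_to_frame(sample, col_count=3)
print(df)
```

With a real file, the call would look like lines_to_frame(open(txt_file), col_count=10), ideally inside a with block.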

Answer 2

Assuming your text file is named foo.txt, we can first build a dictionary from your data with:

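import pandas as pd
from itertools import islice

foo = {}
with open('foo.txt') as f:
    head = [next(f).strip() for x in range(10)]
    i = 0
    # Read 10 lines per record until the file runs out, rather than assuming 500 records
    while len(row := [line.strip() for line in islice(f, 10)]) == 10:
        foo[i] = row
        i += 1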

Then simply create the dataframe with the from_dict method:

pd.DataFrame.from_dict(foo, columns=head, orient='index')

which gives you:

    NFT Collection  Volume (ETH)    Market Cap (ETH)    Max price (ETH) Avg price (ETH) Min price (ETH) % Opensea+Rarible   #Transactions   #Wallets    Contract date
0   Axies | Axie Infinity   4,884   480,695 5.24    .0563   .0024   0   86,807  2,389,981   189d ago
1   Sandbox's LANDs 578 112,989 6   1.11    .108    100%    394 12,879  700d ago

Answer 3

Like this:

import pandas as pd

text = """NFT Collection
Volume (ETH)
Market Cap (ETH)
Max price (ETH)
Avg price (ETH)
Min price (ETH)
% Opensea+Rarible
#Transactions
#Wallets
Contract date
Axies | Axie Infinity
4,884
480,695
5.24
.0563
.0024
0
86,807
2,389,981
189d ago
Sandbox's LANDs
578
112,989
6
1.11
.108
100%
394
12,879
700d ago"""

text = text.split('\n')
text = [text[i:(i+10)] for i in range(0,len(text),10)]
df = pd.DataFrame(text[1:],columns=text[0])
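
A frame built by splitting text holds every value as a string. If numeric columns are needed, one possible follow-up is to strip the thousands separators and convert with pd.to_numeric. This is a sketch only: the shortened three-column frame and the choice of which columns to convert are assumptions made for illustration.

```python
import pandas as pd

# Shortened stand-in for the frame built above; all values are strings
df = pd.DataFrame(
    [["Axies | Axie Infinity", "4,884", ".0563"],
     ["Sandbox's LANDs", "578", "1.11"]],
    columns=["NFT Collection", "Volume (ETH)", "Avg price (ETH)"],
)

# Columns assumed to contain only numbers, optionally with "," as thousands separator
numeric_cols = ["Volume (ETH)", "Avg price (ETH)"]
for name in numeric_cols:
    cleaned = df[name].str.replace(",", "", regex=False)  # "4,884" -> "4884"
    df[name] = pd.to_numeric(cleaned)

print(df.dtypes)
```

Columns that mix formats, such as % Opensea+Rarible with values like 0 and 100%, would need their own cleanup before conversion.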

Answer 4

Here is another variant:

from io import StringIO

import pandas as pd

with open("input.txt", "r") as file:
    data = [line.strip() for line in file]
data = StringIO("\n".join(";".join(data[i:i+10]) for i in range(0, len(data), 10)))
df = pd.read_csv(data, delimiter=";")

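Pro: you don't have to convert the numbers from strings to int/float yourself, since pd.read_csv infers numeric types. Note that values with thousands separators such as 4,884 are only parsed as numbers if you also pass thousands="," to pd.read_csv. Con: you must make sure the delimiter (used in both the join and pd.read_csv) is a character that never appears in the input.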
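
One way to sidestep the delimiter concern is to let Python's csv module write the intermediate text, since it quotes any field that contains the delimiter. The sketch below is a variation on this answer rather than the answer's own code; the shortened three-column sample and the COL_COUNT value are assumptions for illustration.

```python
import csv
from io import StringIO

import pandas as pd

# Shortened sample (3 columns instead of 10), values taken from the question
lines = [
    "NFT Collection", "Volume (ETH)", "Avg price (ETH)",
    "Axies | Axie Infinity", "4,884", ".0563",
    "Sandbox's LANDs", "578", "1.11",
]
COL_COUNT = 3

buffer = StringIO()
writer = csv.writer(buffer)  # comma-delimited; fields containing a comma get quoted
for i in range(0, len(lines), COL_COUNT):
    writer.writerow(lines[i:i + COL_COUNT])
buffer.seek(0)  # rewind so read_csv starts at the beginning

# thousands="," lets read_csv parse values such as "4,884" as integers
df = pd.read_csv(buffer, thousands=",")
print(df.dtypes)
```

Because the quoting is handled by csv.writer, the data may contain commas, semicolons, or pipes without breaking the column split.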
