Pandas DataFrame loses its index after appending a row

Question

I created a DataFrame and set an index on it. When I add a row with append, the index is lost.

import pandas as pd

history = {}
history_cols = {
                "event_time":              "E",
                "close":                   "c",
                "base_volume":             "v",
                "quote_volume":            "q",
                "total_number_of_trades":  "n"
                }

ticks = [
        {'event_time': 1638470651223, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
        {'event_time': 1638470652088, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
        {'event_time': 1638470653224, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
        {'event_time': 1638470654189, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
        {'event_time': 1638470655203, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
        {'event_time': 1638470656201, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917}
        ]

history["AXSBUSD"] = pd.DataFrame(columns=history_cols.keys())
history["AXSBUSD"].set_index("event_time", inplace=True)
history["AXSBUSD"]

The empty DataFrame has the index:

            close   base_volume     quote_volume    total_number_of_trades
event_time

Now I append a row from a dictionary...

history["AXSBUSD"] = history["AXSBUSD"].append(ticks[0], ignore_index=True)
history["AXSBUSD"]

...and this is the result:

    close   base_volume     quote_volume    total_number_of_trades  event_time
0   133.41000000    70094.70000000  9415851.87690000    30917   1.638471e+12

Does anyone know why the index is gone?

Answer 1

Instead of making this so complicated, you could simply do:

history["AXSBUSD"] = pd.DataFrame(ticks).set_index('event_time')

If you need to append row by row, you can do it like this:

history["AXSBUSD"] = pd.DataFrame(columns=history_cols.keys())
history["AXSBUSD"].set_index("event_time", inplace=True)

history["AXSBUSD"] = (history["AXSBUSD"]
                      .append(pd.Series(ticks[0])
                              # The Series name becomes the new row's index label
                              .rename(ticks[0]['event_time'])
                              .drop('event_time')))
print(history["AXSBUSD"])

Output:

                      close     base_volume      quote_volume   total_number_of_trades  
event_time                                                      
1638470651223  133.41000000  70094.70000000  9415851.87690000  30917

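The main problem with appending a bare dictionary to a DataFrame is that it is unclear what the new row's index label should be; that is why you have to pass ignore_index=True. That flag, however, discards the existing index and renumbers the rows from 0, and because event_time is no longer a column of the frame, the dictionary's event_time key comes back as a new column. If you instead .rename the pd.Series being appended, its name is used as the row's index label.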
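The effect of ignore_index can be reproduced in isolation. The following is a minimal sketch, not the original poster's code: it uses pd.concat, which takes the same ignore_index flag, because DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. The names reset and kept are made up for illustration, and the frame is cut down to a single column.

```python
import pandas as pd

# A one-row frame that is already indexed by event_time.
history = pd.DataFrame(
    {"close": ["133.41000000"]},
    index=pd.Index([1638470651223], name="event_time"),
)

tick = {"event_time": 1638470652088, "close": "133.41000000"}

# Case 1: the new row still carries event_time as an ordinary column and
# ignore_index=True is passed. The existing labels are thrown away, the rows
# are renumbered from 0, and event_time shows up as a column.
reset = pd.concat([history, pd.DataFrame([tick])], ignore_index=True)

# Case 2: the new row is indexed the same way as the existing frame,
# so the labels and the index name are kept.
kept = pd.concat([history, pd.DataFrame([tick]).set_index("event_time")])

print(reset)
print(kept)
```

In the first case the old event_time labels are not recoverable from the result, which is the same symptom as in the question.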

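That said, I think it is better to just append the rows without all of that, and call set_index only once, when you actually need to use the DataFrame: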

for tick in ticks:
    history["AXSBUSD"] = history["AXSBUSD"].append(tick, ignore_index=True)
history["AXSBUSD"].set_index('event_time', inplace=True)
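The same "set the index once at the end" idea also works without DataFrame.append. As a rough sketch, assuming the ticks arrive one at a time as dictionaries, they can be collected in a plain Python list and turned into a DataFrame only when it is needed. The rows list and the shortened tick dictionaries are illustrative, not part of the original code.

```python
import pandas as pd

# Shortened ticks for illustration; the real ones carry more fields.
ticks = [
    {"event_time": 1638470651223, "close": "133.41000000", "total_number_of_trades": 30917},
    {"event_time": 1638470652088, "close": "133.41000000", "total_number_of_trades": 30917},
    {"event_time": 1638470653224, "close": "133.41000000", "total_number_of_trades": 30917},
]

rows = []
for tick in ticks:
    # list.append, not DataFrame.append: no DataFrame is copied here.
    rows.append(tick)

# Build the frame and set the index once, when the data is actually used.
history = pd.DataFrame(rows).set_index("event_time")
print(history)
```

This avoids building a new DataFrame for every incoming tick, at the cost of not having an up-to-date frame between ticks.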

Answer 2

I'm not sure how efficient this is compared with simply appending to the DataFrame, but it works and does what you need:

history["AXSBUSD"] = pd.concat(
    [history["AXSBUSD"], pd.DataFrame([ticks[0]]).set_index("event_time")]
)
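If the concat call is needed for every incoming tick, it can be wrapped in a small helper. This is only a sketch under assumptions: append_tick and its index_col parameter are hypothetical names, and the helper starts from None instead of an empty DataFrame so that the first call simply returns the new row.

```python
import pandas as pd


def append_tick(frame, tick, index_col="event_time"):
    """Return a new frame with the dict `tick` added as a row labelled by index_col."""
    row = pd.DataFrame([tick]).set_index(index_col)
    if frame is None:
        # First tick: nothing to concatenate with yet.
        return row
    # Both frames are indexed by index_col, so the labels are preserved.
    return pd.concat([frame, row])


# Shortened ticks for illustration.
ticks = [
    {"event_time": 1638470651223, "close": "133.41000000"},
    {"event_time": 1638470652088, "close": "133.41000000"},
]

history = None
for tick in ticks:
    history = append_tick(history, tick)

print(history)
```

On the efficiency question: each pd.concat call builds a new DataFrame and copies the existing rows, so calling it once per tick gets slower as the history grows. For large volumes, batching several ticks per call is likely the cheaper option.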
