Pandas DataFrame loses its index after appending a row

I created a DataFrame and set an index on it. If I add a row with append, the index is lost.

import pandas as pd

history = {}
history_cols = {
                "event_time":              "E",
                "close":                   "c",
                "base_volume":             "v",
                "quote_volume":            "q",
                "total_number_of_trades":  "n"
                }

ticks = [
        {'event_time': 1638470651223, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
        {'event_time': 1638470652088, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
        {'event_time': 1638470653224, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
        {'event_time': 1638470654189, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
        {'event_time': 1638470655203, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
        {'event_time': 1638470656201, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917}
        ]

history["AXSBUSD"] = pd.DataFrame(columns=history_cols.keys())
history["AXSBUSD"].set_index("event_time", inplace=True)
history["AXSBUSD"]

The empty DataFrame has the index:

            close   base_volume     quote_volume    total_number_of_trades
event_time              

Now I append a row from a dict...

history["AXSBUSD"] = history["AXSBUSD"].append(ticks[0], ignore_index=True)
history["AXSBUSD"]

...and this is the result:

    close   base_volume     quote_volume    total_number_of_trades  event_time
0   133.41000000    70094.70000000  9415851.87690000    30917   1.638471e+12

Does anyone know why the index is gone?

Rather than making it this complicated, why not simply:

history["AXSBUSD"] = pd.DataFrame(ticks).set_index('event_time')

If you need to append row by row, you can do this:

history["AXSBUSD"] = pd.DataFrame(columns=history_cols.keys())
history["AXSBUSD"].set_index("event_time", inplace=True)

history["AXSBUSD"] = (history["AXSBUSD"]
                      .append(pd.Series(ticks[0])
                              # rename() returns the renamed Series, so inplace=True is not needed in a chain
                              .rename(ticks[0]['event_time'])
                              .drop('event_time')))
print(history["AXSBUSD"])

Output:

                      close     base_volume      quote_volume   total_number_of_trades  
event_time                                                      
1638470651223  133.41000000  70094.70000000  9415851.87690000  30917

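The main problem with appending a bare dict to a DataFrame is that it is unclear what the new row's index label should be; that is why you had to pass ignore_index=True. With ignore_index=True the result gets a fresh 0, 1, 2, ... index, and event_time from the dict ends up as an ordinary column. But if you .rename the pd.Series being appended, its name is used as the index label.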
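
To make the role of the Series name concrete, here is a minimal sketch. It deliberately does not use DataFrame.append (removed in pandas 2.0); instead it turns the named Series into a one-row frame with to_frame().T, which is a swapped-in technique, not the code from the answer. The trimmed tick dict and the names row and frame are made up for illustration.

```python
import pandas as pd

# Illustrative tick with fewer fields than the ones in the question.
tick = {"event_time": 1638470651223, "close": "133.41000000", "total_number_of_trades": 30917}

# Remove the timestamp from the values and use it as the Series name instead.
row = pd.Series(tick).drop("event_time").rename(tick["event_time"])

# to_frame() makes a one-column frame whose column label is the Series name;
# transposing it gives a one-row frame whose row label is that name.
frame = row.to_frame().T
frame.index.name = "event_time"
print(frame)
```

The row label of frame is the timestamp, and event_time no longer appears among the columns.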

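However, I think it is better to just append the rows without doing any of that, and to call set_index only once you actually need to use the DataFrame: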

for tick in ticks:
    history["AXSBUSD"] = history["AXSBUSD"].append(tick, ignore_index=True)
history["AXSBUSD"].set_index('event_time', inplace=True)
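
A variant of the same idea, sketched under the assumption that the ticks can sit in a plain Python list until the frame is actually needed. The name rows and the trimmed tick dicts are made up for illustration. It also sidesteps DataFrame.append, which was deprecated in pandas 1.4 and removed in 2.0.

```python
import pandas as pd

# Illustrative ticks with fewer fields than the ones in the question.
ticks = [
    {"event_time": 1638470651223, "close": "133.41000000"},
    {"event_time": 1638470652088, "close": "133.41000000"},
]

rows = []
for tick in ticks:
    rows.append(tick)  # list.append, not DataFrame.append

# Build the frame once and set the index only at the point of use.
history = pd.DataFrame(rows).set_index("event_time")
print(history)
```

Building the frame once avoids copying the whole DataFrame for every new row.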

I'm not sure how this compares in efficiency with simply appending to the DataFrame, but it works and does what you need:

history["AXSBUSD"] = pd.concat(
    [history["AXSBUSD"], pd.DataFrame([ticks[0]]).set_index("event_time")]
)
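
A self-contained sketch of the same pd.concat approach applied in a loop, to check that the event_time index survives every step. The helper tick_frame and the trimmed tick dicts are hypothetical, and the history frame is seeded from the first tick rather than from an empty DataFrame to keep the example short.

```python
import pandas as pd

# Illustrative ticks with fewer fields than the ones in the question.
ticks = [
    {"event_time": 1638470651223, "close": "133.41000000"},
    {"event_time": 1638470652088, "close": "133.41000000"},
    {"event_time": 1638470653224, "close": "133.41000000"},
]

def tick_frame(tick):
    """Hypothetical helper: one-row frame indexed by the tick's event_time."""
    return pd.DataFrame([tick]).set_index("event_time")

history = tick_frame(ticks[0])
for tick in ticks[1:]:
    # Both frames share the index name, so concat keeps it.
    history = pd.concat([history, tick_frame(tick)])

print(history)
```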