Pandas DataFrame lost index after appending row
I created a DataFrame and set an index on it. If I add a row with append, the index is lost.
import pandas as pd
history = {}
history_cols = {
"event_time": "E",
"close": "c",
"base_volume": "v",
"quote_volume": "q",
"total_number_of_trades": "n"
}
ticks = [
{'event_time': 1638470651223, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
{'event_time': 1638470652088, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
{'event_time': 1638470653224, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
{'event_time': 1638470654189, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
{'event_time': 1638470655203, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917},
{'event_time': 1638470656201, 'close': '133.41000000', 'base_volume': '70094.70000000', 'quote_volume': '9415851.87690000', 'total_number_of_trades': 30917}
]
history["AXSBUSD"] = pd.DataFrame(columns=history_cols.keys())
history["AXSBUSD"].set_index("event_time", inplace=True)
history["AXSBUSD"]
The empty DataFrame has the index:
close base_volume quote_volume total_number_of_trades
event_time
Now I append a row from a dict...
history["AXSBUSD"] = history["AXSBUSD"].append(ticks[0], ignore_index=True)
history["AXSBUSD"]
...and this is the result:
close base_volume quote_volume total_number_of_trades event_time
0 133.41000000 70094.70000000 9415851.87690000 30917 1.638471e+12
Does anyone know why the index is gone?
Instead of making it this complicated, why not simply:
history["AXSBUSD"] = pd.DataFrame(ticks).set_index('event_time')
If you do need to append row by row, you can do this:
history["AXSBUSD"] = pd.DataFrame(columns=history_cols.keys())
history["AXSBUSD"].set_index("event_time", inplace=True)
history["AXSBUSD"] = (history["AXSBUSD"]
                      .append(pd.Series(ticks[0])
                              .rename(ticks[0]['event_time'])
                              .drop('event_time')))
print(history["AXSBUSD"])
Output:
close base_volume quote_volume total_number_of_trades
event_time
1638470651223 133.41000000 70094.70000000 9415851.87690000 30917
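The main problem with appending a bare dict to the DataFrame is that it is unclear what the index of the new row should be; that is why you have to pass ignore_index=True, which in turn discards the existing index. But if you .rename a pd.Series before appending it, its name is used as the index label of the new row.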
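DataFrame.append was removed in pandas 2.0, so the same idea can be sketched with pd.concat. This is only an illustration under assumptions: the helper name tick_to_row is made up for this example, and the ticks are trimmed to two fields. It shows the Series name becoming the row label once the Series is turned into a one-row frame.

```python
import pandas as pd

ticks = [
    {"event_time": 1638470651223, "close": "133.41000000", "total_number_of_trades": 30917},
    {"event_time": 1638470652088, "close": "133.41000000", "total_number_of_trades": 30917},
]

def tick_to_row(tick):
    """Hypothetical helper: turn one tick dict into a one-row DataFrame
    whose index label is the tick's event_time."""
    # Drop the event_time field and use its value as the Series name.
    row = pd.Series(tick).drop("event_time").rename(tick["event_time"])
    # to_frame() gives one column named after the Series; transposing
    # turns that column name into the row label.
    one_row = row.to_frame().T
    one_row.index.name = "event_time"
    return one_row

frame = tick_to_row(ticks[0])
frame = pd.concat([frame, tick_to_row(ticks[1])])
print(frame)
```

Because no ignore_index=True is passed, the event_time labels survive each concatenation.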
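However, I think it is better to just append the rows without doing any of that, and to call set_index only once you actually need to use the DataFrame: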
for tick in ticks:
    history["AXSBUSD"] = history["AXSBUSD"].append(tick, ignore_index=True)
history["AXSBUSD"].set_index('event_time', inplace=True)
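On pandas 2.0 and later, where DataFrame.append no longer exists, one way to follow the same "collect first, index later" advice is to buffer the incoming dicts in an ordinary Python list and build the DataFrame only when it is needed. The sketch below is an assumption about how that might look, not part of the original answer; the name rows is arbitrary and the ticks are trimmed to two fields.

```python
import pandas as pd

ticks = [
    {"event_time": 1638470651223, "close": "133.41000000", "total_number_of_trades": 30917},
    {"event_time": 1638470652088, "close": "133.41000000", "total_number_of_trades": 30917},
]

rows = []  # plain Python list used as a buffer
for tick in ticks:
    rows.append(tick)  # list.append, not DataFrame.append

# Build the frame once and set the index only at this point.
frame = pd.DataFrame(rows).set_index("event_time")
print(frame)
```

Appending to a list avoids copying the whole DataFrame on every new tick, which is what each DataFrame.append or pd.concat call does.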
I'm not sure how efficient this is compared with simply appending to the DataFrame, but it works and serves your purpose:
history["AXSBUSD"] = pd.concat(
[history["AXSBUSD"], pd.DataFrame([ticks[0]]).set_index("event_time")]
)