How can I add a row composed of pd.Timestamp and float values to a DataFrame?

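I am trying to append a row containing some pandas Timestamps and some float values to a dataframe, using the following code:

import pandas as pd

pair_columns = ['T1 Time', 'T1 Active', 'T1 Reactive', 'T2 Time', 'T2 Active', 'T2 Reactive']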

# an empty dataframe
matched_pairs = pd.DataFrame(columns=pair_columns)


# A list containing some Timestamps
value_with_timestamp = [pd.Timestamp('2011-10-21 20:08:42+0000', tz='UTC'), 21.847724815467735, -78.998453511820344, pd.Timestamp('2011-10-21 20:08:54+0000', tz='UTC'), -74.608437575303114, 48.537725275212779]
ser_timestamp = pd.Series(value_with_timestamp)


# This runs, but the dataframe gets a row containing only NaN
matched_pairs.loc[len(matched_pairs)] = ser_timestamp
print("Dataframe with series containing timestamp")
print(matched_pairs.head())

# Exception TypeError: data type not understood
matched_pairs.loc[len(matched_pairs)] = value_with_timestamp
print(matched_pairs.head())

# Exception TypeError: data type not understood
matched_pairs = matched_pairs.append(ser_timestamp, ignore_index=True)
print(matched_pairs.head())

This code does not work, but when I use strings instead of timestamps everything works fine:

import pandas as pd

matched_pairs_string = pd.DataFrame(columns=pair_columns)

# The same list but with strings instead of timestamps
value_string = ['2011-10-21 20:08:42+0000', 21.847724815467735, -78.998453511820344, '2011-10-21 20:08:54+0000', -74.608437575303114, 48.537725275212779]

# Add the list with the strings to the dataframe; this works like a charm
matched_pairs_string.loc[len(matched_pairs_string)] = value_string
print("Dataframe with string instead of timestamp")
print(matched_pairs_string.head())

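What am I doing wrong? Is there a way to accomplish what I want? I just want to add this data as a row as-is, without converting the timestamps to another type.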

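Technically, the problem is not the timestamps but the type of object you assign to the row: a Series (which you try in the first code block) versus a list (which you try in the second code block).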

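When you assign a Series to a row with .loc, pandas aligns the Series index with the DataFrame's column labels. Here ser_timestamp has the default index 0 to 5, which matches none of the column names, so every cell comes out as NaN. A plain list is assigned by position instead. Consider converting with series.tolist() for the row assignment, or use the original list: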

matched_pairs.loc[len(matched_pairs)] = ser_timestamp.tolist()
#               T1 Time  T1 Active  T1 Reactive             T2 Time  T2 Active  T2 Reactive
# 0 2011-10-21 20:08:42  21.847725   -78.998454 2011-10-21 20:08:54 -74.608438     48.53772

matched_pairs.loc[len(matched_pairs)] = value_with_timestamp
#               T1 Time  T1 Active  T1 Reactive             T2 Time  T2 Active  T2 Reactive
# 0 2011-10-21 20:08:42  21.847725   -78.998454 2011-10-21 20:08:54 -74.608438     48.53772

Doing so assigns the correct data types:

print(matched_pairs.dtypes)

# T1 Time        datetime64[ns]
# T1 Active             float64
# T1 Reactive           float64
# T2 Time        datetime64[ns]
# T2 Active             float64
# T2 Reactive           float64
# dtype: object
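One way to see the difference between the Series and list assignments is to compare a Series with the default integer index against one indexed by the column names. The following is a minimal sketch, assuming that .loc aligns an assigned Series on its index labels; the names unlabeled and labeled are only illustrative and do not appear in the question.

```python
import pandas as pd

pair_columns = ['T1 Time', 'T1 Active', 'T1 Reactive',
                'T2 Time', 'T2 Active', 'T2 Reactive']
values = [pd.Timestamp('2011-10-21 20:08:42', tz='UTC'), 21.8, -78.9,
          pd.Timestamp('2011-10-21 20:08:54', tz='UTC'), -74.6, 48.5]

# Default index 0..5: no label matches a column name, so the row is all NaN.
unlabeled = pd.DataFrame(columns=pair_columns)
unlabeled.loc[len(unlabeled)] = pd.Series(values)

# Index equal to the column names: each value lands in its own column.
labeled = pd.DataFrame(columns=pair_columns)
labeled.loc[len(labeled)] = pd.Series(values, index=pair_columns)

print(unlabeled)
print(labeled)
```

If you need to keep a Series, giving it the column names as its index is therefore an alternative to calling tolist().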

As the OP pointed out, there may be a version issue: in pandas 0.19 the code above throws an exception:

TypeError: data type not understood

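One possible solution is to explicitly define the data types (timestamp and float) on the empty dataframe before the row assignment. Since there is no single dtype() call that does this, a loop is run to convert each column: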

pair_columns = ['T1 Time', 'T1 Active', 'T1 Reactive', 'T2 Time', 'T2 Active', 'T2 Reactive']
pair_dtypes = ['M8[ms]', 'float', 'float', 'M8[ms]', 'float', 'float']

# an empty dataframe
matched_pairs = pd.DataFrame(columns=pair_columns)
datatypes = dict(zip(pair_columns, pair_dtypes))

for k,v in datatypes.items():
    matched_pairs[k] = matched_pairs[k].astype(v)

...
matched_pairs.loc[len(matched_pairs)] = ser_timestamp.tolist()
# matched_pairs.loc[len(matched_pairs)] = value_with_timestamp
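As a possible shortcut for the loop above, DataFrame.astype also accepts a dict mapping column names to dtypes, so the conversion can be written as one call. This is only a sketch: it uses 'datetime64[ns]' rather than 'M8[ms]' as an assumption, to match the dtypes printed earlier, and whether it avoids the pandas 0.19 exception has not been verified here.

```python
import pandas as pd

pair_columns = ['T1 Time', 'T1 Active', 'T1 Reactive',
                'T2 Time', 'T2 Active', 'T2 Reactive']
# Assumed dtypes: nanosecond datetimes for the time columns, float64 otherwise.
pair_dtypes = ['datetime64[ns]', 'float64', 'float64',
               'datetime64[ns]', 'float64', 'float64']

# Build the empty frame, then set every column's dtype in a single call.
matched_pairs = pd.DataFrame(columns=pair_columns)
matched_pairs = matched_pairs.astype(dict(zip(pair_columns, pair_dtypes)))

print(matched_pairs.dtypes)
```

The row assignment then follows as in the lines above.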