Python

Question

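I have a DataFrame whose index is a DatetimeIndex, with one row per day over several years. I need to resample it by month so that the two float columns are summed while the string columns hold the unique values for that month. I can resample a single column, but I don't know how to resample everything at once, or, if I do one column at a time, how to put the results back together.

For the floats I tried: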

# go through the column list
for col in col_list:
    # process all run time columns for month
    if "float" in str(col):
        # resample for one month and sum
        df[col] = df[col].resample('M').sum()
        # rename the column
        df.rename(columns={col: col + " MONTHLY"}, inplace=True)

For the strings:

elif "string" in str(col):
    # get all the unique jobs run during the month
    df[col] = df[col].groupby(pd.Grouper(freq='M')).unique()
    df.rename(columns={col: col + " MONTHLY"}, inplace=True)

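These insert the monthly data into the DataFrame, but every daily row is still there, so the monthly values are hard to find and the result isn't what I need.

Some sample data: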

            float_1  float_2  string_1  string_2
12/30/2019  1        2        a         a
12/31/2019  1        3        a         b
1/1/2020    2        4        a         c
1/2/2020    3        5        b         d

The expected output is:

12/2019  2  5  a     a, b
1/2020   5  9  a, b  c, d

Not sure whether it matters, but the real data does have NaN values on random dates throughout.

Answer 1

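Try aggregating the numeric and non-numeric columns separately, then join them back together:

import pandas as pd

# make sure the index is a DatetimeIndex so it can be resampled
df.index = pd.to_datetime(df.index)

# month-end bins ('M' is spelled 'ME' in newer pandas releases)
# sum() skips NaN by default
numerics = df.select_dtypes('number').resample('M').sum()
# drop NaN before joining, since str.join fails on non-string values;
# sorted() keeps the order of the joined values deterministic
strings = df.select_dtypes('object').resample('M').agg(lambda x: ','.join(sorted(set(x.dropna()))))

numerics.join(strings)
#            float_1  float_2 string_1 string_2
#2019-12-31        2        5        a      a,b
#2020-01-31        5        9      a,b      c,d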
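
For reference, here is a self-contained variant of the same idea that can be run end to end. It is only a sketch under a few assumptions: the sample frame is rebuilt from the question with one extra row containing NaN (added here to mimic the gaps mentioned in the question), `join_unique` is a hypothetical helper name, and the grouping uses `DatetimeIndex.to_period("M")` with a single `agg` call instead of two `resample` calls. The result is indexed by month periods (2019-12, 2020-01) rather than month-end dates.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one extra row with NaN values
# (an assumption, added to mimic the gaps in the real data).
df = pd.DataFrame(
    {
        "float_1": [1.0, 1.0, 2.0, 3.0, np.nan],
        "float_2": [2.0, 3.0, 4.0, 5.0, 1.0],
        "string_1": ["a", "a", "a", "b", np.nan],
        "string_2": ["a", "b", "c", "d", "d"],
    },
    index=pd.to_datetime(
        ["2019-12-30", "2019-12-31", "2020-01-01", "2020-01-02", "2020-01-03"]
    ),
)


def join_unique(values: pd.Series) -> str:
    """Join the sorted, unique, non-null values of one group with ', '."""
    return ", ".join(sorted(values.dropna().astype(str).unique()))


# Pick an aggregation per column: sum for numeric columns,
# joined unique values for everything else.
numeric_cols = list(df.select_dtypes("number").columns)
other_cols = [c for c in df.columns if c not in numeric_cols]

how = {c: "sum" for c in numeric_cols}
how.update({c: join_unique for c in other_cols})

# Group the daily rows by calendar month and aggregate in one pass.
monthly = df.groupby(df.index.to_period("M")).agg(how)
print(monthly)
```

Because every column goes through one `agg` call, there is nothing to join back afterwards, and the daily rows never end up mixed with the monthly ones. If month-end timestamps are preferred over periods, `monthly.index.to_timestamp(how="end")` is one way to convert the index.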

Python - Pandas resample dataframe with strings and floats

pandas

datetimeindex