Pandas 系列重采样

Question

我有以下 pandas 系列：

    dummy_array = pd.Series(np.array(range(-10, 11)), index=(np.array(range(0, 21))/10))

这会产生以下数组：

如果我想重新取样，我该怎么做？我阅读了文档并建议这样做：

    dummy_array.resample('20S').mean()

但它不起作用。有什么想法吗？

谢谢。

编辑：

我希望我的最终矢量具有双倍的频率。所以像这样：

0.0   -10
0.05   -9.5
0.1    -9
0.15    -8.5
0.2    -8
0.25    -7.5
etc.

Answer 1

根据原始数组创建差异列表。然后我们将其分解为值和索引以创建 'pd.Series'。加入新的 pd.series 并重新排序。

# new list
ups = [[x+0.05,y+0.5] for x,y in zip(dummy_array.index, dummy_array)]
idx = [i[0] for i in ups]
val = [i[1] for i in ups]
d2 = pd.Series(val, index=idx)
d3 = pd.concat([dummy_array,d2], axis=0)
d3.sort_values(inplace=True)

d3
0.00   -10.0
0.05    -9.5
0.10    -9.0
0.15    -8.5
0.20    -8.0
0.25    -7.5
0.30    -7.0
0.35    -6.5
0.40    -6.0
0.45    -5.5
0.50    -5.0
0.55    -4.5
0.60    -4.0
0.65    -3.5
0.70    -3.0
0.75    -2.5
0.80    -2.0
0.85    -1.5
0.90    -1.0
0.95    -0.5
1.00     0.0
1.05     0.5
1.10     1.0
1.15     1.5
1.20     2.0
1.25     2.5
1.30     3.0
1.35     3.5
1.40     4.0
1.45     4.5
1.50     5.0
1.55     5.5
1.60     6.0
1.65     6.5
1.70     7.0
1.75     7.5
1.80     8.0
1.85     8.5
1.90     9.0
1.95     9.5
2.00    10.0
2.05    10.5
dtype: float64

Answer 2

这是使用 np.linspace()、.reindex() 和 interpolate 的解决方案：

数据框 dummmy_array 如上所述创建。

# get properties of original index
start = dummy_array.index.min()
end = dummy_array.index.max()
num_gridpoints_orig = dummy_array.index.size

# calc number of grid-points in new index
num_gridpoints_new = (num_gridpoints_orig  * 2) - 1 

# create new index, with twice the number of grid-points (i.e., smaller step-size)
idx_new = np.linspace(start, end, num_gridpoints_new)

# re-index the data frame.  New grid-points have value of NaN,
# and we replace these NaNs with interpolated values
df2 = dummy_array.reindex(index=idx_new).interpolate()

print(df2.head())

0.00   -10.0
0.05    -9.5
0.10    -9.0
0.15    -8.5
0.20    -8.0

Answer 3

感谢大家的贡献。在查看答案并多加思考后，我发现了一个更通用的解决方案，可以处理所有可能的情况。在这种情况下，我想将 dummy_arrayA 上采样到与 dummy_arrayB 相同的索引。我所做的是创建一个同时具有 A 和 B 的新索引。然后我使用 reindex 和 interpolate 函数来计算新值，最后我删除旧索引以便获得相同的数组大小作为 dummy_array-B.

import pandas as pd
import numpy as np

# Create Dummy arrays
dummy_arrayA = pd.Series(np.array(range(0, 4)), index=[0,0.5,1.0,1.5])
dummy_arrayB = pd.Series(np.array(range(0, 5)), index=[0,0.4,0.8,1.2,1.6])

# Create new index based on array A
new_ind = pd.Index(dummy_arrayA.index)
# merge index A and B
new_ind=new_ind.union(dummy_arrayB.index)

# Use the reindex function. This will copy all the values and add the missing ones with nan. Then we call the interpolate function with the index method. So that it's interpolates based on the time.
df2 = dummy_arrayA.reindex(index=new_ind).interpolate(method="index")

# Delete the points.
New_ind_inter = dummy_arrayA.index.intersection(new_ind)
# We need to prevent that common point are also deleted.
new_ind = new_ind.difference(New_ind_inter)

# Delete the old points. So that the final array matchs dummy_arrayB
df2 = df2.drop(new_ind)

print(df2)

Pandas 系列重采样

Pandas Series Resample

python

pandas

pandas-resample