Display how long it has been since something doubled

Question

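I read an interesting statistic: since last year, the stock market has gone up 100% (i.e. doubled) in the shortest space of time on record. I'd like to test/replicate this claim.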

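The data below comes from FRED (the Federal Reserve's data repository) and is for the WILL5000 index, which goes back to 1970, whereas the S&P series only goes back to 2011.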

| DATE                |   WILL5000 |    50%   |
|---------------------|------------|----------|
| 1970-12-31 00:00:00 |       1    |    0.5   |
| 1971-01-01 00:00:00 |     nan    |    nan   |
| 1971-01-04 00:00:00 |     nan    |    nan   |
| 1971-01-05 00:00:00 |     nan    |    nan   |
| 1971-01-06 00:00:00 |     nan    |    nan   |
|         ...         |     ...    |    ...   |
| 2021-07-21 00:00:00 |   216.54   |  108.27  |
| 2021-07-22 00:00:00 |   216.68   |  108.34  |
| 2021-07-23 00:00:00 |   218.84   |  109.42  |
| 2021-07-26 00:00:00 |   219.32   |  109.66  |
| 2021-07-27 00:00:00 |   218.07   |  109.035 |

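One approach I thought of is to add a column containing half of the WILL5000 index value, then use code to search back for a value below that level (which would mark a 100% move) and record how many days have passed since then.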

I can't seem to find out how to do this anywhere, and I'd love to hear about any other ways of approaching it.

Answer 1

This problem takes O(n^2) steps for the n points in your series.

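For the i-th point in the series, you need to check whether w_j >= 2 * w_i for all j > i, and take the first j (if any) that satisfies the condition. In other words, fix one date as the baseline, then look through all future dates for the doubling condition; do this for every possible baseline date.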
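To make the condition concrete, here is a minimal plain-Python sketch of the same brute-force check. The function name `first_doubling_index` and the sample prices are made up for illustration; they are not part of the original answer.

```python
from typing import List, Optional, Sequence


def first_doubling_index(values: Sequence[float]) -> List[Optional[int]]:
    """For each i, return the smallest j > i with values[j] >= 2 * values[i], or None."""
    n = len(values)
    result: List[Optional[int]] = [None] * n
    for i in range(n):              # every possible baseline point
        for j in range(i + 1, n):   # every later point
            if values[j] >= 2 * values[i]:
                result[i] = j       # first doubling after i
                break
    return result


# Hypothetical sample data, for illustration only
prices = [1.0, 1.5, 2.0, 1.2, 2.5, 4.0]
print(first_doubling_index(prices))  # [2, 5, 5, 4, None, None]
```

The two nested loops are where the O(n^2) step count comes from.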

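In Pandas, this means you have to (i) cross-merge the dataframe with itself and filter it down to the "upper triangular" part (i.e. j > i), and (ii) find, for each group on i, the first time the value doubles.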

Here is Python+Pandas code that does the job:

import numpy as np
import pandas as pd

# load your data --> construct synthetic df for this example
np.random.seed(52)
date_axis = pd.date_range('1970-01-01', '2021-01-01', freq='M')
n = len(date_axis)
raw_df = pd.DataFrame(data={'date': date_axis, 'ticker_value': 300.0 * np.random.rand(n)})

# create n^2 df
df = pd.merge(raw_df, raw_df, how='cross').sort_values(by=['date_x', 'date_y'])

# restrict to upper triangle; copy so the column added below is not set on a slice
df = df.loc[df.date_y > df.date_x, :].copy()

# add a column to check if doubling condition is met
df['is_at_least_double'] = (df.ticker_value_y >= 2.0 * df.ticker_value_x)

# throw away values that don't meet the condition
df = df.loc[df.is_at_least_double, :].drop(columns=['is_at_least_double'])

# pick up the first value that satisfies the condition -- this is why we did the sort
df = df.groupby('date_x').first().reset_index()

# find intervals
df['interval'] = df.date_y - df.date_x

# find the smallest interval; tie-breaker is the one with the earliest base date
df.sort_values(by=['interval', 'date_x'], inplace=True)
solution = df.iloc[0]

print(solution)

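The comments explain the steps in the code. I suggest running it line by line in a console and inspecting the intermediate results to understand what is going on.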
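One caveat worth noting: the cross merge materializes all n^2 pairs, so for a daily series with upwards of ten thousand rows it can produce on the order of a hundred million rows. As a rough sketch of a lighter alternative (not part of the code above), the same search can be done with a loop over NumPy arrays, which still takes O(n^2) time in the worst case but only O(n) memory. The function name `shortest_doubling`, its column-name defaults, and the demo frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def shortest_doubling(df, date_col="date", value_col="ticker_value"):
    """Return (base_date, doubling_date, interval) for the fastest doubling, or None."""
    # Drop missing values (e.g. holidays in the FRED series) and order by date
    clean = df.dropna(subset=[value_col]).sort_values(date_col)
    dates = clean[date_col].to_numpy()
    values = clean[value_col].to_numpy(dtype=float)

    best = None
    for i in range(len(values) - 1):
        # positions (relative to i + 1) of later points that are at least double
        hits = np.flatnonzero(values[i + 1:] >= 2.0 * values[i])
        if hits.size == 0:
            continue
        j = i + 1 + hits[0]  # first doubling after i
        interval = dates[j] - dates[i]
        # strict '<' keeps the earliest base date when intervals tie
        if best is None or interval < best[2]:
            best = (dates[i], dates[j], interval)

    if best is None:
        return None
    return pd.Timestamp(best[0]), pd.Timestamp(best[1]), pd.Timedelta(best[2])


# Hypothetical demo data, for illustration only
demo = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
    "ticker_value": [100.0, 60.0, np.nan, 130.0],
})
print(shortest_doubling(demo))
```

For the real data you would pass `date_col="DATE"` and `value_col="WILL5000"`, assuming the dataframe has the column names shown in the question's table.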

python

time-series

dataframe

data-science