How to determine when a value falls below or rises above a certain threshold in a time series?
Using the following data:
library(ggplot2)
df <- structure(list(date = structure(c(
17171, 17172, 17173, 17174,
17175, 17176, 17177, 17178, 17179, 17180, 17181, 17182, 17183,
17184, 17185, 17186, 17187, 17188, 17189, 17190, 17191, 17192,
17193, 17194, 17195, 17196, 17197, 17198, 17199, 17200, 17201,
17202, 17203, 17204, 17205, 17206, 17207, 17208, 17209, 17210,
17211, 17212, 17213, 17214, 17215, 17216, 17217, 17218, 17219,
17220, 17221, 17222, 17223, 17224, 17225, 17226, 17227, 17228,
17229, 17230, 17231, 17232, 17233, 17234, 17235, 17236, 17237,
17238, 17239, 17240, 17241, 17242, 17243, 17244, 17245, 17246,
17247, 17248, 17249, 17250, 17251, 17252, 17253, 17254, 17255,
17256, 17257, 17258, 17259, 17260, 17261, 17262, 17263, 17264,
17265, 17266, 17267, 17268, 17269, 17270, 17271, 17272, 17273,
17274, 17275, 17276, 17277, 17278, 17279, 17280, 17281, 17282,
17283, 17284, 17285, 17286, 17287, 17288, 17289, 17290, 17291,
17292, 17293, 17294, 17295, 17296, 17297, 17298, 17299, 17300,
17301, 17302, 17303, 17304, 17305, 17306, 17307, 17308, 17309,
17310, 17311, 17312, 17313, 17314, 17315, 17316, 17317, 17318,
17319, 17320, 17321, 17322, 17323, 17324, 17325, 17326, 17327,
17328, 17329, 17330, 17331, 17332, 17333, 17334, 17335, 17336,
17337, 17338, 17339, 17340, 17341, 17342, 17343, 17344, 17345,
17346, 17347, 17348, 17349, 17350, 17351, 17352, 17353, 17354,
17355, 17356, 17357, 17358, 17359, 17360, 17361, 17362, 17363,
17364, 17365, 17366, 17367, 17368, 17369, 17370, 17371, 17372,
17373, 17374, 17375, 17376, 17377, 17378, 17379, 17380, 17381,
17382, 17383, 17384, 17385, 17386, 17387, 17388, 17389, 17390,
17391, 17392, 17393, 17394, 17395, 17396, 17397, 17398, 17399,
17400, 17401, 17402, 17403, 17404, 17405, 17406, 17407, 17408,
17409, 17410, 17411, 17412, 17413, 17414, 17415, 17416, 17417,
17418, 17419, 17420, 17421, 17422, 17423, 17424, 17425, 17426,
17427, 17428, 17429, 17430, 17431, 17432, 17433, 17434, 17435,
17436, 17437, 17438, 17439, 17440, 17441, 17442, 17443, 17444,
17445, 17446, 17447, 17448, 17449, 17450, 17451, 17452, 17453,
17454, 17455, 17456, 17457, 17458, 17459, 17460, 17461, 17462,
17463, 17464, 17465, 17466, 17467, 17468, 17469, 17470, 17471,
17472, 17473, 17474, 17475, 17476, 17477, 17478, 17479, 17480,
17481, 17482, 17483, 17484, 17485, 17486, 17487, 17488, 17489,
17490, 17491, 17492, 17493, 17494, 17495, 17496, 17497, 17498,
17499, 17500, 17501, 17502, 17503, 17504, 17505, 17506, 17507,
17508, 17509, 17510, 17511, 17512, 17513, 17514, 17515, 17516,
17517, 17518, 17519, 17520, 17521, 17522, 17523, 17524, 17525,
17526
), class = "Date"), y = c(
75.5, 75.5, 75.5, 70.5, 70.5,
61.5, 61.5, 61.5, 61.5, 68.5, 71.5, 71.5, 71.5, 81.5, 71.5, 71.5,
55.5, 36.5, 28, 28, 24.5, 24, 20.5, 16.5, 16.5, 20, 27.5, 26.5,
22, 22, 28, 33, 43, 52.5, 56, 61.5, 70.5, 80, 84, 88, 88, 88,
88, 88, 62, 49, 42.5, 42.5, 42.5, 35.5, 43.5, 50.5, 58.5, 74.5,
84.5, 89.5, 97, 100, 100, 100, 100, 100, 100, 100, 100, 99.5,
99.5, 99.5, 98.5, 97.5, 96, 94.5, 91.5, 82, 70.5, 54, 54, 54,
66.5, 70, 55.5, 43, 55.5, 55.5, 57.5, 59.5, 50, 50, 50, 50, 57,
63, 57, 57, 62.5, 62.5, 62.5, 55, 45.5, 45.5, 45.5, 45.5, 55.5,
70.5, 72.5, 75.5, 79.5, 79.5, 79.5, 75.5, 70, 37.5, 6, 2.5, 0.5,
0.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 2, 4, 4, 10.5, 18, 19, 20.5, 27, 40, 55, 55, 55, 55, 40,
51, 44, 51, 44, 44, 44, 45.5, 52.5, 57, 57, 62.5, 68, 72, 76,
76, 79, 75, 71, 70.5, 70.5, 71, 74, 70.5, 68.5, 68.5, 65, 62,
60, 57.5, 56, 56, 51.5, 49.5, 49.5, 49.5, 49.5, 44.5
)), row.names = c(
NA,
-356L
), class = c("data.frame"))
head(df)
#>         date    y
#> 1 2017-01-05 75.5
#> 2 2017-01-06 75.5
#> 3 2017-01-07 75.5
#> 4 2017-01-08 70.5
#> 5 2017-01-09 70.5
#> 6 2017-01-10 61.5
ggplot(df, aes(x = date, y = y)) +
  geom_path() +
  geom_point() +
  geom_hline(yintercept = 30, color = "red", lty = 2) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")
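I want to find two kinds of dates:
- the dates when y drops below 30 (the red dashed line) for at least 10 days (roughly three-quarters of the way through January and just before May in the plot above);
- the dates when y rises back above 30 for at least 10 days (early February and shortly after November in the plot above).
I would like to use data.table for speed efficiency. Maybe rle()?
Created on 2021-07-16 by the reprex package (v2.0.0)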
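There are actually two sequences for each of the conditions you specified.
A possible base R solution using rle():
1) Create a run-length object and a custom function: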
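# run-length encoding of the logical vector "y is above 30"
rl <- rle(df$y > 30)
# custom function: row index at which run number w starts
# (seq_len() keeps this correct for w == 1, where 1:(w - 1) would be 1:0)
f <- function(w) sum(rl$lengths[seq_len(w - 1)]) + 1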
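To see what rle() returns and how run start positions follow from it, here is a minimal sketch on a hypothetical toy vector (not the question's data). It computes the start row of every run at once with cumsum(), which gives the same positions as calling the per-run helper for each run. Note that an index of the form 1:(w - 1) misbehaves when w is 1, because 1:0 is c(1, 0); seq_len(w - 1) returns an empty index instead.

```r
# Hypothetical toy series, only for illustration
y <- c(50, 45, 20, 10, 5, 25, 40, 60, 70, 15)

# Run-length encoding of the logical vector "y is above 30"
rl <- rle(y > 30)
# rl$lengths is 2 4 3 1 and rl$values is TRUE FALSE TRUE FALSE

# Start row of every run at once: last row of the run minus its length, plus one
starts <- cumsum(rl$lengths) - rl$lengths + 1
# starts is 1 3 7 10

# Per-run helper; seq_len() yields an empty index when w == 1
f <- function(w) sum(rl$lengths[seq_len(w - 1)]) + 1
stopifnot(all(sapply(seq_along(rl$lengths), f) == starts))
```

The cumsum() form avoids the sapply() loop entirely, so starts[w1] can replace sapply(w1, f).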
2) Now you can determine the start dates of the sequences below 30 as follows:
# start dates of sequences below 30 lasting at least 10 days
w1 <- which(!rl$values & rl$lengths >= 10)
df$date[sapply(w1, f)]
which gives:
[1] "2017-01-23" "2017-04-27"
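If the end date and length of each qualifying run are also of interest, the same idea can be wrapped in a small function. The sketch below uses a hypothetical helper, run_bounds(), and a toy daily series (not the question's data). It assumes one row per day and no missing values, so a run of 10 rows is a run of 10 days, and it uses >= min_len to match "at least 10 days".

```r
# Hypothetical helper: first row, last row and length of every run
# where cond is TRUE for at least min_len consecutive rows
run_bounds <- function(cond, min_len = 10) {
  rl <- rle(cond)
  ends <- cumsum(rl$lengths)
  starts <- ends - rl$lengths + 1
  keep <- rl$values & rl$lengths >= min_len
  data.frame(start = starts[keep], end = ends[keep], length = rl$lengths[keep])
}

# Toy daily series (hypothetical values)
dates <- seq(as.Date("2017-01-01"), by = "day", length.out = 30)
y <- c(rep(50, 5), rep(20, 12), rep(60, 3), rep(10, 4), rep(45, 6))

# Runs at or below 30, i.e. the complement of y > 30 used above
below <- run_bounds(!(y > 30), min_len = 10)
below$start_date <- dates[below$start]
below$end_date <- dates[below$end]
print(below)
# one run: rows 6 to 17 (12 days), 2017-01-06 to 2017-01-17
```

The four-day dip (rows 21 to 24) is dropped because it is shorter than min_len.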
3) And the start dates of the sequences above 30, with the additional condition that they must come after a sequence below 30:
# start dates of sequences above 30 lasting at least 10 days
# that come after the first such sequence below 30
w2 <- which(rl$values & rl$lengths >= 10)
w2 <- w2[w2 > w1[1]]
df$date[sapply(w2, f)]
which gives:
[1] "2017-02-05" "2017-11-14"
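Since the question mentions data.table, here is a sketch of the same logic using data.table::rleid(), which numbers runs of identical consecutive values so that each run can be summarised with by. It assumes the data.table package is installed and again uses a toy series; for the question's data you would start from setDT(df) instead, and with threshold 30 and min_len 10 it should reproduce the dates above. For a few hundred rows the speed difference from the base rle() version is unlikely to matter.

```r
library(data.table)

# Toy daily series (hypothetical values, not the question's data)
dt <- data.table(
  date = seq(as.Date("2017-01-01"), by = "day", length.out = 35),
  y = c(rep(50, 5), rep(20, 12), rep(60, 3), rep(10, 4), rep(45, 11))
)
threshold <- 30
min_len <- 10

# Flag each day and number the runs of consecutive days on the same side
dt[, above := y > threshold]
dt[, run := rleid(above)]

# One row per run: side of the threshold, first and last date, length in days
runs <- dt[, .(above = above[1], start = date[1], end = date[.N], n = .N), by = .(run)]
long_runs <- runs[n >= min_len]

# Starts of long runs below the threshold
below_starts <- long_runs[above == FALSE, start]
# Starts of long runs above the threshold that follow the first long run below it
first_below <- long_runs[above == FALSE, min(run)]
above_starts <- long_runs[above == TRUE & run > first_below, start]

print(long_runs)
```

Keeping one summary row per run makes both conditions simple row filters, and the end date of each run comes for free.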