查找 R 中多天变量变化的次数
Finding how many times variable changes over multiple days in R
我在执行此任务时遇到了问题。基本上我有价格数据以及购买商品的时间和日期。有了这个,我想找出一件商品的价格在一天内变化了多少次。例如,早上的价格是 7000,但晚上的价格可能是 4000。所以我想在几天内为多个项目执行此操作。还有一个与购买相关联的订单 ID,但它不必是唯一的。
我发现这篇文章真的很有帮助,我取得了一些进步,但无法得到我需要的东西
我已经为数据放置了 dput 输出,以便可以重新创建它。结果应该是这样的
Item Price_changed_over_all_days Price_changed_in_one_day
x 10 3
y 4 1
z 5 2
感谢任何advice/help!如果我能让问题更清楚,请告诉我。
PS:如果可能的话,我也希望能够说出任何给定日期的最高价和最低价,如果这也能算出来那就太好了。我知道如何在特定的一天执行此操作,但是
structure(list(item = c("x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z"),
bought_date = structure(c(1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), bought_time = structure(c(-2209016101,
-2209014165, -2209006172, -2208996246, -2208992947, -2208991967,
-2209070025, -2209069890, -2209064616, -2209055193, -2209054850,
-2209053617, -2209050638, -2209050426, -2209048499, -2209047983,
-2209047872, -2209047390, -2209046473, -2209045120, -2209044562,
-2209044418, -2209042104, -2209041480, -2209040748, -2209037870,
-2209037696, -2209037309, -2209035846, -2209034872, -2209034429,
-2209034323, -2209030237, -2209028615, -2209028570, -2209028477,
-2209026900, -2209026787, -2209025234, -2209024468, -2209023183,
-2209021020, -2209020175, -2209019934, -2209019733, -2209018417,
-2209016646, -2209016540, -2209015208, -2209014941, -2209011636,
-2209011444, -2209010896, -2209010639, -2209009483, -2209009412,
-2209008912, -2209007424, -2209006197, -2209005462, -2209005439,
-2209005414, -2209004221, -2208998803, -2208998727, -2208993252,
-2208993224, -2208993194, -2208992478, -2208992218, -2209019432,
-2209018785, -2209017271, -2209017188, -2209017177, -2209014531,
-2209014484, -2209014247, -2209013964, -2209012511, -2209009805,
-2209009633, -2209009617, -2209009556, -2209009533, -2209009499,
-2209009474, -2209008099, -2209007958, -2209000389, -2209068522,
-2209062412, -2209062053, -2209058480, -2209058472, -2209058161,
-2209057878, -2209057740, -2209056037, -2209055339, -2209055045,
-2209054472, -2209051624, -2209050659, -2209050339, -2209047529,
-2209045264, -2209038811, -2209038586, -2209038487, -2209038004,
-2209036906, -2209036606, -2209034142, -2209034049, -2209033773,
-2209030890, -2209030794, -2209030626, -2209029600, -2209029464,
-2209027707, -2209026486, -2209024697, -2209021552, -2209021379,
-2209019844, -2209019716, -2209018482, -2209018436, -2209018365,
-2209017376, -2209017340, -2209017319, -2209017054, -2209016900,
-2209016126, -2209014622, -2209013286, -2209012584, -2209009905,
-2209009208, -2209006827, -2209006663, -2208990872, -2209020164,
-2209015899, -2209013965, -2209013933, -2209011963, -2209010443,
-2209010351, -2209008868, -2209007569, -2209063141, -2209063059,
-2209062882, -2209062852, -2209054720, -2209054349, -2209050324,
-2209049810, -2209047902, -2209041612, -2209039205, -2209038444,
-2209038393, -2209038219, -2209037598, -2209037562, -2209037497,
-2209037082, -2209036943, -2209036795, -2209036404, -2209034846,
-2209032324, -2209032289, -2209031999, -2209031958, -2209030309,
-2209029952, -2209023411, -2209022296, -2209021086, -2209020624,
-2209020221, -2209019575, -2209017996, -2209017794, -2209014135,
-2209011509, -2209009303, -2209007905, -2209007799, -2209007709,
-2209005139, -2209004957, -2208998695, -2208998233, -2208990008,
-2208989978), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
ID = c(540273, 540333, 540568, 540734, 540766, 540771, 540808,
540810, 540847, 541011, 541022, 541060, 541147, 541160, 541231,
541252, 541259, 541283, 541317, 541379, 541396, 541399, 541503,
541537, 541562, 541682, 541684, 541704, 541779, 541849, 541879,
541883, 542039, 542115, 542117, 542120, 542164, 542166, 542207,
542236, 542275, 542358, 542394, 542403, 542414, 542457, 542515,
542522, 542579, 542598, 542741, 542749, 542772, 542786, 542825,
542831, 542854, 542934, 542975, 543003, 543004, 543005, 543044,
543109, 543111, 543156, 543158, 543159, 543162, 543164, 540161,
540187, 540230, 540231, 540233, 540322, 540324, 540329, 540344,
540384, 540468, 540477, 540480, 540482, 540483, 540485, 540486,
540522, 540526, 540683, 540820, 540876, 540880, 540917, 540918,
540927, 540934, 540935, 540989, 541005, 541014, 541034, 541114,
541146, 541163, 541276, 541371, 541646, 541653, 541658, 541678,
541725, 541738, 541892, 541895, 541916, 542015, 542021, 542028,
542080, 542084, 542143, 542175, 542225, 542333, 542337, 542409,
542415, 542455, 542456, 542460, 542482, 542485, 542487, 542500,
542505, 542544, 542610, 542677, 542704, 542814, 542837, 542950,
542955, 543174, 540141, 540281, 540343, 540348, 540401, 540453,
540457, 540500, 540535, 540865, 540866, 540869, 540871, 541027,
541038, 541165, 541187, 541257, 541533, 541627, 541661, 541662,
541668, 541691, 541693, 541695, 541713, 541723, 541731, 541751,
541850, 541960, 541963, 541978, 541981, 542035, 542053, 542269,
542301, 542355, 542375, 542390, 542424, 542466, 542471, 542642,
542745, 542835, 542911, 542917, 542920, 543019, 543031, 543112,
543117, 543178, 543179), price = c(7190, 9200, 7170, 7170,
7170, 9170, 7170, 7170, 7170, 9170, 7170, 9170, 8330, 7170,
9170, 7170, 9170, 7170, 7170, 7170, 7170, 7170, 7170, 7170,
7170, 7170, 7170, 9170, 9170, 9170, 9170, 9170, 8330, 7170,
7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 9170,
7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170,
7170, 7170, 9170, 7170, 7160, 7160, 7160, 7160, 7160, 7160,
7160, 7160, 8330, 8330, 8330, 8330, 7190, 7190, 7190, 7190,
7190, 7190, 7190, 7190, 9200, 9200, 7190, 7190, 7190, 7190,
7190, 7190, 7190, 9200, 9200, 9170, 9170, 7170, 7170, 9170,
7170, 7170, 9170, 9170, 8330, 9170, 8330, 8330, 9170, 7170,
9170, 7170, 7170, 9170, 9170, 9170, 7170, 7170, 7170, 7170,
7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 7170, 9170,
9170, 9170, 7170, 7170, 7170, 9170, 9170, 7170, 7170, 7170,
7170, 7170, 7170, 8330, 7170, 9170, 7170, 9170, 7170, 7170,
9160, 7190, 9200, 7190, 7190, 14880, 9200, 9200, 9200, 7190,
7170, 7170, 7170, 7170, 7170, 7170, 8330, 7170, 9170, 7170,
9170, 9170, 9170, 9170, 7170, 7170, 7170, 9170, 7170, 7170,
7170, 7170, 7170, 7170, 9170, 9170, 9170, 7170, 7170, 7170,
9170, 9170, 9170, 9170, 7170, 7170, 7170, 8330, 7170, 7170,
7170, 7170, 7160, 7160, 9160, 7160, 9160, 9160)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -202L))
首先,我假设您的数据在变量 df
中。
您必须仔细考虑要计算的内容。
如果你想要不分天数的价格变化,你可以这样做:
df %>% group_by(item) %>%
summarise(Price_changed_over_all_days =
sum((lead(price) - price)!=0, na.rm = TRUE))
# A tibble: 3 x 2
# item Price_changed_over_all_days
# <chr> <int>
#1 x 24
#2 y 30
#3 z 24
然而,如果你想计算特定日期的价格变化次数,你会得到这样的结果:
df %>% group_by(item, bought_date) %>%
summarise(Price_changed_in_one_day =
sum((lead(price) - price)!=0, na.rm = TRUE))
# A tibble: 6 x 3
# Groups: item [3]
# item bought_date Price_changed_in_one_day
# <chr> <dttm> <int>
#1 x 2020-09-21 00:00:00 3
#2 x 2020-09-22 00:00:00 20
#3 y 2020-09-21 00:00:00 4
#4 y 2020-09-22 00:00:00 26
#5 z 2020-09-21 00:00:00 5
#6 z 2020-09-22 00:00:00 18
只是在这种情况下,您在摘要中有更多的行 table。
如果你只想要一个 table,你必须以某种方式 assemble 它并根据当天的值决定一些统计数据。也许平均值在这里合适?我不知道。
df %>% group_by(item) %>%
summarise(Price_changed_over_all_days =
sum((lead(price) - price)!=0, na.rm = TRUE)) %>%
left_join(
df %>% group_by(item, bought_date) %>%
summarise(Price_changed_in_one_day =
sum((lead(price) - price)!=0, na.rm = TRUE)) %>%
group_by(item) %>%
summarise(Price_changed_in_one_day =
mean(Price_changed_in_one_day)
), by= "item")
# A tibble: 3 x 3
# item Price_changed_over_all_days Price_changed_in_one_day
# <chr> <int> <dbl>
#1 x 24 11.5
#2 y 30 15
#3 z 24 11.5
另请注意,价格变化可能会在一天之内发生,因此给定产品几天内的变化总和不一定等于该产品所有价格变化的总和。在您的情况下,产品“x”就是这种情况。
对于 data.table
你可以使用 rleid
:
library(data.table)
setDT(data)
data[,.(times=max(rleid(price))-1),by=.(item)]
# item times
#1: x 24
#2: y 30
#3: z 24
data[,.(timesday=max(rleid(price))-1),by=.(item,bought_date)]
# item bought_date timesday
#1: x 2020-09-21 3
#2: x 2020-09-22 20
#3: y 2020-09-21 4
#4: y 2020-09-22 26
#5: z 2020-09-21 5
#6: z 2020-09-22 18
我在执行此任务时遇到了问题。基本上我有价格数据以及购买商品的时间和日期。有了这个,我想找出一件商品的价格在一天内变化了多少次。例如,早上的价格是 7000,但晚上的价格可能是 4000。所以我想在几天内为多个项目执行此操作。还有一个与购买相关联的订单 ID,但它不必是唯一的。
我发现这篇文章真的很有帮助,我取得了一些进步,但无法得到我需要的东西
我已经为数据放置了 dput 输出,以便可以重新创建它。结果应该是这样的
Item Price_changed_over_all_days Price_changed_in_one_day
x 10 3
y 4 1
z 5 2
感谢任何advice/help!如果我能让问题更清楚,请告诉我。
PS:如果可能的话,我也希望能够说出任何给定日期的最高价和最低价,如果这也能算出来那就太好了。我知道如何在特定的一天执行此操作,但是
structure(list(item = c("x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z"),
bought_date = structure(c(1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), bought_time = structure(c(-2209016101,
-2209014165, -2209006172, -2208996246, -2208992947, -2208991967,
-2209070025, -2209069890, -2209064616, -2209055193, -2209054850,
-2209053617, -2209050638, -2209050426, -2209048499, -2209047983,
-2209047872, -2209047390, -2209046473, -2209045120, -2209044562,
-2209044418, -2209042104, -2209041480, -2209040748, -2209037870,
-2209037696, -2209037309, -2209035846, -2209034872, -2209034429,
-2209034323, -2209030237, -2209028615, -2209028570, -2209028477,
-2209026900, -2209026787, -2209025234, -2209024468, -2209023183,
-2209021020, -2209020175, -2209019934, -2209019733, -2209018417,
-2209016646, -2209016540, -2209015208, -2209014941, -2209011636,
-2209011444, -2209010896, -2209010639, -2209009483, -2209009412,
-2209008912, -2209007424, -2209006197, -2209005462, -2209005439,
-2209005414, -2209004221, -2208998803, -2208998727, -2208993252,
-2208993224, -2208993194, -2208992478, -2208992218, -2209019432,
-2209018785, -2209017271, -2209017188, -2209017177, -2209014531,
-2209014484, -2209014247, -2209013964, -2209012511, -2209009805,
-2209009633, -2209009617, -2209009556, -2209009533, -2209009499,
-2209009474, -2209008099, -2209007958, -2209000389, -2209068522,
-2209062412, -2209062053, -2209058480, -2209058472, -2209058161,
-2209057878, -2209057740, -2209056037, -2209055339, -2209055045,
-2209054472, -2209051624, -2209050659, -2209050339, -2209047529,
-2209045264, -2209038811, -2209038586, -2209038487, -2209038004,
-2209036906, -2209036606, -2209034142, -2209034049, -2209033773,
-2209030890, -2209030794, -2209030626, -2209029600, -2209029464,
-2209027707, -2209026486, -2209024697, -2209021552, -2209021379,
-2209019844, -2209019716, -2209018482, -2209018436, -2209018365,
-2209017376, -2209017340, -2209017319, -2209017054, -2209016900,
-2209016126, -2209014622, -2209013286, -2209012584, -2209009905,
-2209009208, -2209006827, -2209006663, -2208990872, -2209020164,
-2209015899, -2209013965, -2209013933, -2209011963, -2209010443,
-2209010351, -2209008868, -2209007569, -2209063141, -2209063059,
-2209062882, -2209062852, -2209054720, -2209054349, -2209050324,
-2209049810, -2209047902, -2209041612, -2209039205, -2209038444,
-2209038393, -2209038219, -2209037598, -2209037562, -2209037497,
-2209037082, -2209036943, -2209036795, -2209036404, -2209034846,
-2209032324, -2209032289, -2209031999, -2209031958, -2209030309,
-2209029952, -2209023411, -2209022296, -2209021086, -2209020624,
-2209020221, -2209019575, -2209017996, -2209017794, -2209014135,
-2209011509, -2209009303, -2209007905, -2209007799, -2209007709,
-2209005139, -2209004957, -2208998695, -2208998233, -2208990008,
-2208989978), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
ID = c(540273, 540333, 540568, 540734, 540766, 540771, 540808,
540810, 540847, 541011, 541022, 541060, 541147, 541160, 541231,
541252, 541259, 541283, 541317, 541379, 541396, 541399, 541503,
541537, 541562, 541682, 541684, 541704, 541779, 541849, 541879,
541883, 542039, 542115, 542117, 542120, 542164, 542166, 542207,
542236, 542275, 542358, 542394, 542403, 542414, 542457, 542515,
542522, 542579, 542598, 542741, 542749, 542772, 542786, 542825,
542831, 542854, 542934, 542975, 543003, 543004, 543005, 543044,
543109, 543111, 543156, 543158, 543159, 543162, 543164, 540161,
540187, 540230, 540231, 540233, 540322, 540324, 540329, 540344,
540384, 540468, 540477, 540480, 540482, 540483, 540485, 540486,
540522, 540526, 540683, 540820, 540876, 540880, 540917, 540918,
540927, 540934, 540935, 540989, 541005, 541014, 541034, 541114,
541146, 541163, 541276, 541371, 541646, 541653, 541658, 541678,
541725, 541738, 541892, 541895, 541916, 542015, 542021, 542028,
542080, 542084, 542143, 542175, 542225, 542333, 542337, 542409,
542415, 542455, 542456, 542460, 542482, 542485, 542487, 542500,
542505, 542544, 542610, 542677, 542704, 542814, 542837, 542950,
542955, 543174, 540141, 540281, 540343, 540348, 540401, 540453,
540457, 540500, 540535, 540865, 540866, 540869, 540871, 541027,
541038, 541165, 541187, 541257, 541533, 541627, 541661, 541662,
541668, 541691, 541693, 541695, 541713, 541723, 541731, 541751,
541850, 541960, 541963, 541978, 541981, 542035, 542053, 542269,
542301, 542355, 542375, 542390, 542424, 542466, 542471, 542642,
542745, 542835, 542911, 542917, 542920, 543019, 543031, 543112,
543117, 543178, 543179), price = c(7190, 9200, 7170, 7170,
7170, 9170, 7170, 7170, 7170, 9170, 7170, 9170, 8330, 7170,
9170, 7170, 9170, 7170, 7170, 7170, 7170, 7170, 7170, 7170,
7170, 7170, 7170, 9170, 9170, 9170, 9170, 9170, 8330, 7170,
7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 9170,
7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170,
7170, 7170, 9170, 7170, 7160, 7160, 7160, 7160, 7160, 7160,
7160, 7160, 8330, 8330, 8330, 8330, 7190, 7190, 7190, 7190,
7190, 7190, 7190, 7190, 9200, 9200, 7190, 7190, 7190, 7190,
7190, 7190, 7190, 9200, 9200, 9170, 9170, 7170, 7170, 9170,
7170, 7170, 9170, 9170, 8330, 9170, 8330, 8330, 9170, 7170,
9170, 7170, 7170, 9170, 9170, 9170, 7170, 7170, 7170, 7170,
7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 7170, 9170,
9170, 9170, 7170, 7170, 7170, 9170, 9170, 7170, 7170, 7170,
7170, 7170, 7170, 8330, 7170, 9170, 7170, 9170, 7170, 7170,
9160, 7190, 9200, 7190, 7190, 14880, 9200, 9200, 9200, 7190,
7170, 7170, 7170, 7170, 7170, 7170, 8330, 7170, 9170, 7170,
9170, 9170, 9170, 9170, 7170, 7170, 7170, 9170, 7170, 7170,
7170, 7170, 7170, 7170, 9170, 9170, 9170, 7170, 7170, 7170,
9170, 9170, 9170, 9170, 7170, 7170, 7170, 8330, 7170, 7170,
7170, 7170, 7160, 7160, 9160, 7160, 9160, 9160)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -202L))
首先,我假设您的数据在变量 df
中。
您必须仔细考虑要计算的内容。
如果你想要不分天数的价格变化,你可以这样做:
df %>% group_by(item) %>%
summarise(Price_changed_over_all_days =
sum((lead(price) - price)!=0, na.rm = TRUE))
# A tibble: 3 x 2
# item Price_changed_over_all_days
# <chr> <int>
#1 x 24
#2 y 30
#3 z 24
然而,如果你想计算特定日期的价格变化次数,你会得到这样的结果:
df %>% group_by(item, bought_date) %>%
summarise(Price_changed_in_one_day =
sum((lead(price) - price)!=0, na.rm = TRUE))
# A tibble: 6 x 3
# Groups: item [3]
# item bought_date Price_changed_in_one_day
# <chr> <dttm> <int>
#1 x 2020-09-21 00:00:00 3
#2 x 2020-09-22 00:00:00 20
#3 y 2020-09-21 00:00:00 4
#4 y 2020-09-22 00:00:00 26
#5 z 2020-09-21 00:00:00 5
#6 z 2020-09-22 00:00:00 18
只是在这种情况下,您在摘要中有更多的行 table。 如果你只想要一个 table,你必须以某种方式 assemble 它并根据当天的值决定一些统计数据。也许平均值在这里合适?我不知道。
df %>% group_by(item) %>%
summarise(Price_changed_over_all_days =
sum((lead(price) - price)!=0, na.rm = TRUE)) %>%
left_join(
df %>% group_by(item, bought_date) %>%
summarise(Price_changed_in_one_day =
sum((lead(price) - price)!=0, na.rm = TRUE)) %>%
group_by(item) %>%
summarise(Price_changed_in_one_day =
mean(Price_changed_in_one_day)
), by= "item")
# A tibble: 3 x 3
# item Price_changed_over_all_days Price_changed_in_one_day
# <chr> <int> <dbl>
#1 x 24 11.5
#2 y 30 15
#3 z 24 11.5
另请注意,价格变化可能会在一天之内发生,因此给定产品几天内的变化总和不一定等于该产品所有价格变化的总和。在您的情况下,产品“x”就是这种情况。
对于 data.table
你可以使用 rleid
:
library(data.table)
setDT(data)
data[,.(times=max(rleid(price))-1),by=.(item)]
# item times
#1: x 24
#2: y 30
#3: z 24
data[,.(timesday=max(rleid(price))-1),by=.(item,bought_date)]
# item bought_date timesday
#1: x 2020-09-21 3
#2: x 2020-09-22 20
#3: y 2020-09-21 4
#4: y 2020-09-22 26
#5: z 2020-09-21 5
#6: z 2020-09-22 18