Finding how many times a variable changes over multiple days in R

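I'm having trouble with this task. Basically, I have price data along with the date and time each item was bought. From this, I want to find out how many times an item's price changed within a day. For example, the price might be 7000 in the morning but 4000 in the evening. I want to do this for multiple items over several days. There is also an order ID associated with each purchase, but it doesn't have to be unique.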

I found this post really helpful and made some progress, but I couldn't get what I need.

I've included the dput output of the data so it can be recreated. The result should look something like this:

Item  Price_changed_over_all_days Price_changed_in_one_day
x           10                               3
y           4                                1
z           5                                2

Any advice or help is appreciated! Please let me know if I can make the question clearer.

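PS: If possible, I'd also like to be able to report the highest and lowest price on any given day; it would be great if that could be worked out as well. I know how to do this for one particular day, but…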

structure(list(item = c("x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", 
"y", "y", "y", "y", "y", "y", "y", "z", "z", "z", "z", "z", "z", 
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", 
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", 
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", 
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z"), 
    bought_date = structure(c(1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600646400, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600646400, 1600646400, 1600646400, 1600646400, 
    1600646400, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800, 1600732800, 
    1600732800, 1600732800, 1600732800, 1600732800), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), bought_time = structure(c(-2209016101, 
    -2209014165, -2209006172, -2208996246, -2208992947, -2208991967, 
    -2209070025, -2209069890, -2209064616, -2209055193, -2209054850, 
    -2209053617, -2209050638, -2209050426, -2209048499, -2209047983, 
    -2209047872, -2209047390, -2209046473, -2209045120, -2209044562, 
    -2209044418, -2209042104, -2209041480, -2209040748, -2209037870, 
    -2209037696, -2209037309, -2209035846, -2209034872, -2209034429, 
    -2209034323, -2209030237, -2209028615, -2209028570, -2209028477, 
    -2209026900, -2209026787, -2209025234, -2209024468, -2209023183, 
    -2209021020, -2209020175, -2209019934, -2209019733, -2209018417, 
    -2209016646, -2209016540, -2209015208, -2209014941, -2209011636, 
    -2209011444, -2209010896, -2209010639, -2209009483, -2209009412, 
    -2209008912, -2209007424, -2209006197, -2209005462, -2209005439, 
    -2209005414, -2209004221, -2208998803, -2208998727, -2208993252, 
    -2208993224, -2208993194, -2208992478, -2208992218, -2209019432, 
    -2209018785, -2209017271, -2209017188, -2209017177, -2209014531, 
    -2209014484, -2209014247, -2209013964, -2209012511, -2209009805, 
    -2209009633, -2209009617, -2209009556, -2209009533, -2209009499, 
    -2209009474, -2209008099, -2209007958, -2209000389, -2209068522, 
    -2209062412, -2209062053, -2209058480, -2209058472, -2209058161, 
    -2209057878, -2209057740, -2209056037, -2209055339, -2209055045, 
    -2209054472, -2209051624, -2209050659, -2209050339, -2209047529, 
    -2209045264, -2209038811, -2209038586, -2209038487, -2209038004, 
    -2209036906, -2209036606, -2209034142, -2209034049, -2209033773, 
    -2209030890, -2209030794, -2209030626, -2209029600, -2209029464, 
    -2209027707, -2209026486, -2209024697, -2209021552, -2209021379, 
    -2209019844, -2209019716, -2209018482, -2209018436, -2209018365, 
    -2209017376, -2209017340, -2209017319, -2209017054, -2209016900, 
    -2209016126, -2209014622, -2209013286, -2209012584, -2209009905, 
    -2209009208, -2209006827, -2209006663, -2208990872, -2209020164, 
    -2209015899, -2209013965, -2209013933, -2209011963, -2209010443, 
    -2209010351, -2209008868, -2209007569, -2209063141, -2209063059, 
    -2209062882, -2209062852, -2209054720, -2209054349, -2209050324, 
    -2209049810, -2209047902, -2209041612, -2209039205, -2209038444, 
    -2209038393, -2209038219, -2209037598, -2209037562, -2209037497, 
    -2209037082, -2209036943, -2209036795, -2209036404, -2209034846, 
    -2209032324, -2209032289, -2209031999, -2209031958, -2209030309, 
    -2209029952, -2209023411, -2209022296, -2209021086, -2209020624, 
    -2209020221, -2209019575, -2209017996, -2209017794, -2209014135, 
    -2209011509, -2209009303, -2209007905, -2209007799, -2209007709, 
    -2209005139, -2209004957, -2208998695, -2208998233, -2208990008, 
    -2208989978), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    ID = c(540273, 540333, 540568, 540734, 540766, 540771, 540808, 
    540810, 540847, 541011, 541022, 541060, 541147, 541160, 541231, 
    541252, 541259, 541283, 541317, 541379, 541396, 541399, 541503, 
    541537, 541562, 541682, 541684, 541704, 541779, 541849, 541879, 
    541883, 542039, 542115, 542117, 542120, 542164, 542166, 542207, 
    542236, 542275, 542358, 542394, 542403, 542414, 542457, 542515, 
    542522, 542579, 542598, 542741, 542749, 542772, 542786, 542825, 
    542831, 542854, 542934, 542975, 543003, 543004, 543005, 543044, 
    543109, 543111, 543156, 543158, 543159, 543162, 543164, 540161, 
    540187, 540230, 540231, 540233, 540322, 540324, 540329, 540344, 
    540384, 540468, 540477, 540480, 540482, 540483, 540485, 540486, 
    540522, 540526, 540683, 540820, 540876, 540880, 540917, 540918, 
    540927, 540934, 540935, 540989, 541005, 541014, 541034, 541114, 
    541146, 541163, 541276, 541371, 541646, 541653, 541658, 541678, 
    541725, 541738, 541892, 541895, 541916, 542015, 542021, 542028, 
    542080, 542084, 542143, 542175, 542225, 542333, 542337, 542409, 
    542415, 542455, 542456, 542460, 542482, 542485, 542487, 542500, 
    542505, 542544, 542610, 542677, 542704, 542814, 542837, 542950, 
    542955, 543174, 540141, 540281, 540343, 540348, 540401, 540453, 
    540457, 540500, 540535, 540865, 540866, 540869, 540871, 541027, 
    541038, 541165, 541187, 541257, 541533, 541627, 541661, 541662, 
    541668, 541691, 541693, 541695, 541713, 541723, 541731, 541751, 
    541850, 541960, 541963, 541978, 541981, 542035, 542053, 542269, 
    542301, 542355, 542375, 542390, 542424, 542466, 542471, 542642, 
    542745, 542835, 542911, 542917, 542920, 543019, 543031, 543112, 
    543117, 543178, 543179), price = c(7190, 9200, 7170, 7170, 
    7170, 9170, 7170, 7170, 7170, 9170, 7170, 9170, 8330, 7170, 
    9170, 7170, 9170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 
    7170, 7170, 7170, 9170, 9170, 9170, 9170, 9170, 8330, 7170, 
    7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 9170, 
    7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 
    7170, 7170, 9170, 7170, 7160, 7160, 7160, 7160, 7160, 7160, 
    7160, 7160, 8330, 8330, 8330, 8330, 7190, 7190, 7190, 7190, 
    7190, 7190, 7190, 7190, 9200, 9200, 7190, 7190, 7190, 7190, 
    7190, 7190, 7190, 9200, 9200, 9170, 9170, 7170, 7170, 9170, 
    7170, 7170, 9170, 9170, 8330, 9170, 8330, 8330, 9170, 7170, 
    9170, 7170, 7170, 9170, 9170, 9170, 7170, 7170, 7170, 7170, 
    7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 7170, 9170, 
    9170, 9170, 7170, 7170, 7170, 9170, 9170, 7170, 7170, 7170, 
    7170, 7170, 7170, 8330, 7170, 9170, 7170, 9170, 7170, 7170, 
    9160, 7190, 9200, 7190, 7190, 14880, 9200, 9200, 9200, 7190, 
    7170, 7170, 7170, 7170, 7170, 7170, 8330, 7170, 9170, 7170, 
    9170, 9170, 9170, 9170, 7170, 7170, 7170, 9170, 7170, 7170, 
    7170, 7170, 7170, 7170, 9170, 9170, 9170, 7170, 7170, 7170, 
    9170, 9170, 9170, 9170, 7170, 7170, 7170, 8330, 7170, 7170, 
    7170, 7170, 7160, 7160, 9160, 7160, 9160, 9160)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -202L))

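First, I assume your data is in the variable df. You need to think carefully about what exactly you want to count. If you want the number of price changes regardless of the day, you can do this: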

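library(dplyr)

# Count price changes per item across all days (rows must be in chronological order)
df %>% group_by(item) %>% 
  summarise(Price_changed_over_all_days = 
      sum((lead(price) - price)!=0, na.rm = TRUE))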

#  A tibble: 3 x 2
#  item  Price_changed_over_all_days
#  <chr>                       <int>
#1 x                              24
#2 y                              30
#3 z                              24
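To see why `sum((lead(price) - price) != 0, na.rm = TRUE)` counts changes, here is a minimal sketch on a made-up four-element price vector (the values are illustrative, not taken from the question's data). It also shows what should be an equivalent base R form using `diff()`.

```r
library(dplyr)

# Made-up price sequence: 7170 -> 9170 -> 9170 -> 7170
price <- c(7170, 9170, 9170, 7170)

# lead() shifts the vector one position forward and pads the end with NA,
# so each price is compared with the one that follows it
changed <- (lead(price) - price) != 0

# The last comparison is NA (nothing follows the final price);
# na.rm = TRUE drops it, leaving the number of changes
n_changes <- sum(changed, na.rm = TRUE)

# Base R alternative: diff() returns the successive differences directly
n_changes_base <- sum(diff(price) != 0)

n_changes
n_changes_base
```

Both counts only make sense if the rows are already ordered by date and time within each item, which appears to be the case for the data in the question.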

If, however, you want to count the price changes on each individual day, you get a result like this:

df %>% group_by(item, bought_date) %>% 
      summarise(Price_changed_in_one_day = 
          sum((lead(price) - price)!=0, na.rm = TRUE))
#  A tibble: 6 x 3
#  Groups:   item [3]
#  item  bought_date         Price_changed_in_one_day
#  <chr> <dttm>                                 <int>
#1 x     2020-09-21 00:00:00                        3
#2 x     2020-09-22 00:00:00                       20
#3 y     2020-09-21 00:00:00                        4
#4 y     2020-09-22 00:00:00                       26
#5 z     2020-09-21 00:00:00                        5
#6 z     2020-09-22 00:00:00                       18
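The same grouping should also cover the PS in the question about the highest and lowest price on each day. The following is only a sketch on a small made-up tibble (`toy`, with the question's column names); it assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
library(dplyr)

# Small made-up sample using the same column names as the question's data
toy <- tibble(
  item        = c("x", "x", "x", "x", "y", "y"),
  bought_date = as.Date(c("2020-09-21", "2020-09-21", "2020-09-22",
                          "2020-09-22", "2020-09-21", "2020-09-21")),
  price       = c(7190, 9200, 7170, 8330, 7190, 9170)
)

# One row per item and day with the highest and lowest price seen that day
daily_range <- toy %>%
  group_by(item, bought_date) %>%
  summarise(max_price = max(price),
            min_price = min(price),
            .groups = "drop")

daily_range
```

Replacing `toy` with `df` should give the daily maximum and minimum for the real data.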

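In this case, though, the summary table has more rows. If you want a single table, you have to assemble it somehow and choose a statistic that summarises the daily values. Perhaps the mean is appropriate here? I don't know.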

df %>% group_by(item) %>% 
  summarise(Price_changed_over_all_days = 
              sum((lead(price) - price)!=0, na.rm = TRUE)) %>% 
  left_join(
    df %>% group_by(item, bought_date) %>% 
      summarise(Price_changed_in_one_day = 
                  sum((lead(price) - price)!=0, na.rm = TRUE)) %>% 
      group_by(item) %>% 
      summarise(Price_changed_in_one_day = 
                  mean(Price_changed_in_one_day)
      ), by= "item")
#  A tibble: 3 x 3
#  item  Price_changed_over_all_days Price_changed_in_one_day
#  <chr>                       <int>                    <dbl>
#1 x                              24                     11.5
#2 y                              30                     15  
#3 z                              24                     11.5

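Also note that a price change can fall between the last purchase of one day and the first purchase of the next, so for a given product the daily counts do not necessarily add up to the overall count. In your data this happens for products "x" and "z": 24 changes overall versus 3 + 20 = 23 and 5 + 18 = 23 within days.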
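The gap between the overall count and the sum of the daily counts can be made explicit. Below is a rough sketch on made-up data; `count_changes` is a hypothetical helper introduced only for this illustration, and `.groups` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Made-up data: the price changes once within day 1 and once overnight
toy <- tibble(
  item        = "x",
  bought_date = as.Date(c("2020-09-21", "2020-09-21",
                          "2020-09-22", "2020-09-22")),
  price       = c(7190, 9170, 7170, 7170)
)

# Hypothetical helper: number of positions where the price differs
# from the previous one (returns 0 for a single-row group)
count_changes <- function(p) sum(diff(p) != 0)

# Changes counted over the whole sequence of each item
overall <- toy %>%
  group_by(item) %>%
  summarise(all_days = count_changes(price))

# Changes counted separately per day, then added up per item
within_days <- toy %>%
  group_by(item, bought_date) %>%
  summarise(in_day = count_changes(price), .groups = "drop") %>%
  group_by(item) %>%
  summarise(sum_of_days = sum(in_day))

# The difference is the number of changes that happened between days
comparison <- overall %>%
  left_join(within_days, by = "item") %>%
  mutate(between_days = all_days - sum_of_days)

comparison
```

Here `between_days` is 1, because the jump from 9170 to 7170 happens between the two days and is therefore missed by the per-day counts.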

With data.table you can use rleid:

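library(data.table)
setDT(data)  # convert the data frame (called `data` here) to a data.table in place
# rleid() numbers the runs of equal prices, so the last run id minus 1 is the number of changes
data[,.(times=max(rleid(price))-1),by=.(item)]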
#   item times
#1:    x    24
#2:    y    30
#3:    z    24

data[,.(timesday=max(rleid(price))-1),by=.(item,bought_date)]
#   item bought_date timesday
#1:    x  2020-09-21        3
#2:    x  2020-09-22       20
#3:    y  2020-09-21        4
#4:    y  2020-09-22       26
#5:    z  2020-09-21        5
#6:    z  2020-09-22       18
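For readers unfamiliar with `rleid`, here is a small sketch of what it returns on a made-up price vector, together with what should be an equivalent count using base R's `rle()`. The vector is illustrative only.

```r
library(data.table)

# Made-up price sequence: 7170 -> 9170 -> 9170 -> 7170
price <- c(7170, 9170, 9170, 7170)

# rleid() gives every run of consecutive equal values its own id
run_id <- rleid(price)

# The number of runs minus one is the number of price changes
changes_dt <- max(run_id) - 1

# Base R: rle() returns the run lengths; count the runs and subtract one
changes_base <- length(rle(price)$lengths) - 1

run_id
changes_dt
changes_base
```

As with the dplyr version, the result depends on the rows being in chronological order within each group.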