R: subsetting a column based on two columns

Question

I have a data frame called mat_new. Here is how to generate it:

      library(dplyr)

      year <- rep(1980:2015, each = 365) 
      doy <- rep(1:365, times = 36)

      set.seed(125) 
      val <- sample(0:1, size = 365*36,replace = TRUE) 
      mat <- as.matrix(cbind(year,doy,val))
      mat <- as.data.frame(mat)
      mat <- mat %>% 
              mutate(doy1 = rep(1:730, times = 18))
      mat <- mat[,c(1:2,4,3)]

      set.seed(123) 
      mat1 <- apply(matrix(sample(c(230:365), replace = TRUE, size = 2L * 36L), nrow = 36L), 2L, sort)
      mat1 <- t(apply(mat1, 1, function(x) x[order(x)]))
      colnames(mat1) <- c("D1", "D2")
      mat1 <- cbind(year = 1980:2015, mat1)
      mat1 <- as.data.frame(mat1)

      mat1[1:6,3] <- 5:10

      mat1 <- mat1 %>%
                mutate(D2 = ifelse(D1 > D2, D2 + 365, D2))

      mat_new <- mat %>% 
                 left_join(mat1, by = "year")

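mat_new has six columns. Column 1 is the year. Column 2 is doy (365 days per year). Column 3 is doy1, which runs from 1 to 730 (two years) and then repeats from 1 to 730. Column 4 holds some values (val). Columns 5 and 6 hold a start day (D1) and an end day (D2) for each year. If D2 > 365, the end day falls in the following year. For example, for 1980 the end day is 370, which is day 5 of 1981.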
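The D2 convention can be made concrete with a small helper. This is only a sketch: `end_position()` is a hypothetical name, and like the question it assumes every year has 365 days.

```r
# Hypothetical helper: translate an end day D2, counted from 1 January of
# `year`, into the calendar year and day of year it falls on.
# Assumes every year has 365 days, as in the question's data.
end_position <- function(year, D2) {
  spills <- D2 > 365                      # TRUE when the window ends next year
  data.frame(
    end_year = year + spills,             # adding TRUE moves to the next year
    end_doy  = ifelse(spills, D2 - 365, D2)
  )
}

# 1980 ends on day 370, i.e. day 5 of 1981; a window ending on day 300
# stays in its own year
end_position(c(1980, 1990), c(370, 300))
```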

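I need to subset val based on each year's start and end days. For example, for 1980 the subset of val should start on day 233 of 1980 and end on day 5 of 1981 (370 is the end day). My plan was to first create another column of TRUE and FALSE, which I can then use to subset val: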

      mat_new1 <- mat_new %>% 
                    mutate(group1 = ifelse(D2 <= 365, doy >= D1 & doy <= D2 , doy >= D1 & doy1 <= D2))

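The line above should create another column, group1, containing TRUE and FALSE. If D2 <= 365, i.e. the end day is within the same year, it should use the doy column to select D1 through D2. If D2 falls in the next year (D2 > 365), it should use doy for the start day and take the end day from the doy1 column. The code runs, but for 1980 (and the other years) it only marks TRUE from D1 up to day 365 of 1980, instead of continuing to 5 January 1981 (370 in doy1).

What am I doing wrong?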
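A likely explanation: `ifelse()` works row by row, and each row only carries its own year's D1 and D2. The rows for days 1 to 5 of 1981 belong to year 1981, so they are tested against 1981's window (D1 = 235), never against 1980's. For the 1980 rows, doy1 equals doy and never exceeds 365, so `doy1 <= D2` is always TRUE and the window stops at day 365. One possible fix, sketched below in base R on toy data, is to give each row the previous year's D2 as well and mark the spill-over days at the start of each year. The data frames `windows` and `toy`, their values, and the column `D2_prev` are hypothetical, and 365-day years are assumed.

```r
# Toy version of mat_new: three 365-day years with hypothetical windows
windows <- data.frame(year = 1980:1982,
                      D1   = c(233, 235, 300),
                      D2   = c(370, 371, 340))
toy <- data.frame(year = rep(windows$year, each = 365),
                  doy  = rep(1:365, times = nrow(windows)))

# Attach each row's own window and the previous year's end day (NA for 1980)
idx <- match(toy$year, windows$year)
toy$D1 <- windows$D1[idx]
toy$D2 <- windows$D2[idx]
toy$D2_prev <- windows$D2[match(toy$year - 1, windows$year)]

# TRUE if the day lies in this year's window (capped at day 365)
# or in the part of last year's window that spilled into this year
in_own  <- toy$doy >= toy$D1 & toy$doy <= pmin(toy$D2, 365)
in_prev <- !is.na(toy$D2_prev) & toy$doy <= toy$D2_prev - 365
toy$group1 <- in_own | in_prev

subset(toy, group1 & year == 1981 & doy < 10)   # days 1 to 5 of 1981
```

With the question's objects, `windows` would play the role of `mat1` (one row per year with D1 and D2), and the same steps should apply to `mat_new`, which already carries D1 and D2.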

Answer 1

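Here is one option. The idea is to filter the data frame for the days that fall in the same year, based on D1 and D2, and then separately filter for the days that fall in the next year. To do this, D2 is adjusted to count how many days fall in the next year, so this approach needs two look-up tables. mat_new3 is the final output.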
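The splitting step can be illustrated in base R on a small hypothetical window table (365-day years assumed): each window becomes one same-year interval and, if it spills over, one next-year interval credited to `year + 1`. The names `windows`, `same_year`, `next_year` and `lookup` are illustrative only, not part of the answer's code.

```r
windows <- data.frame(year = c(1980, 1981), D1 = c(233, 235), D2 = c(370, 371))

# Same-year part of each window: D1 .. min(D2, 365)
same_year <- data.frame(year = windows$year,
                        from = windows$D1,
                        to   = pmin(windows$D2, 365))

# Next-year part, only for windows that end after day 365:
# day 1 .. D2 - 365 of the following year
spills <- windows$D2 > 365
next_year <- data.frame(year = windows$year[spills] + 1,
                        from = rep(1, sum(spills)),
                        to   = windows$D2[spills] - 365)

lookup <- rbind(same_year, next_year)
lookup
```

The dplyr code in this answer additionally drops next-year intervals that fall after the last year in the data (here the 1982 row would be dropped).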

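By the way, some years are leap years and have 366 days, while you seem to assume every year has 365 days. I just want to make sure you are aware of this; it does not affect the analysis here.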
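To see what the 365-day assumption changes, base R's `Date` arithmetic can count days on the real calendar. This is a side illustration only; the question's data deliberately use 365-day years.

```r
# Day 370 counted from 1 January 1980 on the real calendar
start <- as.Date("1980-01-01")
end <- start + (370 - 1)
format(end, "%Y-%m-%d")   # 1980 has 366 days, so this is 4 January 1981, not the 5th

# Number of days in each year of the study period
years <- 1980:2015
days_in_year <- as.numeric(as.Date(paste0(years + 1, "-01-01")) -
                           as.Date(paste0(years, "-01-01")))
years[days_in_year == 366]   # the leap years
```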

    # Look-up table for the same year
    mat_day <- mat_new %>% 
      select(year, D1, D2) %>%
      distinct() %>%
      # Create a column D_next with the number of days that fall in the next year
      # Then cap D2 at 365 for windows that spill over (D_next > 0)
      mutate(D_next = ifelse(D2 > 365, D2 - 365, 0),
             D2 = D2 - D_next)

    # Look-up table for the next year
    mat_day_next <- mat_day %>%
      # Shift the year column to the next year
      mutate(year = year + 1) %>%
      # Drop years beyond the last year in the data
      filter(year <= max(mat_day$year)) %>%
      # Drop windows that do not spill over (D_next == 0)
      filter(D_next != 0) %>%
      # Remove D1 and D2
      select(-D1, -D2) %>%
      # The next-year part starts on day 1 and ends on day D_next
      mutate(D1 = 1, D2 = D_next)

    # Filter rows for the same year  
    mat_new1 <- mat_new %>%
      # Join with mat_day by year
      left_join(mat_day, by = c("year")) %>%
      group_by(year) %>%
      # Filter by D1.y and D2.y (D1 and D2 from mat_day)
      filter(doy >= D1.y & doy <= D2.y) %>%
      ungroup()

    # Filter rows for the next year
    mat_new2 <- mat_new %>%
      # Join with mat_day_next by year
      left_join(mat_day_next, by = c("year")) %>%
      group_by(year) %>%
      # Filter by D1.y and D2.y (D1 and D2 from mat_day_next)
      filter(doy >= D1.y & doy <= D2.y) %>%
      ungroup()

    # Combine the results 
    mat_new3 <- bind_rows(mat_new1, mat_new2) %>%
      arrange(year, doy, doy1) %>%
      select(-D1.y, -D2.y, -D_next) %>%
      rename(D1 = D1.x, D2 = D2.x) %>%
      ungroup()

    # View the first 6 rows from the year 1980
    mat_new3 %>% head()
    # # A tibble: 6 x 6
    #    year   doy  doy1   val    D1    D2
    #   <dbl> <int> <int> <int> <int> <dbl>
    # 1  1980   233   233     0   233   370
    # 2  1980   234   234     1   233   370
    # 3  1980   235   235     0   233   370
    # 4  1980   236   236     0   233   370
    # 5  1980   237   237     0   233   370
    # 6  1980   238   238     1   233   370

    # View the last 10 rows of the 1980 window (which ends on day 5 of 1981)
    mat_new3 %>%
      slice(1:(370 - 233 + 1)) %>%
      tail(10)
    # # A tibble: 10 x 6
    #     year   doy  doy1   val    D1    D2
    #    <dbl> <int> <int> <int> <int> <dbl>
    #  1  1980   361   361     0   233   370
    #  2  1980   362   362     1   233   370
    #  3  1980   363   363     0   233   370
    #  4  1980   364   364     0   233   370
    #  5  1980   365   365     1   233   370
    #  6  1981     1   366     0   235   371
    #  7  1981     2   367     1   235   371
    #  8  1981     3   368     0   235   371
    #  9  1981     4   369     1   235   371
    # 10  1981     5   370     0   235   371
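The printed rows only cover the 1980 window. A quick way to check the whole result is to compare its row count with the number of days the windows should cover. The sketch below is a hypothetical helper (`expected_rows()` is not part of the answer) and assumes 365-day years.

```r
# Hypothetical check: expected number of rows in mat_new3.
# Each window contributes its days in the start year (D1 .. min(D2, 365))
# plus its spill-over days in the next year; the spill-over of the last
# year is dropped because the data end on day 365 of that year.
expected_rows <- function(year, D1, D2) {
  same_year <- pmin(D2, 365) - D1 + 1
  spill <- ifelse(D2 > 365 & year < max(year), D2 - 365, 0)
  sum(same_year + spill)
}

expected_rows(c(1980, 1981), c(233, 235), c(370, 371))  # 133 + 5 + 131 = 269
```

With the question's `mat1` and the answer's `mat_new3` in the session, `nrow(mat_new3) == expected_rows(mat1$year, mat1$D1, mat1$D2)` should be TRUE if every window was captured.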
