Get the range and random days within that range using R
I have a data frame as shown below:
library(dplyr)
library(lubridate)

test_df <- data.frame("subbject_id" = c(1,2,3,4,5),
                      "date_1" = c("01/01/2003","12/31/2007","12/30/2008","01/02/2007","01/01/2007"))
test_df = test_df %>%
  mutate(date_1 = mdy(date_1),
         previous_year = floor_date(date_1, 'year'),        # 1 January of the same year
         next_year = ceiling_date(date_1, 'year') - 1,      # 31 December of the same year
         days_to_previous_year = as.integer(date_1 - previous_year),
         days_to_next_year = as.integer(next_year - date_1)) %>%
  select(-previous_year, -next_year)
Thanks to this, which helped me get part of the way to a solution with the code above.
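For reference, the two day counts can also be reproduced in base R without lubridate. This is a minimal sketch that assumes, as the floor_date()/ceiling_date() - 1 calls do here, that the boundaries are 1 January and 31 December of the date's own year; the variable names year_start and year_end are illustrative only.

```r
# Base-R sketch of the two day-count columns (no lubridate needed).
dates <- as.Date(c("01/01/2003", "12/31/2007", "12/30/2008", "01/02/2007", "01/01/2007"),
                 format = "%m/%d/%Y")

# Assumed boundaries: 1 January and 31 December of each date's own year.
year_start <- as.Date(format(dates, "%Y-01-01"))
year_end   <- as.Date(format(dates, "%Y-12-31"))

# Subtracting Dates gives a difftime in days; as.integer() keeps the number.
days_to_previous_year <- as.integer(dates - year_start)
days_to_next_year     <- as.integer(year_end - dates)

days_to_previous_year  # returns 0 364 364 1 0
days_to_next_year      # returns 364 0 1 363 364
```

The leap year 2008 is handled automatically because the year boundaries are real calendar dates rather than a fixed 365-day offset.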
I would like to do two things:
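a) Get the range of values using days_to_previous_year and days_to_next_year. Note that days_to_previous_year must have a minus sign in front of it, as shown in the output.
b) Pick a random value within that range. Note that if the range is [0,364], I want the random value to lie in [1,364] (inclusive); I don't want 0 to be picked as the random value. Similarly, if the range is [-11,21], 0 should not be picked either, but the random value can be -11 or 21.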
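Both requirements can be sketched in base R with two small helpers. range_label() and sample_nonzero() are hypothetical names, not functions from any package. The sketch assumes the lower bound is -days_to_previous_year and the upper bound is days_to_next_year, so every range satisfies lo <= 0 <= hi. Instead of listing every candidate value, sample_nonzero() draws a position among the hi - lo nonzero integers and then skips over 0.

```r
# (a) Hypothetical helper: label each row's range, with days_prev negated.
#     With integer input, a zero lower bound prints as 0, not -0.
range_label <- function(days_prev, days_next) {
  sprintf("[%d, %d]", -days_prev, days_next)
}

# (b) Hypothetical helper: for each row, draw one integer uniformly from
#     [lo, hi] inclusive, excluding 0. Assumes lo <= 0 <= hi.
sample_nonzero <- function(lo, hi) {
  n_choices <- hi - lo  # (hi - lo + 1) values in the range, minus the excluded 0
  # One draw per row; a range of [0, 0] has nothing to draw from, so use NA.
  k <- vapply(n_choices,
              function(n) if (n > 0) sample.int(n, 1) else NA_integer_,
              integer(1))
  v <- lo + k - 1L            # k-th value counting up from lo: lo .. hi - 1
  ifelse(v >= 0L, v + 1L, v)  # shift 0 .. hi - 1 up to 1 .. hi, skipping 0
}

range_label(c(0L, 11L), c(364L, 21L))
# returns "[0, 364]"  "[-11, 21]"

set.seed(1)
sample_nonzero(lo = -c(0L, 11L), hi = c(364L, 21L))
```

Every nonzero value in [lo, hi] maps to exactly one k, so each one is equally likely, and both endpoints (for example -11 and 21) remain reachable.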
I tried the statements below, but they don't work:
test_df$range = paste0("[-", test_df$days_to_previous_year, ",+", test_df$days_to_next_year, "]")
test_df$rand_days = sample.int(test_df$range, 1) # error as non-numeric
So I tried using the two numeric columns instead:
test_df$rand_days_prev_year = sample.int(test_df$days_to_previous_year, 1) # this doesn't work
test_df$rand_days_next_year = sample.int(test_df$days_to_next_year, 1) # but this works
I get the error message shown below:
Error in if (useHash) .Internal(sample2(n, size)) else .Internal(sample(n, :
missing value where TRUE/FALSE needed
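This error is consistent with passing a whole column as n: sample.int() expects a single positive count, and several rows here have a count of 0 (days_to_previous_year is 0 in the first row), which leaves nothing to draw. A minimal sketch, assuming one draw per row from 1..n is what was intended, maps over the column instead. draw_one_per_row() is a hypothetical helper name, not part of base R or dplyr.

```r
# Hypothetical helper: draw one value from 1..n for each element of n,
# returning NA where n is 0 (nothing to sample).
draw_one_per_row <- function(n) {
  vapply(n,
         function(k) if (k > 0) sample.int(k, 1) else NA_integer_,
         integer(1))
}

days_to_next_year <- c(364L, 0L, 1L, 363L, 364L)
set.seed(1)
draws <- draw_one_per_row(days_to_next_year)
draws
```

This addresses only the per-row mechanics. Excluding 0 from a signed range is a separate step.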
I would like my output to look like the one shown below.
Here is one approach:
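library(dplyr)

test_df %>%
  mutate(range = sprintf("%d, %d", -days_to_previous_year, days_to_next_year)) %>%
  rowwise() %>%
  mutate(rand_days = {
    days <- -days_to_previous_year:days_to_next_year  # every value in the range
    days <- days[days != 0]                           # never pick 0
    # index into days rather than calling sample(days, 1):
    # sample() treats a single positive number n as 1:n
    if (length(days)) days[sample.int(length(days), 1)] else NA_integer_
  }) %>%
  ungroup()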
# subbject_id date_1 days_to_previous_year days_to_next_year range rand_days
# <dbl> <date> <int> <int> <chr> <int>
#1 1 2003-01-01 0 364 0, 364 206
#2 2 2007-12-31 364 0 -364, 0 -220
#3 3 2008-12-30 364 1 -364, 1 -274
#4 4 2007-01-02 1 363 -1, 363 228
#5 5 2007-01-01 0 364 0, 364 72