Filling NA values for all obs within the same group
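I have medical data in which some condition indicators (i.e., columns) are recorded only for some rows, even though the same condition clearly applies to all observations belonging to the same treatment (i.e., program). Filling the NAs therefore looks simple, since all rows of a program are assumed to share the same value, but it is not that easy: when I apply the methods recommended in some previous threads (e.g., … and here), they seem to have trouble filling string values, as the code below shows.
Is there a solution?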
df_example <- data.frame(patient = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
status = c("Active", NA, NA, NA, "Non-Active", NA, NA, NA, "Active"),
condition = c(NA, "I", NA, NA, "II", "II", NA, NA, "III"),
program = c(1, 1, 1, 2, 2, 2, 3, 3, 3))
# I want to fill all the NA cells for columns "status" and "condition" by each program, the values should be the same for obs belonging to the same program
library("dplyr")
library("zoo")
df_example %>% group_by(program) %>% transmute(status=na.locf(status, na.rm=FALSE))
# A tibble: 9 x 2
# Groups: program [3]
program status
<dbl> <fct>
1 1 Active
2 1 Active
3 1 Active
4 2 NA
5 2 Non-Active
6 2 Non-Active
7 3 NA
8 3 NA
9 3 Active
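Rows 4, 7 and 8 stay NA because na.locf() only carries the last observation forward: an NA at the top of a group has no earlier value to copy, so the values being strings does not appear to be the issue (program 1 is filled correctly). A minimal sketch on program 2's status values alone, hard-coded here for illustration, shows what a second pass with fromLast = TRUE changes:

```r
library(zoo)

# status values of program 2 in df_example, in row order (hard-coded for illustration)
x <- c(NA, "Non-Active", NA)

# Forward pass only: the leading NA has nothing before it to copy, so it stays NA
fwd <- na.locf(x, na.rm = FALSE)
print(fwd)

# Backward pass (fromLast = TRUE) copies the next non-NA value upwards into the leading NA
both <- na.locf(fwd, fromLast = TRUE, na.rm = FALSE)
print(both)
```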
You also need to add a second na.locf pass with the fromLast argument, i.e.
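library(dplyr)
library(zoo)
df_example %>%
  group_by(program) %>%
  # first carry values forward, then carry them backward to reach leading NAs;
  # na.rm = FALSE keeps the vector length unchanged even if a group is all NA
  transmute(status = na.locf(status, na.rm = FALSE),
            status = na.locf(status, fromLast = TRUE, na.rm = FALSE))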
# A tibble: 9 x 2
# Groups: program [3]
# program status
# <dbl> <fct>
#1 1 Active
#2 1 Active
#3 1 Active
#4 2 Non-Active
#5 2 Non-Active
#6 2 Non-Active
#7 3 Active
#8 3 Active
#9 3 Active
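If the tidyr package is available, its fill() function can do the same two-pass fill in one call with .direction = "downup" (fill down first, then up), and it accepts several columns at once, so status and condition can be handled together. This is a swapped-in alternative rather than part of the answer above, and it assumes tidyr 1.0.0 or later, where "downup" is available:

```r
library(dplyr)
library(tidyr)

df_example <- data.frame(
  patient   = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  status    = c("Active", NA, NA, NA, "Non-Active", NA, NA, NA, "Active"),
  condition = c(NA, "I", NA, NA, "II", "II", NA, NA, "III"),
  program   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  stringsAsFactors = FALSE  # keep strings as character on R < 4.0 as well
)

filled <- df_example %>%
  group_by(program) %>%
  # within each program: fill downwards, then upwards for any leading NAs
  fill(status, condition, .direction = "downup") %>%
  ungroup()

print(filled)
```

Unlike transmute(), fill() keeps every column (including patient) and leaves the row order unchanged.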
Assuming there is exactly one non-NA value in each group:
df_example %>%
group_by(program) %>%
transmute(status = na.omit(status)) %>%
ungroup
Or, if there are several non-NA values but they are all identical:
df_example %>%
group_by(program) %>%
transmute(status = first(na.omit(status))) %>%
ungroup
giving:
# A tibble: 9 x 2
program status
<dbl> <fct>
1 1 Active
2 1 Active
3 1 Active
4 2 Non-Active
5 2 Non-Active
6 2 Non-Active
7 3 Active
8 3 Active
9 3 Active
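The "take the first non-NA value" idea can be applied to both columns at once with across(), assuming dplyr 1.0.0 or later. Because it silently keeps only the first value, a quick check that no program has conflicting non-NA values may be worth running beforehand. This sketch writes the selection as base indexing, .x[!is.na(.x)][1], which picks the same value as first(na.omit(.x)) whenever a non-NA exists and returns NA for an all-NA group:

```r
library(dplyr)

df_example <- data.frame(
  patient   = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  status    = c("Active", NA, NA, NA, "Non-Active", NA, NA, NA, "Active"),
  condition = c(NA, "I", NA, NA, "II", "II", NA, NA, "III"),
  program   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Sanity check: count distinct non-NA values per program for each column
conflicts <- df_example %>%
  group_by(program) %>%
  summarise(across(c(status, condition), ~ n_distinct(.x, na.rm = TRUE))) %>%
  filter(status > 1 | condition > 1)
stopifnot(nrow(conflicts) == 0)  # stop if any program disagrees with itself

# Broadcast the first non-NA value of each column to every row of its program
filled <- df_example %>%
  group_by(program) %>%
  mutate(across(c(status, condition), ~ .x[!is.na(.x)][1])) %>%
  ungroup()

print(filled)
```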