Filling missing values based on a condition
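I'm working with a longitudinal dataset that has several records per year, but the values are sometimes missing. For example, with this data: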
id <- c(rep("1", 5), rep("2", 5), rep("3", 5))
year <- c(1999, 1999, 2000, 2001, 2001, 1999, 2000, 2001, 2001, 2001, 1999, 2000,
2001, 2002, 2003)
marstat <- c("married", NA, "married", "married", "divorced", "single", "single", "single", NA, NA, "married", NA, "married", "divorced", "divorced")
df <- data.frame(id , year , marstat)
id year marstat
1 1 1999 married
2 1 1999 NA
3 1 2000 married
4 1 2001 married
5 1 2001 divorced
6 2 1999 single
7 2 2000 single
8 2 2001 single
9 2 2001 NA
10 2 2001 NA
11 3 1999 married
12 3 2000 NA
13 3 2001 married
14 3 2002 divorced
15 3 2003 divorced
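I want to fill the NAs with the person's existing data whenever there is information on their marital status for that year. For ID 1, row 2 has an NA, but the same person has data for that year, so I want it to say "married" there. Likewise, for ID 2, rows 9 and 10 should say "single", because row 8 shows that this person was single in 2001.
I don't simply want to drop the rows with missing values, because my real data has many more columns.
I don't want to fill based on previous or later values, only from records in the same year.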
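A minimal base R sketch of this rule, assuming that an NA should stay NA when the person has no other record for that year or when the records for that year disagree (for example ID 1 in 2001). The helper fill_same_year is hypothetical, written only for this example; ave() applies it within each id-year group and keeps the original row order, so any other columns are left untouched.

```r
# Example data from the question; stringsAsFactors = FALSE keeps marstat
# as character on older R versions too
id <- c(rep("1", 5), rep("2", 5), rep("3", 5))
year <- c(1999, 1999, 2000, 2001, 2001, 1999, 2000, 2001, 2001, 2001, 1999, 2000,
          2001, 2002, 2003)
marstat <- c("married", NA, "married", "married", "divorced", "single", "single",
             "single", NA, NA, "married", NA, "married", "divorced", "divorced")
df <- data.frame(id, year, marstat, stringsAsFactors = FALSE)

# Hypothetical helper: replace NAs with the single distinct non-missing value
# of the group; if the group has no value or conflicting values, NAs stay NA
fill_same_year <- function(x) {
  vals <- unique(x[!is.na(x)])
  fill <- if (length(vals) == 1) vals else NA_character_
  ifelse(is.na(x), fill, x)
}

# ave() runs the helper separately for every id-year combination
df$marstat_filled <- ave(df$marstat, df$id, df$year, FUN = fill_same_year)
df
```

Rows 2, 9 and 10 are filled, while row 12 stays NA because ID 3 has no other record for 2000.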
You could try:
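library(tidyverse)
df %>%
  group_by(id, year) %>%
  # marstat2: all non-missing values of the same person in the same year
  mutate(marstat2 = paste(na.omit(marstat), collapse = ","),
         # marstat3: keep existing values, replace only the NAs with marstat2
         marstat3 = case_when(is.na(marstat) ~ marstat2,
                              TRUE ~ as.character(marstat)))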
# A tibble: 15 x 5
# Groups: id, year [11]
id year marstat marstat2 marstat3
<fct> <dbl> <fct> <chr> <chr>
1 1 1999. married married married
2 1 1999. NA married married
3 1 2000. married married married
4 1 2001. married married,divorced married
5 1 2001. divorced married,divorced divorced
6 2 1999. single single single
7 2 2000. single single single
8 2 2001. single single single
9 2 2001. NA single single
10 2 2001. NA single single
11 3 1999. married married married
12 3 2000. NA "" ""
13 3 2001. married married married
14 3 2002. divorced divorced divorced
15 3 2003. divorced divorced divorced
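I added the intermediate columns to show how the approach works: marstat2 collects every non-missing value for that person and year, and marstat3 replaces only the NAs with it. Note that row 12 ends up with an empty string, because ID 3 has no other record for 2000.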
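An alternative sketch, not part of the answer above, uses tidyr::fill() on the data grouped by id and year, so values are only ever taken from the same person in the same year. With .direction = "downup", each NA takes the nearest non-missing value above it within the group and falls back to the one below. This assumes tidyr 1.0 or later, and that taking the nearest value is acceptable when a year has conflicting entries; unlike the paste() approach, a year with no other record stays NA instead of becoming "".

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  id = c(rep("1", 5), rep("2", 5), rep("3", 5)),
  year = c(1999, 1999, 2000, 2001, 2001, 1999, 2000, 2001, 2001, 2001,
           1999, 2000, 2001, 2002, 2003),
  marstat = c("married", NA, "married", "married", "divorced", "single", "single",
              "single", NA, NA, "married", NA, "married", "divorced", "divorced"),
  stringsAsFactors = FALSE
)

out <- df %>%
  group_by(id, year) %>%                     # only look within person and year
  fill(marstat, .direction = "downup") %>%   # nearest value above, else below
  ungroup()
out
```

Because fill() modifies the column in place and keeps all rows, the other columns of the real data are carried along unchanged.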