What is an R function for filling NA with specific string values?
I am working with the IMDB dataset and trying to find the best way to fill in missing values, as in the example below.
Example:
showTitle      genres
Money Heist    Action,Crime,Mystery
The Office     Comedy
Money Heist    NA
Breaking Bad   Crime,Drama,Thriller
Money Heist    Action,Crime,Mystery
Money Heist    NA
The Office     NA
Desired result:
showTitle      genres
Money Heist    Action,Crime,Mystery
The Office     Comedy
Money Heist    Action,Crime,Mystery
Breaking Bad   Crime,Drama,Thriller
Money Heist    Action,Crime,Mystery
Money Heist    Action,Crime,Mystery
The Office     NA
What I tried:
df %>% if(df$showTitle == "Money Heist"){df$genres} <- Action,Crime,Mystery
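A solution without an if statement is fine too, as long as I don't have to correct the cells by hand.
Based on the details provided, perhaps this approach is suitable: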
library(tidyverse)
# Create some 'fake' data
df <- data.frame(primaryTitle = c("Money Heist", "Money Heist", "Die Hard", "Die Hard", "Die Hard"),
                 genre = c("Action,Crime,Mystery", NA, "Action", "Action", NA))
df
#> primaryTitle genre
#> 1 Money Heist Action,Crime,Mystery
#> 2 Money Heist <NA>
#> 3 Die Hard Action
#> 4 Die Hard Action
#> 5 Die Hard <NA>
# Take the fake data
df %>%
  # sort the data by title
  arrange(primaryTitle) %>%
  # if the genre is NA and the title equals the previous title,
  # fill in genre with the previous genre
  mutate(genre = if_else(is.na(genre) & primaryTitle == lag(primaryTitle),
                         lag(genre),
                         genre))
#> primaryTitle genre
#> 1 Die Hard Action
#> 2 Die Hard Action
#> 3 Die Hard Action
#> 4 Money Heist Action,Crime,Mystery
#> 5 Money Heist Action,Crime,Mystery
Created on 2021-08-16 by the reprex package (v2.0.0)
Using your example:
library(tidyverse)
df <- tibble::tribble(
  ~showTitle,     ~genre,
  "Money Heist",  "Action,Crime,Mystery",
  "Money Heist",  NA,
  "Breaking Bad", "Crime,Drama,Thriller",
  "Money Heist",  "Action,Crime,Mystery",
  "Money Heist",  NA,
  "The Office",   NA
)
df
#> # A tibble: 6 x 2
#> showTitle genre
#> <chr> <chr>
#> 1 Money Heist Action,Crime,Mystery
#> 2 Money Heist <NA>
#> 3 Breaking Bad Crime,Drama,Thriller
#> 4 Money Heist Action,Crime,Mystery
#> 5 Money Heist <NA>
#> 6 The Office <NA>
df %>%
  arrange(showTitle) %>%
  mutate(genre = if_else(is.na(genre) & showTitle == lag(showTitle),
                         lag(genre),
                         genre))
#> # A tibble: 6 x 2
#> showTitle genre
#> <chr> <chr>
#> 1 Breaking Bad Crime,Drama,Thriller
#> 2 Money Heist Action,Crime,Mystery
#> 3 Money Heist Action,Crime,Mystery
#> 4 Money Heist Action,Crime,Mystery
#> 5 Money Heist Action,Crime,Mystery
#> 6 The Office <NA>
Created on 2021-08-16 by the reprex package (v2.0.0)
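One caveat with the lag() approach: lag() looks at the original column, not at values that have just been filled in, so it only repairs an NA that directly follows a known genre. The sketch below uses hypothetical data (df2 and res are names made up for this illustration) with two missing genres in a row for the same title.

```r
library(dplyr)

# Hypothetical data: two consecutive missing genres for one title.
df2 <- tibble::tibble(
  showTitle = c("Money Heist", "Money Heist", "Money Heist"),
  genre     = c("Action,Crime,Mystery", NA, NA)
)

res <- df2 %>%
  arrange(showTitle) %>%
  mutate(genre = if_else(is.na(genre) & showTitle == lag(showTitle),
                         lag(genre),   # previous row's ORIGINAL genre
                         genre))

# The second row is filled, but the third row stays NA because the
# row before it was itself NA in the original column.
res$genre
```

If runs of NAs like this can occur in your data, a grouped fill is probably the safer choice.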
A base R one-liner using which() and index subsetting:
df[which(df$showTitle == 'Money Heist' & is.na(df$genre)), 'genre'] <- "Action,Crime,Mystery"
showTitle genre
1 Money Heist Action,Crime,Mystery
2 Money Heist Action,Crime,Mystery
3 Breaking Bad Crime,Drama,Thriller
4 Money Heist Action,Crime,Mystery
5 Money Heist Action,Crime,Mystery
6 The Office <NA>
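The one-liner above hard-codes one title and its genre string. If you would rather not type each genre by hand, here is a minimal base R sketch that builds a lookup from the rows where the genre is known and fills the gaps by title. The helper names known, lookup and missing are made up for this illustration, and the data is the seven-row example from the question with the column named genre to match the answers.

```r
# Data from the question (column named `genre` as in the answers).
df <- data.frame(
  showTitle = c("Money Heist", "The Office", "Money Heist", "Breaking Bad",
                "Money Heist", "Money Heist", "The Office"),
  genre = c("Action,Crime,Mystery", "Comedy", NA, "Crime,Drama,Thriller",
            "Action,Crime,Mystery", NA, NA),
  stringsAsFactors = FALSE
)

# Lookup table: the first non-missing genre seen for each title.
known  <- df[!is.na(df$genre), ]
lookup <- known[!duplicated(known$showTitle), ]

# Fill only the missing cells, matching on title.
# Titles with no known genre anywhere simply stay NA.
missing <- which(is.na(df$genre))
df$genre[missing] <- lookup$genre[match(df$showTitle[missing], lookup$showTitle)]
df
```

Note that this also fills the last "The Office" row with "Comedy", because that title has a known genre elsewhere in the data. The desired output in the question leaves that row as NA, so exclude it explicitly if that is really what you want.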
This answer was given by @DPH; I don't know why it was deleted.
You can use tidyr::fill to replace the NA values within each showTitle.
library(dplyr)
library(tidyr)
df %>%
  group_by(showTitle) %>%
  fill(genre, .direction = 'updown') %>%
  ungroup()
# showTitle genre
# <chr> <chr>
#1 Money Heist Action,Crime,Mystery
#2 Money Heist Action,Crime,Mystery
#3 Breaking Bad Crime,Drama,Thriller
#4 Money Heist Action,Crime,Mystery
#5 Money Heist Action,Crime,Mystery
#6 The Office Comedy
#7 The Office Comedy
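The snippet above assumes df already exists. For anyone who wants to run it as is, here is a self-contained sketch using the seven rows from the question (column named genre; filled is just a name chosen for this example). With .direction = 'updown', each NA is filled from the nearest earlier value in its group first, and any NA still left at the top of a group is then filled from below.

```r
library(dplyr)
library(tidyr)

# The seven rows from the question.
df <- tibble::tribble(
  ~showTitle,     ~genre,
  "Money Heist",  "Action,Crime,Mystery",
  "The Office",   "Comedy",
  "Money Heist",  NA,
  "Breaking Bad", "Crime,Drama,Thriller",
  "Money Heist",  "Action,Crime,Mystery",
  "Money Heist",  NA,
  "The Office",   NA
)

filled <- df %>%
  group_by(showTitle) %>%                  # fill within each title only
  fill(genre, .direction = "updown") %>%   # down first, then up
  ungroup()

filled
```

A title whose genre is missing in every row has nothing to fill from and stays NA.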