What is an R function for filling NA with specific string values?

Question

I'm working with the IMDB dataset and trying to find the best way to fill in missing values. Example:

showTitle        genres
Money Heist      Action,Crime,Mystery
The Office       Comedy
Money Heist      NA
Breaking Bad     Crime,Drama,Thriller
Money Heist      Action,Crime,Mystery
Money Heist      NA
The Office       NA

Desired result

showTitle        genres
Money Heist      Action,Crime,Mystery
The Office       Comedy
Money Heist      Action,Crime,Mystery
Breaking Bad     Crime,Drama,Thriller
Money Heist      Action,Crime,Mystery
Money Heist      Action,Crime,Mystery
The Office       NA

What I tried

df %>% if(df$showTitle == "Money Heist"){df$genres} <- Action,Crime,Mystery

A solution without an if statement is fine too, as long as it doesn't involve correcting the cells by hand.
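For comparison, here is a minimal base R sketch of what the attempt above seems to be aiming for. `if` only looks at a single TRUE/FALSE value, so it cannot test every row at once; `ifelse()` is its vectorised counterpart. The data frame is the question's sample typed in by hand, and the fill string is hard-coded from the question:

```r
# The question's sample data, typed in by hand
df <- data.frame(
  showTitle = c("Money Heist", "The Office", "Money Heist", "Breaking Bad",
                "Money Heist", "Money Heist", "The Office"),
  genres    = c("Action,Crime,Mystery", "Comedy", NA, "Crime,Drama,Thriller",
                "Action,Crime,Mystery", NA, NA),
  stringsAsFactors = FALSE
)

# ifelse() evaluates the condition for every row: Money Heist rows whose
# genres value is NA get the fixed string, all other rows keep their value
df$genres <- ifelse(df$showTitle == "Money Heist" & is.na(df$genres),
                    "Action,Crime,Mystery",
                    df$genres)
df
```

Because only Money Heist is targeted, this matches the desired result, including leaving the last The Office row as NA.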

Answer 1

Based on the details provided, this approach may work:

library(tidyverse)
# Create some 'fake' data
df <- data.frame(primaryTitle = c("Money Heist", "Money Heist", "Die Hard", "Die Hard", "Die Hard"),
                 genre = c("Action,Crime,Mystery", NA, "Action", "Action", NA))
df
#>   primaryTitle                genre
#> 1  Money Heist Action,Crime,Mystery
#> 2  Money Heist                 <NA>
#> 3     Die Hard               Action
#> 4     Die Hard               Action
#> 5     Die Hard                 <NA>

# Take the fake data
df %>%
  # sort the data by title
  arrange(primaryTitle) %>%
  # if the genre is "NA" and the title == the previous title,
  # fill in genre with the previous genre
  mutate(genre = if_else(is.na(genre) & primaryTitle == lag(primaryTitle),
                        lag(genre),
                        genre))
#>   primaryTitle                genre
#> 1     Die Hard               Action
#> 2     Die Hard               Action
#> 3     Die Hard               Action
#> 4  Money Heist Action,Crime,Mystery
#> 5  Money Heist Action,Crime,Mystery

Created on 2021-08-16 by the reprex package (v2.0.0)
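One behaviour worth checking before relying on this: `lag(genre)` is computed from the original column, so each NA can only copy a value from the row directly above it. If a title has two NAs in a row, or its first row after sorting is NA, some NAs survive. The sketch below mirrors the same rule in base R to show this; `lag1()` is a hypothetical helper standing in for `dplyr::lag()`, and the three-row data is made up for illustration:

```r
# Hypothetical helper standing in for dplyr::lag(): shift values down one row
lag1 <- function(x) {
  if (length(x) == 0) return(x)
  c(NA, x[-length(x)])
}

df <- data.frame(primaryTitle = c("Die Hard", "Die Hard", "Die Hard"),
                 genre        = c("Action", NA, NA),
                 stringsAsFactors = FALSE)

# Same rule as the answer: fill an NA from the previous row if the title matches
df <- df[order(df$primaryTitle), ]
prev_title <- lag1(df$primaryTitle)
prev_genre <- lag1(df$genre)
fill_it <- is.na(df$genre) & !is.na(prev_title) & df$primaryTitle == prev_title
df$genre[fill_it] <- prev_genre[fill_it]

# Returns "Action", "Action", NA: the third row stays NA because the row
# above it was NA in the original column
df$genre
```

A grouped fill such as `tidyr::fill()` carries values across runs of NAs instead.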

Using your example:

library(tidyverse)
df <- tibble::tribble(
  ~showTitle, ~genre,
  "Money Heist",      "Action,Crime,Mystery",
  "Money Heist",      NA,
  "Breaking Bad",     "Crime,Drama,Thriller",
  "Money Heist",      "Action,Crime,Mystery",
  "Money Heist",      NA,
  "The Office",       NA
)


df
#> # A tibble: 6 x 2
#>   showTitle    genre               
#>   <chr>        <chr>               
#> 1 Money Heist  Action,Crime,Mystery
#> 2 Money Heist  <NA>                
#> 3 Breaking Bad Crime,Drama,Thriller
#> 4 Money Heist  Action,Crime,Mystery
#> 5 Money Heist  <NA>                
#> 6 The Office   <NA>

df %>%
  arrange(showTitle) %>%
  mutate(genre = if_else(is.na(genre) & showTitle == lag(showTitle),
                        lag(genre),
                        genre))
#> # A tibble: 6 x 2
#>   showTitle    genre               
#>   <chr>        <chr>               
#> 1 Breaking Bad Crime,Drama,Thriller
#> 2 Money Heist  Action,Crime,Mystery
#> 3 Money Heist  Action,Crime,Mystery
#> 4 Money Heist  Action,Crime,Mystery
#> 5 Money Heist  Action,Crime,Mystery
#> 6 The Office   <NA>

Created on 2021-08-16 by the reprex package (v2.0.0)
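Another option that depends neither on sorting nor on the row directly above is a lookup table: take one known genre string per title from the non-NA rows, then fill each NA by matching its title against that table with `match()`. A base R sketch using the question's sample data typed in by hand (the names `known` and `na_rows` are just illustrative):

```r
# The question's sample data, typed in by hand
df <- data.frame(
  showTitle = c("Money Heist", "The Office", "Money Heist", "Breaking Bad",
                "Money Heist", "Money Heist", "The Office"),
  genres    = c("Action,Crime,Mystery", "Comedy", NA, "Crime,Drama,Thriller",
                "Action,Crime,Mystery", NA, NA),
  stringsAsFactors = FALSE
)

# Lookup table: the first known (non-NA) genres value for each title
known <- df[!is.na(df$genres), ]
known <- known[!duplicated(known$showTitle), ]

# For each NA row, look up its title; titles with no known genre get NA
# from match() and therefore stay NA
na_rows <- is.na(df$genres)
df$genres[na_rows] <- known$genres[match(df$showTitle[na_rows], known$showTitle)]
df
```

Row order is preserved. Unlike the desired result in the question, this also fills the second The Office row with Comedy, since a known genre exists for that title.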

Answer 2

A base R one-liner that subsets by index with which():

df[which(df$showTitle == 'Money Heist' & is.na(df$genre)), 'genre'] <- "Action,Crime,Mystery"
df
#>      showTitle                  genre
#> 1  Money Heist   Action,Crime,Mystery
#> 2  Money Heist   Action,Crime,Mystery
#> 3  Breaking Bad  Crime,Drama,Thriller
#> 4  Money Heist   Action,Crime,Mystery
#> 5  Money Heist   Action,Crime,Mystery
#> 6  The Office    <NA>
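The same one-liner generalises to several titles if the fill strings are kept in a named character vector that is indexed by the title column. A sketch; `genre_map` is a hypothetical name, its contents are hard-coded from the question's data, and the data frame is the question's sample typed in by hand:

```r
# The question's sample data, typed in by hand
df <- data.frame(
  showTitle = c("Money Heist", "The Office", "Money Heist", "Breaking Bad",
                "Money Heist", "Money Heist", "The Office"),
  genres    = c("Action,Crime,Mystery", "Comedy", NA, "Crime,Drama,Thriller",
                "Action,Crime,Mystery", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: title -> string to use when that title's genres is NA
genre_map <- c("Money Heist" = "Action,Crime,Mystery",
               "The Office"  = "Comedy")

# Rows that are NA and whose title has an entry in genre_map
idx <- which(is.na(df$genres) & df$showTitle %in% names(genre_map))
# Index the named vector by title; unname() drops the names before assigning
df$genres[idx] <- unname(genre_map[df$showTitle[idx]])
df
```

Leaving a title out of `genre_map` (for example The Office) leaves its NAs untouched, which is how you would reproduce the question's desired result exactly.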

Answer 3

This answer was originally posted by @DPH; I'm not sure why it was deleted.

You can use tidyr::fill to replace the NA values within each showTitle group.

library(dplyr)
library(tidyr)

df %>%
  group_by(showTitle) %>%
  fill(genre, .direction = 'updown') %>%
  ungroup

#  showTitle    genre               
#  <chr>        <chr>               
#1 Money Heist  Action,Crime,Mystery
#2 Money Heist  Action,Crime,Mystery
#3 Breaking Bad Crime,Drama,Thriller
#4 Money Heist  Action,Crime,Mystery
#5 Money Heist  Action,Crime,Mystery
#6 The Office   Comedy              
#7 The Office   Comedy
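Without tidyr, the per-title "fill down, then up" behaviour can be sketched in base R. `fill_down()` and `fill_updown()` are hypothetical helpers written for this example (not a base R API), and `ave()` applies them within each `showTitle` group; the data is the question's sample typed in by hand:

```r
# The question's sample data, typed in by hand
df <- data.frame(
  showTitle = c("Money Heist", "The Office", "Money Heist", "Breaking Bad",
                "Money Heist", "Money Heist", "The Office"),
  genres    = c("Action,Crime,Mystery", "Comedy", NA, "Crime,Drama,Thriller",
                "Action,Crime,Mystery", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-NA value forward (downwards)
fill_down <- function(x) {
  pos  <- cumsum(!is.na(x))  # number of non-NA values seen so far (0 = none yet)
  vals <- x[!is.na(x)]
  x[pos > 0] <- vals[pos[pos > 0]]
  x
}

# Fill downwards, then upwards (by reversing), like .direction = "updown"
fill_updown <- function(x) rev(fill_down(rev(fill_down(x))))

# ave() splits genres by showTitle, applies the helper to each group,
# and puts the results back in the original row order
df$genres <- ave(df$genres, df$showTitle, FUN = fill_updown)
df
```

Row order is kept, and a title whose genres are all NA comes back unchanged.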
