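Add a column with count of NAs and Mean
I have a data frame and need to add two columns to it: one showing the count of NAs across all the other columns in that row, and one showing the mean of the non-NA values.
I think this can be done in dplyr.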
> df1 <- data.frame(a = 1:5, b = c(1,2,NA,4,NA), c = c(NA,2,3,NA,NA))
> df1
  a  b  c
1 1  1 NA
2 2  2  2
3 3 NA  3
4 4  4 NA
5 5 NA NA
I want to mutate one new column that counts the NAs in each row, and another that shows the mean of all non-NA values in that row.
You can try this:
# Fix the columns to summarise first, so the new columns are not counted
cols <- c("a", "b", "c")
# Find the row mean and add it to a new column in the dataframe
df1$Mean <- rowMeans(df1[cols], na.rm = TRUE)
# Find the count of NA and add it to a new column in the dataframe
df1$CountNa <- rowSums(is.na(df1[cols]))
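One edge case worth checking is a row with no non-NA values at all. The sketch below uses a made-up data frame, `df_all_na`, which is not from the question, to show what base R returns there and why it can matter which columns the NA count runs over.

```r
# Made-up data (not from the question): the second row is entirely NA.
df_all_na <- data.frame(x = c(1, NA), y = c(3, NA))
cols <- names(df_all_na)  # record the original columns before adding new ones

df_all_na$Mean    <- rowMeans(df_all_na[cols], na.rm = TRUE)
df_all_na$CountNa <- rowSums(is.na(df_all_na[cols]))

print(df_all_na)

# The all-NA row gets Mean = NaN, because no values are left to average.
# is.na(NaN) is TRUE, so counting NAs over the original columns plus Mean
# reports one more NA for that row than the original columns alone.
with_mean <- rowSums(is.na(df_all_na[c(cols, "Mean")]))
print(with_mean)
```

If NaN is not wanted for such rows, it can be replaced afterwards, for example with NA, depending on how downstream code treats missing means.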
library(dplyr)
count_na <- function(x) sum(is.na(x))
df1 %>%
  mutate(means = rowMeans(., na.rm = TRUE),
         count_na = apply(., 1, count_na))
#### ANSWER FOR RADEK ####
selected_cols <- c('b', 'c')
df1 %>%
  mutate(means = rowMeans(.[selected_cols], na.rm = TRUE),
         count_na = apply(.[selected_cols], 1, count_na))
As described here:
df1 <- data.frame(a = 1:5, b = c(1,2,NA,4,NA), c = c(NA,2,3,NA,NA))
df1 %>%
  mutate(means = rowMeans(., na.rm = TRUE),
         count_na = rowSums(is.na(.)))
To work on selected columns only (the example here uses columns a and c):
df1 %>%
  mutate(means = rowMeans(., na.rm = TRUE),
         count_na = rowSums(is.na(select(., one_of(c('a', 'c'))))))
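Note that in the snippet above the mean is still taken over every column, while the NA count covers only a and c. If both statistics should use the same subset, something along these lines may work. The vector name `cols` is introduced here purely for illustration, and `all_of()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

df1 <- data.frame(a = 1:5, b = c(1, 2, NA, 4, NA), c = c(NA, 2, 3, NA, NA))
cols <- c("a", "c")  # illustrative column choice, matching the example above

res <- df1 %>%
  mutate(
    # `.` is the piped-in data frame, so columns created in this mutate()
    # call are not part of the selection
    means    = rowMeans(select(., all_of(cols)), na.rm = TRUE),
    count_na = rowSums(is.na(select(., all_of(cols))))
  )

print(res)
```

Unlike `one_of()`, `all_of()` raises an error when a requested column is missing, which tends to surface typos in the column list early.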
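I recently ran into a variation of this problem: I needed the percentage of complete values, but only for specific variables (not all of them). Here is an approach that worked for me.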
df1 %>%
  # create dummy variables representing if the observation is missing ----
  # can modify here for specific variables ----
  mutate_all(list(dummy = is.na)) %>%
  # compute a row wise sum of missing ----
  rowwise() %>%
  mutate(
    # number of missing observations ----
    n_miss = sum(c_across(matches("_dummy"))),
    # percent of observations that are complete (non-missing) ----
    pct_complete = 1 - mean(c_across(matches("_dummy")))
  ) %>%
  # remove grouping from rowwise ----
  ungroup() %>%
  # remove dummy variables ----
  dplyr::select(-matches("dummy"))
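For comparison, the same two quantities can probably be computed without the intermediate dummy columns or `rowwise()`, by applying `rowSums()` and `rowMeans()` to the logical matrix that `is.na()` returns. This is only a sketch: the vector `vars` and the choice of columns b and c are assumptions made for illustration.

```r
library(dplyr)

df1 <- data.frame(a = 1:5, b = c(1, 2, NA, 4, NA), c = c(NA, 2, 3, NA, NA))
vars <- c("b", "c")  # illustrative subset of variables to check

res <- df1 %>%
  mutate(
    # number of missing observations among the chosen variables
    n_miss       = rowSums(is.na(.[vars])),
    # share of the chosen variables that are complete (non-missing), 0 to 1
    pct_complete = rowMeans(!is.na(.[vars]))
  )

print(res)
```

As in the answer above, `pct_complete` is a proportion between 0 and 1; multiply by 100 if a true percentage is needed.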