Replacing multiple values in a column in R
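I am trying to create a function that takes two arguments: a continent and the column of the data frame to use. I then compute the mean of that column for that continent and use it to replace the NA values in that column for that continent. However, I run into trouble when actually replacing the values, and I keep getting errors. I have tried several approaches, such as replace, replace_na, and mutate, but each one gives errors I can't get rid of. The code works when it is not inside a function, but once I put it in a function I get the error below.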
df<-structure(list(location = c("Algeria", "Angola", "Benin", "Botswana",
"Burkina Faso", "Burundi"), iso_code = c("DZA", "AGO", "BEN",
"BWA", "BFA", "BDI"), continent = c("Africa", "Africa", "Africa",
"Africa", "Africa", "Africa"), date = c("2020-09-02", "2020-09-02",
"2020-09-02", "2020-09-02", "2020-09-02", "2020-09-02"), total_cases = c(44833,
2654, 2145, 1733, 1375, 445), new_cases = c(339, 30, 0, 9, 5,
0), new_cases_smoothed = c(372.143, 53, 4.286, 24.429, 3.286,
2.143), total_deaths = c(1518, 108, 40, 6, 55, 1), new_deaths = c(8,
1, 0, 0, 0, 0), new_deaths_smoothed = c(8.857, 0.857, 0.143,
0.429, 0, 0), total_cases_per_million = c(1022.393, 80.751, 176.934,
736.937, 65.779, 37.424), new_cases_per_million = c(7.731, 0.913,
0, 3.827, 0.239, 0), new_cases_smoothed_per_million = c(8.487,
1.613, 0.354, 10.388, 0.157, 0.18), total_deaths_per_million = c(34.617,
3.286, 3.299, 2.551, 2.631, 0.084), new_deaths_per_million = c(0.182,
0.03, 0, 0, 0, 0), new_deaths_smoothed_per_million = c(0.202,
0.026, 0.012, 0.182, 0, 0), population = c(43851043, 32866268,
12123198, 2351625, 20903278, 11890781), population_density = c(17.348,
23.89, 99.11, 4.044, 70.151, 423.062), median_age = c(29.1, 16.8,
18.8, 25.8, 17.6, 17.5), aged_65_older = c(6.211, 2.405, 3.244,
3.941, 2.409, 2.562), aged_70_older = c(3.857, 1.362, 1.942,
2.242, 1.358, 1.504), gdp_per_capita = c(13913.839, 5819.495,
2064.236, 15807.374, 1703.102, 702.225), extreme_poverty = c(0.5,
NA, 49.6, NA, 43.7, 71.7), cardiovasc_death_rate = c(278.364,
276.045, 235.848, 237.372, 269.048, 293.068), diabetes_prevalence = c(6.73,
3.94, 0.99, 4.81, 2.42, 6.05), female_smokers = c(0.7, NA, 0.6,
5.7, 1.6, NA), male_smokers = c(30.4, NA, 12.3, 34.4, 23.9, NA
), handwashing_facilities = c(83.741, 26.664, 11.035, NA, 11.877,
6.144), hospital_beds_per_thousand = c(1.9, NA, 0.5, 1.8, 0.4,
0.8), life_expectancy = c(76.88, 61.15, 61.77, 69.59, 61.58,
61.58)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
fun1 <- function(cont, column)
{
countries<-df%>%
filter(continent == cont)
m<-mean(countries[[column]],na.rm=T)
df[,column]<-ifelse(is.na(df[,column]) & df$continent==cont,m,(df[,column]=df[,column]))
}
fun1("Europe","median_age")
Error:
Error during wrapup: Can't recycle input of size 208 to size 1.
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
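You have several problems here. First, you seem to have made a mistake when copying your input, so your example code doesn't run. Second, you have used the name mean as a variable name inside your function, which is likely to cause confusion when debugging later. Third, your function doesn't return anything. Finally, your spacing makes the code hard to read: you have a lot of vertical space from line breaks, but no spaces separating variable names from operators. Again, this makes things harder to debug.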
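As a rough illustration of the points above, here is a minimal base R sketch of a function that takes the column name as a string and returns the modified data frame. The helper name `fill_na_with_continent_mean`, its `data` argument, and the toy data are assumptions made for this example and are not from the original post. It uses `[[` to pull the column out as a plain vector, because `df[, column]` on a tibble returns a one-column tibble rather than a vector, which is one likely contributor to the recycling error.

```r
# Hypothetical helper (name and signature are illustrative only).
# Replaces NA in `column` with the continent's mean, only for rows in `cont`.
fill_na_with_continent_mean <- function(data, cont, column) {
  in_cont <- data$continent == cont
  cont_mean <- mean(data[[column]][in_cont], na.rm = TRUE)
  to_fill <- in_cont & is.na(data[[column]])
  data[[column]][to_fill] <- cont_mean
  data  # return the modified copy; the caller's data frame is not changed in place
}

# Toy data invented for this example
toy <- data.frame(
  continent  = c("Africa", "Africa", "Africa", "Europe", "Europe"),
  median_age = c(20, NA, 30, 40, NA)
)

filled <- fill_na_with_continent_mean(toy, "Africa", "median_age")
filled$median_age
#> [1] 20 25 30 40 NA
```

Because R functions work on a copy of their arguments, the result has to be assigned back (for example `toy <- fill_na_with_continent_mean(toy, "Africa", "median_age")`) for the change to persist.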
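If you are using dplyr functions, you can take advantage of quasiquotation to make your code simpler and more intuitive to use. For example, you could write it so that you pass bare column names, without having to wrap them in double quotes: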
fun1 <- function(cont, col)
{
col <- enquo(col)
filter(df, continent == cont) %>%
mutate(!!col := replace(!!col, is.na(!!col), mean(!!col, na.rm = TRUE)))
}
So you can write:
fun1("Africa", new_cases)
#> location iso_code continent date total_cases new_cases new_cases_smoothed
#> 1 Algeria DZA Africa 2020-09-02 44833 339 372.143
#> 2 Angola AGO Africa 2020-09-02 2654 30 53.000
#> 3 Benin BEN Africa 2020-09-02 2145 0 4.286
#> 4 Botswana BWA Africa 2020-09-02 1733 9 24.429
#> 5 Burkina Faso BFA Africa 2020-09-02 1375 5 3.286
#> 6 Burundi BDI Africa 2020-09-02 445 0 2.143
#> total_deaths new_deaths
#> 1 1518 8
#> 2 108 1
#> 3 40 0
#> 4 6 0
#> 5 55 0
#> 6 1 0
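Note that `fun1` as written returns only the rows for the chosen continent. If the goal is to keep every row and fill NA values only within one continent, a variant along the following lines might work. This is a sketch rather than the answer's method: the name `fun2`, the `data` argument, and the toy data are assumptions for illustration, and it uses the `{{ }}` (embrace) operator, available in rlang 0.4.0 and later, in place of `enquo()` and `!!`.

```r
library(dplyr)

# Hypothetical variant of fun1: takes the data frame as an argument and
# keeps all rows, filling NA only for rows in the chosen continent.
fun2 <- function(data, cont, col) {
  data %>%
    mutate({{ col }} := replace(
      {{ col }},
      continent == cont & is.na({{ col }}),
      mean({{ col }}[continent == cont], na.rm = TRUE)
    ))
}

# Toy data invented for this example
toy <- data.frame(
  continent  = c("Africa", "Africa", "Africa", "Europe", "Europe"),
  median_age = c(20, NA, 30, 40, NA)
)

result <- fun2(toy, "Africa", median_age)
result$median_age
#> [1] 20 25 30 40 NA
```

Passing the data frame in as an argument, rather than relying on a global `df`, keeps the function usable on other data frames that have a `continent` column.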
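If you just want to replace all NA values in the numeric columns with the mean of the other countries on the same continent, then you don't need a function at all. You can just use: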
df <- df %>%
group_by(continent) %>%
mutate(across(total_cases:life_expectancy,
function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))))
to convert the whole data frame.
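To check the grouped approach on something small, here is a sketch using toy data invented for this example. It selects columns with `where(is.numeric)` rather than an explicit column range, and adds `ungroup()` at the end; both are choices made for this illustration, not part of the answer above.

```r
library(dplyr)

# Toy data invented for this example
toy <- data.frame(
  continent  = c("Africa", "Africa", "Africa", "Europe", "Europe"),
  median_age = c(20, NA, 30, 40, NA),
  beds       = c(NA, 2, 4, 6, 8)
)

imputed <- toy %>%
  group_by(continent) %>%
  # within each continent, fill NA with that continent's mean
  mutate(across(where(is.numeric),
                function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))) %>%
  ungroup()

imputed$median_age
#> [1] 20 25 30 40 40
imputed$beds
#> [1] 3 2 4 6 8
```

One caveat: if every value of a column is NA within a continent, `mean(x, na.rm = TRUE)` returns NaN, so those entries become NaN rather than staying NA.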