Replace missing values
M Price Quantity Quantity1
---------------------------------
2014m1 55 150 150
2014m2 55 220 220
2014m3 55 350 87,5
2014m4 55 NA 87,5
2014m5 55 NA 87,5
2014m6 55 NA 87,5
2014m8 58 200 200
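This is a sample of my table. I want to get a result like the Quantity1 column: when a value is NA, the last non-NA value before it should be divided by the number of NAs plus 1, and that result should fill both that value and the NAs.
For example, 350 should be replaced by 87,5 (= 350/4), and the next three values should also be replaced by 87,5.
Can anyone help me write this code with a loop?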
I think the following code will work for you:
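# Positions just before each NA, i.e. the values that get carried forward
getValueindices <- function(dt) { which(is.na(dt)) - 1 }

# Replace each NA with the value before it; the first NA gets that value
# divided by (total number of NAs in dt + 1)
setValue <- function(indices, dt) {
  for (i in indices) {
    if (min(indices) == i)
      dt[i + 1] <- dt[i] / (sum(is.na(dt)) + 1)
    else
      dt[i + 1] <- dt[i]
  }
  dt
}

# Store the indices first; setValue() needs them as an argument
indices <- getValueindices(df$Quantity)
df$Quantity1 <- setValue(indices, df$Quantity)
df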
Output:
M Price Quantity Quantity1
1 2014m1 55 150 150.0
2 2014m2 55 220 220.0
3 2014m3 55 350 350.0
4 2014m4 55 NA 87.5
5 2014m5 55 NA 87.5
6 2014m6 55 NA 87.5
7 2014m8 58 200 200.0
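Note that the loop above leaves 350 in row 3 (the desired output has 87,5 there), and because it divides by sum(is.na(dt)) it treats every NA in the column as a single run. Below is a minimal loop-based sketch that handles each NA run separately, assuming each run should be filled from the last non-NA value before it and that leading NAs (with no preceding value) stay NA. spread_over_na is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper (not from the original answers): for every run of NAs,
# split the last non-NA value before the run evenly over itself and the run.
spread_over_na <- function(x) {
  x <- as.numeric(x)             # allow fractional results such as 87.5
  n <- length(x)
  i <- 1
  while (i <= n) {
    if (!is.na(x[i])) {
      # Find the last position of the NA run that follows position i (if any)
      j <- i
      while (j < n && is.na(x[j + 1])) j <- j + 1
      if (j > i) {
        # One value plus (j - i) NAs: divide by (j - i + 1)
        x[i:j] <- x[i] / (j - i + 1)
      }
      i <- j + 1
    } else {
      # Assumption: leading NAs have no preceding value, so leave them as NA
      i <- i + 1
    }
  }
  x
}

spread_over_na(c(150, 220, 350, NA, NA, NA, 200))
# [1] 150.0 220.0  87.5  87.5  87.5  87.5 200.0
```

With the sample data this would be used as df$Quantity1 <- spread_over_na(df$Quantity).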
With base R, we can use ave:
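# Each non-NA value starts a new group (cumsum of !is.na); that value is then
# divided evenly across its group, i.e. over itself and the NAs that follow it
df$Quantity1 = ave(df$Quantity, cumsum(!is.na(df$Quantity)),
                   FUN = function(x) max(x, na.rm = TRUE)/length(x))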
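To see why the ave() call works, it can help to print the grouping key on its own. A small sketch using a hypothetical vector q that holds the Quantity values from the sample data:

```r
# The grouping key used by ave(): each non-NA value starts a new group,
# and the NAs after it fall into that same group.
q <- c(150L, 220L, 350L, NA, NA, NA, 200L)
grp <- cumsum(!is.na(q))
grp
# [1] 1 2 3 3 3 3 4

# Within group 3, max(x, na.rm = TRUE) is 350 and length(x) is 4
ave(q, grp, FUN = function(x) max(x, na.rm = TRUE) / length(x))
# [1] 150.0 220.0  87.5  87.5  87.5  87.5 200.0
```

One caveat: if the column starts with NAs, those rows form a group with no non-NA value, so max(..., na.rm = TRUE) returns -Inf with a warning.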
Alternatively, with data.table (thanks to @Jaap):
library(data.table)
# setDT() converts df to a data.table in place; .N is the number of rows in each group
setDT(df)[, Quantity1 := max(Quantity, na.rm = TRUE)/.N, by = cumsum(!is.na(Quantity))]
Output:
M Price Quantity Quantity1
1 2014m1 55 150 150.0
2 2014m2 55 220 220.0
3 2014m3 55 350 87.5
4 2014m4 55 NA 87.5
5 2014m5 55 NA 87.5
6 2014m6 55 NA 87.5
7 2014m8 58 200 200.0
Or with dplyr:
library(dplyr)
# na_id starts a new group at every non-NA Quantity; n() is the group size
df %>%
  group_by(na_id = cumsum(!is.na(Quantity))) %>%
  mutate(Quantity1 = max(Quantity, na.rm = TRUE)/n())
Note: we can append ungroup() %>% select(-na_id) to drop the na_id column.
Output:
# A tibble: 7 x 5
# Groups: na_id [4]
M Price Quantity na_id Quantity1
<fct> <int> <int> <int> <dbl>
1 2014m1 55 150 1 150
2 2014m2 55 220 2 220
3 2014m3 55 350 3 87.5
4 2014m4 55 NA 3 87.5
5 2014m5 55 NA 3 87.5
6 2014m6 55 NA 3 87.5
7 2014m8 58 200 4 200
Data:
df <- structure(list(M = structure(1:7, .Label = c("2014m1", "2014m2",
"2014m3", "2014m4", "2014m5", "2014m6", "2014m8"), class = "factor"),
Price = c(55L, 55L, 55L, 55L, 55L, 55L, 58L), Quantity = c(150L,
220L, 350L, NA, NA, NA, 200L)), class = "data.frame", row.names = c(NA,
-7L), .Names = c("M", "Price", "Quantity"))