R: Repeat a value by group until a new value appears, but only after the first non-NA value
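I want to repeat a value within each group until a new value appears. I have a function I found online a while ago that almost does what I need, but not quite. Here is the function: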
repeat.before <- function(x) {
  ind <- which(!is.na(x))
  ind_rep <- ind
  if (is.na(x[1])) {
    ind_rep <- c(min(ind), ind)
    ind <- c(1, ind)
  }
  rep(x[ind_rep], times = diff(c(ind, length(x) + 1)))
}
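This function successfully repeats values by group until a new value appears. The problem is that if the column starts with NA, the rows that come before the first value end up taking the first value instead of remaining NA. Here is an example of what I mean: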
group location
A NA
A NA
A New York
A NA
A NA
B Chicago
B NA
B Philly
B NA
The code above outputs:
group location
A New York
A New York
A New York
A New York
A New York
B Chicago
B Chicago
B Philly
B Philly
Again, this is very close to what I'm looking for, but not quite. This is the output I want:
group location
A NA
A NA
A New York
A New York
A New York
B Chicago
B Chicago
B Philly
B Philly
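Basically, I don't want the "repeat" code to kick in until the first value is found. Until then, I want the rows to stay NA. The goal is to avoid misclassifying rows: in the example above, the first two A rows should not be labeled New York.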
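To see where the unwanted fill comes from, it can help to trace the function's intermediate values for group A. This is only a walk-through of the function as posted; the variable names `times` and `result` are added here for illustration.

```r
# Group A's location column from the example above
x <- c(NA, NA, "New York", NA, NA)

ind <- which(!is.na(x))        # position of the only non-NA value
ind_rep <- ind

# x[1] is NA, so the function takes this branch
ind_rep <- c(min(ind), ind)    # the first non-NA position is used twice
ind <- c(1, ind)               # a run is started at row 1

# Run lengths: rows 1-2 form the first run, rows 3-5 the second
times <- diff(c(ind, length(x) + 1))

# Both runs are filled from x[ind_rep], i.e. from "New York"
result <- rep(x[ind_rep], times = times)
print(result)
```

Because `ind_rep` starts with `min(ind)`, the leading run is filled with the first observed value rather than NA, which is exactly the behaviour described above.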
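One option is fill from tidyr, applied after grouping by 'group'. fill takes a .direction argument that can be set to 'up' or 'down' (the default). Here we only need 'down', based on the expected output.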
library(dplyr)
library(tidyr)
df1 %>%
  group_by(group) %>%
  fill(location)
# A tibble: 9 x 2
# Groups: group [2]
# group location
# <chr> <chr>
#1 A <NA>
#2 A <NA>
#3 A New York
#4 A New York
#5 A New York
#6 B Chicago
#7 B Chicago
#8 B Philly
#9 B Philly
Data
df1 <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B"), location = c(NA, NA, "New York", NA, NA, "Chicago", NA,
"Philly", NA)), class = "data.frame", row.names = c(NA, -9L))
You can also use the na.locf function from the zoo package.
library(zoo)
df1 <-
  structure(list(
    group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
    location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA)
  ),
  class = "data.frame",
  row.names = c(NA, -9L))
# Fill within each group so a value never carries across a group boundary
df1$location2 <- ave(df1$location, df1$group,
                     FUN = function(x) na.locf(x, na.rm = FALSE))
df1
group location location2
1 A <NA> <NA>
2 A <NA> <NA>
3 A New York New York
4 A <NA> New York
5 A <NA> New York
6 B Chicago Chicago
7 B <NA> Chicago
8 B Philly Philly
9 B <NA> Philly
Base R
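# Group by 'group' as well, so a leading NA in one group
# cannot pick up the last value of the previous group
transform(df1,
          loc2 = ave(df1$location,
                     df1$group,
                     cumsum(!is.na(df1$location)),
                     FUN = function(x) x[1]))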
# group location loc2
#1 A <NA> <NA>
#2 A <NA> <NA>
#3 A New York New York
#4 A <NA> New York
#5 A <NA> New York
#6 B Chicago Chicago
#7 B <NA> Chicago
#8 B Philly Philly
#9 B <NA> Philly
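Staying in base R, the function from the question could also be adjusted directly. The following is a minimal sketch, not the original author's code: the helper name `repeat_after_first`, the column `location_filled`, and the second data frame `df2` are all hypothetical and only there for illustration. The idea is to emit NA for the rows before the first value instead of reusing that value, and to apply the helper per group with `ave()`.

```r
# Hypothetical helper (not from the original post): carry the last
# non-NA value forward, leaving any leading NAs untouched.
repeat_after_first <- function(x) {
  ind <- which(!is.na(x))
  if (length(ind) == 0) return(x)   # all NA: nothing to carry forward
  vals <- x[ind]
  if (ind[1] > 1) {
    # Leading NAs: the first run gets NA instead of the first value
    vals <- c(NA, vals)
    ind <- c(1, ind)
  }
  rep(vals, times = diff(c(ind, length(x) + 1)))
}

# Same data as in the question
df1 <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  location = c(NA, NA, "New York", NA, NA, "Chicago", NA, "Philly", NA),
  stringsAsFactors = FALSE
)

# ave() applies the helper separately within each group
df1$location_filled <- ave(df1$location, df1$group, FUN = repeat_after_first)
print(df1)

# Hypothetical second data set: group B starts with NA, so a fill that
# ignored groups would wrongly copy "Boston" into B's first row
df2 <- data.frame(
  group = c("A", "A", "B", "B"),
  location = c("Boston", NA, NA, "Denver"),
  stringsAsFactors = FALSE
)
df2$location_filled <- ave(df2$location, df2$group, FUN = repeat_after_first)
print(df2)
```

The `df2` case is worth checking with any of the approaches here: a fill that is not applied per group would let the last value of one group leak into the leading NA rows of the next.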