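Writing a function that takes a vector as input, throws away unwanted values, de-duplicates, and returns the respective indexes of the original vector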

Question

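I'm trying to write a function that takes a vector and subsets it in several steps:

1. Throw away any unwanted values.
2. Remove duplicates.
3. Return the indexes of the original vector, after accounting for steps (1) and (2).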

For example, given the following input vector:

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")

and

throw_away_val <- "cat"

I'd like my function get_indexes(x = vec_animals, y = throw_away_val) to return:

# [1] 1 6   # `1` is the index of the 1st unique ("dog") in `vec_animals`, `6` is the index of the 2nd unique ("dolphin")

Another example:

vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)
throw_away_val <- 2003

Return:

# [1] 4 6 # `4` is the position of 1st unique (`2007`) after throwing away unwanted val; `6` is the position of 2nd unique (`2011`).

My initial attempt

The following function returns indexes but does not account for duplicates:

get_index <- function(x, throw_away) {
  which(x != throw_away)
}

It returns indexes into the original vec_animals, for example:

get_index(vec_animals, "cat")
#> [1] 1 2 3 4 6 7

If we use this output to subset vec_animals, we get:

vec_animals[get_index(vec_animals, "cat")]
#> [1] "dog"     "dog"     "dog"     "dog"     "dolphin" "dolphin"

You might suggest operating on this output, for example:

vec_animals[get_index(vec_animals, "cat")] |> unique()
#> [1] "dog"     "dolphin"

But no, I need get_index() itself to return the correct indexes (1 and 6 in this example).

Edit

A related procedure, which gets the index of the first occurrence of each value, is available:

library(bit64)

vec_num <- as.integer64(c(4, 2, 2, 3, 3, 3, 3, 100, 100))
unipos(vec_num)
#> [1] 1 2 4 8

or, more generally:

which(!duplicated(vec_num))
#> [1] 1 2 4 8

Such a solution would be great if I didn't also need to throw away the unwanted values.

Answer 1

Here is a simple self-written function that gives the desired information.

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")

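get_indexes <- function(x, throw_away){
  # Unique values of x, minus the value to throw away
  elements <- unique(x)[unique(x) != throw_away]
  # For each remaining value, all positions where it occurs in x
  index <- lapply(seq_along(elements), function(i) {which(x %in% elements[i]) })
  index2return <- c()
  # Keep only the first (smallest) position of each value
  for (j in seq_along(index)) {
    index2return <- c(index2return, min(index[[j]]))
  }
  return(index2return)
}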

get_indexes(x = vec_animals, throw_away = "cat")
[1] 1 6
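
The same idea can probably be written without the explicit loop, since match() already returns the position of the first occurrence of each value it looks up. The following is only a sketch under that assumption; the name get_indexes_match is hypothetical and not part of the answer above.

```r
# Sketch only: get_indexes_match is a hypothetical name.
get_indexes_match <- function(x, throw_away) {
  # Unique values of x that are not in throw_away (order of first appearance)
  elements <- setdiff(unique(x), throw_away)
  # match() gives the position of the first occurrence of each element in x
  match(elements, x)
}

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
get_indexes_match(vec_animals, "cat")
#> [1] 1 6

vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)
get_indexes_match(vec_years, 2003)
#> [1] 4 6
```

If nothing is left after throwing values away, this version returns an empty integer vector.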

Answer 2

Try:

get_index <- function(x, throw_away) {
  # Keep first occurrences only, excluding the thrown-away value
  which(!duplicated(x) & x != throw_away)
}

> get_index(vec_animals, "cat")
[1] 1 6
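
One caveat: x != throw_away compares element by element, so passing more than one unwanted value would recycle throw_away instead of testing membership. A minimal sketch of a variant that uses %in% for that case follows; the name get_index_multi is hypothetical and not from the answer above.

```r
# Sketch only: get_index_multi is a hypothetical name.
get_index_multi <- function(x, throw_away) {
  # !duplicated(x) marks the first occurrence of every value;
  # !(x %in% throw_away) drops every value listed in throw_away.
  which(!duplicated(x) & !(x %in% throw_away))
}

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
get_index_multi(vec_animals, "cat")
#> [1] 1 6
get_index_multi(vec_animals, c("cat", "dog"))
#> [1] 6

vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)
get_index_multi(vec_years, 2003)
#> [1] 4 6
```

The two versions treat missing values differently: with !=, an NA in x yields NA and is dropped by which(), whereas with %in% the first NA is kept unless NA is itself listed in throw_away.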

Answer 3

My approach:

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
throw_away_val <- "cat"

my_function <- function(x, y) {
  # Pair each value with its position in the original vector
  my_df <- data.frame("Origin" = x,
                      "Position" = seq_along(x),
                      stringsAsFactors = FALSE)
  # Drop the rows holding unwanted values, if any are present
  my_var <- which(my_df$Origin %in% y)
  if (length(my_var)) {
    my_df <- my_df[-my_var, ]
  }
  # Keep the first row of each remaining value
  my_df <- my_df[!duplicated(my_df$Origin), ]
  return(my_df)
}

my_df <- my_function(vec_animals, throw_away_val)
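
Note that my_function() returns a data frame rather than a bare index vector; the indexes the question asks for sit in its Position column. As a rough sketch of the same data-frame idea condensed to return only the positions (first_positions is a hypothetical name, not part of the answer above):

```r
# Hypothetical helper: same data-frame approach, returning only the positions.
first_positions <- function(x, y) {
  df <- data.frame(Origin = x, Position = seq_along(x), stringsAsFactors = FALSE)
  df <- df[!(df$Origin %in% y), ]       # drop unwanted values
  df <- df[!duplicated(df$Origin), ]    # keep the first occurrence of each value
  df$Position
}

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
first_positions(vec_animals, "cat")
#> [1] 1 6
```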
