How can I get the index of the rows, from a data.frame with repeated cases, after using the unique() function?
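I'm cleaning my database 'Visitas'. It consists of patients who come to the hospital once a week, so the same subject appears many times, and I want to count each patient only once.
I got the number of 'real' patients with the unique() function, but now I can't retrieve those rows from the original database.
I tried creating a vector of these cases and then using which() to get the indices, but it doesn't work.
Here is some of my code: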
# Visitas_act: active patients who are still going to the hospital
# Visitas_mod: the initial 'Visitas' database but almost cleaned.
# codeep: patient code, identifier
Visitas_mod <- Visitas_mod[Visitas_act, ]
unique(Visitas_mod[, 'codeep'])
Visitas_r <- unique(Visitas_mod[, 'codeep'])
I tried this, but it doesn't work because the indices don't match the rows of 'Visitas_mod':
tut <- which(Visitas_mod[, 'codeep'] == Visitas_r)
Visitas_mod <- Visitas_mod[tut, ]
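The `which()` attempt fails because `==` compares the two vectors position by position, recycling the shorter `Visitas_r`, instead of testing whether each code is in the set of unique codes. A minimal sketch on a made-up vector of patient codes (the values below are hypothetical stand-ins for `Visitas_mod[, 'codeep']`) shows the difference, and `duplicated()` gives the row indices of each patient's first visit directly:

```r
# Hypothetical patient codes standing in for Visitas_mod[, 'codeep']
codeep <- c(101, 102, 101, 103, 102, 101)
Visitas_r <- unique(codeep)  # 101 102 103

# `==` recycles Visitas_r to length 6 and compares position by position:
# 101==101, 102==102, 101==103, 103==101, 102==102, 101==103
pairwise <- codeep == Visitas_r
which(pairwise)  # 1 2 5: not "one row per patient"

# duplicated() flags every repeat after the first occurrence,
# so its negation marks the first visit of each patient
first_rows <- which(!duplicated(codeep))
first_rows  # 1 2 4
```

Applied to the question's data, `Visitas_mod[!duplicated(Visitas_mod[, 'codeep']), ]` would keep the first row for each patient code.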
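I don't fully understand your question, but my best guess is something like this. If you have any questions or I got it wrong, please comment :)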
#create dummy names
names <- cbind(row.names(mtcars),c(1:32))
visit <- sample(names[,1], 400, replace = TRUE)  # 400 random visits drawn from the 32 car names
df <- as.data.frame(x = visit)
id <- names[match(df$visit,names),2]
df <- cbind(df,id)
#get vector of unique visitors
uniqV <- unique(df$visit)
# this returns the index of the first occurrence of each visitor in the data frame
match(uniqV,df$visit)
Output:
1 2 3 5 6 7 8 9 11 12 13 14 15 20 21 23 24 25 28 30 32 37 46
47 53 58 64 68 85 90 107 120
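Because `match()` returns the position of the first match, that vector can be used directly to subset the data frame, and swapping the arguments gives the opposite mapping (which unique visitor each row belongs to). A small sketch on a made-up data frame (the `week` column and all values are hypothetical stand-ins for the answer's `df`):

```r
# Made-up visits; "Valiant" and "Merc 230" come back more than once
df <- data.frame(visit = c("Valiant", "Merc 230", "Valiant", "Fiat X1-9", "Merc 230"),
                 week  = 1:5)
uniqV <- unique(df$visit)             # "Valiant" "Merc 230" "Fiat X1-9"

# Row of each visitor's first appearance
first_idx <- match(uniqV, df$visit)   # 1 2 4
df_first <- df[first_idx, ]           # one row per visitor

# Same rows as the common !duplicated() idiom
identical(first_idx, which(!duplicated(df$visit)))  # TRUE

# Reverse direction: for each row, its position in uniqV
row_to_visitor <- match(df$visit, uniqV)  # 1 2 1 3 2
```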
All positions of each visitor can be obtained with this follow-up code:
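# return all positions of each unique visitor in the dataset
test <- function(x) {
  which(df$visit %in% x)
}
# lapply() always returns a list; sapply() could collapse the result into a
# matrix if every visitor happened to have the same number of visits
matches <- lapply(uniqV, test)
out <- cbind(as.character(uniqV), unlist(lapply(matches, paste, collapse = ",")))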
Output:
1 Ford Pantera L 1,98,116,127,128,136,142,189,208,210,217,254,273,275,277,298,313,360
2 Hornet Sportabout 2,19,29,78,81,112,144,234,294,303,322,332,387
3 Merc 280C 3,4,39,159,173,188,308,309,312,351,389
4 Merc 230 5,18,48,77,106,150,161,211,219,240,241,270,299,306,330,365,383
5 Dodge Challenger 6,74,154,172,222,268,336,352,367
6 Valiant 7,26,207,235,258,265,392,397
7 Porsche 914-2 8,45,102,113,117,119,121,131,149,151,224,288,329,343,350,362,394
8 Fiat X1-9 9,10,65,95,135,170,271,282,285,346,380
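A more compact base-R alternative to the `lapply()`/`which()` loop, swapped in here rather than taken from the answer, is `split()`, which groups row numbers by visitor in one call. Wrapping the visitor column in `factor(..., levels = unique(...))` keeps the groups in order of first appearance, as in `out`; a plain `split()` would sort them alphabetically. A sketch on a hypothetical vector:

```r
# Hypothetical visitor column, one entry per visit
visit <- c("Valiant", "Merc 230", "Valiant", "Fiat X1-9", "Merc 230", "Valiant")

# Group row numbers by visitor; levels = unique(visit) keeps first-appearance order
positions <- split(seq_along(visit), factor(visit, levels = unique(visit)))
# positions$Valiant is 1 3 6, positions[["Merc 230"]] is 2 5, positions[["Fiat X1-9"]] is 4

# Two-column matrix of visitor name and comma-separated row numbers, like `out`
out <- cbind(names(positions),
             vapply(positions, paste, character(1), collapse = ","))
```

With the answer's data this would be `split(seq_len(nrow(df)), factor(df$visit, levels = uniqV))`.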