How to run euclidean distance over and over in R
Suppose I have a data frame like this in R:
dput(filename)
structure(list(word = structure(c(2L, 1L), .Label = c("frq",
"ocr_avg"), class = "factor"), abeja = c(98, 24), abeja.1 = c(26.666,
3), abrigo = c(53.333, 6), abrigo.1 = c(50, 1), abrigo.2 = c(83.809,
21), abrigo.3 = c(31.666, 6)), .Names = c("word", "abeja", "abeja.1",
"abrigo", "abrigo.1", "abrigo.2", "abrigo.3"), row.names = c(NA,
-2L), class = "data.frame")
#      word abeja abeja.1 abrigo abrigo.1 abrigo.2 abrigo.3
# 1 ocr_avg    98  26.666 53.333       50   83.809   31.666
# 2     frq    24   3.000  6.000        1   21.000    6.000
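I want to compute the Euclidean distance between pairs of columns that share a name: between (abeja & abeja.1), then between (abrigo & abrigo.1), (abrigo & abrigo.2), and (abrigo & abrigo.3), but also between (abrigo.1 & abrigo.2) and (abrigo.2 & abrigo.3).
Is there a way to automate this, so that I don't have to go through every pair and compute it myself in R (it is a fairly large file)?
Doing it by hand for a single pair looks like this: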
x <- filename$abeja
y <- filename$abeja.1
dist(rbind(x,y))
mystring <- names(filename)
library(stringr)
# take the common patterns
# '\\.' matches a literal dot; a single backslash is an invalid escape in an R string
strUniq <- unique(ifelse(str_detect(mystring, '\\.'),
                         str_sub(mystring, 1, str_locate(mystring, '\\.')[,1] - 1),
                         mystring))
strUniq
# [1] "word" "abeja" "abrigo"
library(dplyr)
outp <- lapply(strUniq, function(x) select(filename, starts_with(x)))
outp
# [[1]]
#      word
# 1 ocr_avg
# 2     frq
#
# [[2]]
#   abeja abeja.1
# 1    98  26.666
# 2    24   3.000
#
# [[3]]
#   abrigo abrigo.1 abrigo.2 abrigo.3
# 1 53.333       50   83.809   31.666
# 2  6.000        1   21.000    6.000
lapply(outp, function(x) dist(t(x)))
# [[1]]
# dist(0)
#
# [[2]]
#            abeja
# abeja.1 74.36087
#
# [[3]]
#             abrigo  abrigo.1  abrigo.2
# abrigo.1  6.009067
# abrigo.2 33.967434 39.281656
# abrigo.3 21.667000 19.003567 54.257649
#
# Warning message:
# In dist(t(x)) : NAs introduced by coercion
The warning occurs because the "word" column contains no numbers. You can drop that column first to avoid the warning.
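Dropping the non-numeric column first could look roughly like the base R sketch below. It is only one possible variant, not the approach shown above: it assumes that duplicate columns are always named with a trailing `.<digits>` suffix (as in `abeja.1`), and the names `num`, `base`, `groups`, and `dists` are made up for illustration.

```r
# Same data as in the question, rebuilt here so the sketch is self-contained.
filename <- data.frame(
  word     = factor(c("ocr_avg", "frq")),
  abeja    = c(98, 24),
  abeja.1  = c(26.666, 3),
  abrigo   = c(53.333, 6),
  abrigo.1 = c(50, 1),
  abrigo.2 = c(83.809, 21),
  abrigo.3 = c(31.666, 6)
)

# Keep only the numeric columns, so "word" never reaches dist().
num <- filename[sapply(filename, is.numeric)]

# Assumption: the base name is the column name minus a trailing ".<digits>".
base <- sub("\\.[0-9]+$", "", names(num))

# Split the columns into one data frame per base name, then compute
# all pairwise Euclidean distances between the columns of each group.
groups <- split.default(num, base)
dists  <- lapply(groups, function(g) dist(t(g)))
dists
```

Because `dist()` works on rows, each group is transposed with `t()` so that every original column becomes one row. The result is a named list, so `dists$abrigo` holds all six pairwise distances for the `abrigo` columns.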