R ddply row summary statistics
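For each row in the data frame below (identified by FID_Bounda, NAME, DESCRIPTIO and SOVEREIGNT), I am trying to calculate the mean, standard deviation and coefficient of variation of all the values in the columns whose names start with crN.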
uadt <- structure(list(FID_Bounda = 0:7, NAME = c("Bedfordshire", "Berkshire",
"Bristol", "Buckinghamshire", "Cambridgeshire", "Cheshire", "Derbyshire",
"Devon"), DESCRIPTIO = c("Ceremonial County", "Ceremonial County",
"Ceremonial County", "Ceremonial County", "Ceremonial County",
"Ceremonial County", "Ceremonial County", "Ceremonial County"
), SOVEREIGNT = c("England", "England", "England", "England",
"England", "England", "England", "England"), crN1 = c(61.944107636,
38.769347117, 0.810167027, 63.721241962, 191.046323469, 81.467146994,
61.65529268, 288.751788714), crN10 = c(60.33595964, 38.326639788,
0.834289164, 63.009539538, 185.25772542, 82.936101454, 61.985178493,
304.951827268), crN100 = c(53.385110882, 33.530058107, 0.739041324,
55.601839364, 165.604271128, 76.386014559, 55.591194915, 284.739586188
), crN1000 = c(58.397452282, 37.277298648, 0.820739862, 61.716749153,
175.436497697, 82.461823706, 61.762203751, 321.414544333)), .Names = c("FID_Bounda",
"NAME", "DESCRIPTIO", "SOVEREIGNT", "crN1", "crN10", "crN100",
"crN1000"), row.names = c(NA, 8L), class = "data.frame")
I tried to derive these values using the code outlined in cookbook-r:
cdata <- ddply(uadt, c("FID_Bounda","NAME","DESCRIPTIO","SOVEREIGNT"), summarise,
N = length(grep("crN", names(uadt), value = T)),
mean = mean(grep("crN", names(uadt), value = F)),
sd = sd(grep("crN", names(uadt), value = F)),
se = sd / sqrt(N)
)
cdata
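It correctly counts the total number of crN columns, but it gives the same mean, sd and se for every row. Any help with where the problem lies would be much appreciated, as the real dataset has 1000 columns, all with the same naming pattern crNnumber.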
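I know this isn't a perfect answer, but it might be worth using some more up-to-date tools (and yes, I'm aware of the irony of that statement, given that my answer doesn't use tidyr). The approach I would take is: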
library(reshape2)
madt <- melt(uadt,
id.vars = c("FID_Bounda", "NAME",
"DESCRIPTIO", "SOVEREIGNT"))
library(dplyr)
cdata <- summarise(group_by(madt,
FID_Bounda, NAME,
DESCRIPTIO, SOVEREIGNT),
N = n_distinct(variable),
mean = mean(value),
sd = sd(value),
se = sd / sqrt(N))
This does produce the correct output.
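The example in the cookbook calculates the mean and the other functions down the columns rather than across the rows, and across the rows is what you want here.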
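To see why the original attempt returns the same numbers on every row, it may help to look at what grep() hands back. With value = FALSE it returns the positions of the matching columns, not their contents, so mean() and sd() are applied to those positions. A minimal sketch, using a hypothetical two-row miniature called demo rather than the real uadt:

```r
# Hypothetical miniature of uadt: four id columns followed by crN* columns
demo <- data.frame(
  FID_Bounda = 0:1,
  NAME       = c("Bedfordshire", "Berkshire"),
  DESCRIPTIO = "Ceremonial County",
  SOVEREIGNT = "England",
  crN1       = c(2, 10),
  crN10      = c(4, 20)
)

# value = FALSE returns column positions, not the data in those columns
idx <- grep("crN", names(demo), value = FALSE)
idx

# This is the mean of the positions, so it cannot vary between rows
pos_mean <- mean(idx)

# Indexing the data frame with the positions gives the per-row means instead
row_means <- rowMeans(demo[idx])
row_means
```

In the full uadt the matching positions are 5 to 8, which is why every row in the original output shows the same mean and sd.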
One way to do this in base R is:
functions <- list(length, mean, sd)
d <- lapply(functions, function(y) {
apply(uadt, 1, function(x) y(as.numeric(x[5:8])))
})
calc <- as.data.frame(do.call(cbind, d))
names(calc) <- c("N", "mean", "sd")
cdata <- cbind(uadt[1:4], calc)
cdata$se <- cdata$sd / sqrt(cdata$N)
If you have more numeric columns, just change the range 5:8 accordingly.
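With 1000 columns it may be more convenient to select the value columns by name than by position. The following is only a sketch under that assumption: uadt_demo, cr_cols, vals and row_stats are hypothetical names, and the small data frame stands in for the real data. It also adds the coefficient of variation asked for in the question, taken here as sd divided by mean.

```r
# Hypothetical stand-in for uadt: id columns plus crN* value columns
uadt_demo <- data.frame(
  FID_Bounda = 0:1,
  NAME       = c("Bedfordshire", "Berkshire"),
  crN1       = c(2, 10),
  crN10      = c(4, 20),
  crN100     = c(6, 30),
  stringsAsFactors = FALSE
)

# Pick the value columns by name so the code does not depend on positions
cr_cols <- grep("^crN", names(uadt_demo), value = TRUE)
vals <- as.matrix(uadt_demo[cr_cols])  # numeric matrix, one row per county

# Keep the id columns and append the row-wise statistics
row_stats <- data.frame(
  uadt_demo[setdiff(names(uadt_demo), cr_cols)],
  N    = ncol(vals),
  mean = rowMeans(vals),
  sd   = apply(vals, 1, sd)
)
row_stats$se <- row_stats$sd / sqrt(row_stats$N)
row_stats$cv <- row_stats$sd / row_stats$mean  # coefficient of variation
row_stats
```

Converting only the crN columns to a matrix keeps the values numeric, so no as.numeric() round trip through character is needed.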