Store a double within a data frame in R
I have a data frame as follows:
fitnorm <- data.frame(dataset=0,mean=0,sd=0,normopl=0)
It goes through the following iterative process:
normdat <- rnorm(25, mean = 30, sd = sqrt(9))
fitnorm[i,1] <- normdat
fitnorm[i,2] <- mean(normdat)
fitnorm[i,3] <- sd(normdat)
fitnorm[i,4] <- qnorm(1-(400/1000), mean=fitnorm[i,2], sd=fitnorm[i,3])
However, I get this error:
Error in `[<-.data.frame`(`*tmp*`, 1, 1, value = c(27.431413650154,
30.3657212588031, : replacement has 25 rows, data has 1
I understand this happens because I am trying to put 'normdat', which is of type double and length 25, into a single element of the data frame. Shouldn't a data frame be able to hold an object of type double? What am I doing wrong?
It can, but that is not what you are doing.
In the line
## I've assumed i <- 1
fitnorm[i,1] <- normdat
you are trying to assign a numeric vector of length 25 (normdat) to a single cell of the data frame (row i, column 1).
Did you mean to do
fitnorm[i,1] <- normdat[i]
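To make the difference concrete, here is a minimal sketch (assuming i <- 1 as above; the seed and the msg variable are only there for illustration) that captures the error from the vector assignment and then shows that a single value fits:

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
fitnorm <- data.frame(dataset = 0, mean = 0, sd = 0, normopl = 0)
normdat <- rnorm(25, mean = 30, sd = sqrt(9))
i <- 1

# Assigning all 25 values to one cell fails; capture the message instead of stopping
msg <- tryCatch({
  fitnorm[i, 1] <- normdat
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Assigning a single element works, because one cell holds one value
fitnorm[i, 1] <- normdat[i]
print(fitnorm)
```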
Update
Based on your comment: a single cell of an ordinary (atomic) data.frame column cannot hold a whole vector, so you need to use a list:
## use mean(normdat) and sd(normdat) directly; fitnorm[i,2] and fitnorm[i,3] were never filled in
lst <- list(dataset = normdat,
mean = mean(normdat),
sd = sd(normdat),
normopl = qnorm(1-(400/1000), mean=mean(normdat), sd=sd(normdat)))
## Which gives
lst
$dataset
[1] 33.43470 28.66693 29.41060 32.95761 32.66531 29.86056 31.61961 29.32424 28.07063 31.80155
[11] 32.88489 31.90562 31.81625 24.62625 31.19141 27.41913 31.43993 29.60108 29.73310 23.77482
[21] 28.50347 27.22960 24.65698 27.13001 35.85981
$mean
[1] 29.82336
$sd
[1] 2.981638
$normopl
[1] 30.57875
If you do want to use a data.frame, every column must have the same length (here the scalar values are recycled across the 25 rows):
## compute normopl from normdat itself rather than from the old fitnorm
fitnorm <- data.frame(dataset = normdat,
mean = mean(normdat),
sd = sd(normdat),
normopl = qnorm(1-(400/1000), mean=mean(normdat), sd=sd(normdat)))
head(fitnorm)
# dataset mean sd normopl
#1 33.43470 29.82336 2.981638 30.57875
#2 28.66693 29.82336 2.981638 30.57875
#3 29.41060 29.82336 2.981638 30.57875
#4 32.95761 29.82336 2.981638 30.57875
#5 32.66531 29.82336 2.981638 30.57875
#6 29.86056 29.82336 2.981638 30.57875
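As an aside that is not part of the answer above: a data.frame column can itself be a list, in which case each cell may hold a whole vector. A minimal sketch, assuming one row per dataset is what you want (the seed is illustrative):

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
normdat <- rnorm(25, mean = 30, sd = sqrt(9))

# One row per dataset; the scalar summaries go in ordinary numeric columns
fitnorm <- data.frame(mean = mean(normdat), sd = sd(normdat))

# Add a list column: a list of length 1 for a 1-row data.frame,
# so the single cell holds the full length-25 vector
fitnorm$dataset <- list(normdat)

nrow(fitnorm)                 # still one row
length(fitnorm$dataset[[1]])  # the cell holds all 25 values
```

List columns are less convenient to print and filter than atomic columns, so the plain list shown earlier is often the simpler choice.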
OP edit
The code above works. However, since the list has to be filled iteratively, I made some changes.
fitnorm <- list(dataset=list(),mean=list(),sd=list(),normopl=list())
for (i in 1:5000){
normdat <- rnorm(25, mean = 30, sd = sqrt(9))
fitnorm$dataset[[i]] <- normdat
fitnorm$mean[[i]]<- mean(normdat)
fitnorm$sd[[i]] <- sd(normdat)
fitnorm$normopl[[i]] <- qnorm(1-(400/1000), mean=fitnorm$mean[[i]], sd=fitnorm$sd[[i]])
}
fitnorm$dataset[1]
[[1]]
[1] 33.43470 28.66693 29.41060 32.95761 32.66531 29.86056 31.61961 29.32424 28.07063 31.80155
[11] 32.88489 31.90562 31.81625 24.62625 31.19141 27.41913 31.43993 29.60108 29.73310 23.77482
[21] 28.50347 27.22960 24.65698 27.13001 35.85981
fitnorm$mean[1]
[[1]]
[1] 29.82336
fitnorm$sd[1]
[[1]]
[1] 2.981638
fitnorm$normopl[1]
[[1]]
[1] 30.57875
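Update - Symbolix
There is a 'rule of thumb' in R, which I try to stick to, of using lapply rather than for, because it is usually more efficient (its loop runs in C); there has been plenty of discussion about this.
Therefore, I would replace your for loop with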
lst <- lapply(1:5000, function(x){
normdat <- rnorm(25, mean = 30, sd = sqrt(9))
list(fitnorm = list(dataset = normdat,
mean = mean(normdat),
sd = sd(normdat),
normopl = qnorm(1-(400/1000), mean = mean(normdat), sd = sd(normdat))
))
})
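Because each element of the result is nested under fitnorm, getting values back out takes one extra step. A small sketch of how that could look, using 5 datasets instead of 5000 to keep it quick (first_dataset and means are illustrative names):

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
lst <- lapply(1:5, function(x) {
  normdat <- rnorm(25, mean = 30, sd = sqrt(9))
  list(fitnorm = list(dataset = normdat,
                      mean = mean(normdat),
                      sd = sd(normdat),
                      normopl = qnorm(1 - (400/1000), mean = mean(normdat), sd = sd(normdat))))
})

# The raw values of the first dataset
first_dataset <- lst[[1]]$fitnorm$dataset

# One mean per dataset, returned as a plain numeric vector
means <- vapply(lst, function(e) e$fitnorm$mean, numeric(1))
```

vapply is used here rather than sapply because it checks that every element really returns a single number.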
And a little benchmarking:
Unit: milliseconds
expr min lq mean median uq max neval
fun_lapply() 220.2830 236.1661 252.7315 249.1904 267.1123 337.0799 100
fun_for_loop() 373.5972 399.8972 427.1629 421.7407 442.4626 593.7227 100
Ultimately, the gain in this example is small, but it is worth keeping in mind.
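The definitions of fun_lapply() and fun_for_loop() are not shown in the thread. The sketch below is one plausible reconstruction, assuming they simply wrap the two versions of the code above and that the timings came from the microbenchmark package (the output format suggests so). The n argument and times = 10 are my own choices to keep the run short.

```r
# Assumed wrapper around the for-loop version
fun_for_loop <- function(n = 5000) {
  fitnorm <- list(dataset = list(), mean = list(), sd = list(), normopl = list())
  for (i in seq_len(n)) {
    normdat <- rnorm(25, mean = 30, sd = sqrt(9))
    fitnorm$dataset[[i]] <- normdat
    fitnorm$mean[[i]] <- mean(normdat)
    fitnorm$sd[[i]] <- sd(normdat)
    fitnorm$normopl[[i]] <- qnorm(1 - (400/1000), mean = fitnorm$mean[[i]], sd = fitnorm$sd[[i]])
  }
  fitnorm
}

# Assumed wrapper around the lapply version
fun_lapply <- function(n = 5000) {
  lapply(seq_len(n), function(x) {
    normdat <- rnorm(25, mean = 30, sd = sqrt(9))
    list(fitnorm = list(dataset = normdat,
                        mean = mean(normdat),
                        sd = sd(normdat),
                        normopl = qnorm(1 - (400/1000), mean = mean(normdat), sd = sd(normdat))))
  })
}

# Only run the timing if the microbenchmark package is installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(fun_lapply(), fun_for_loop(), times = 10))
}
```

Timings will differ by machine and R version, so treat the numbers above as indicative only.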
Update - Symbolix 2
If you prefer working with them, you can also create a data.frame.
Here I use the data.table package for the speed it offers:
library(data.table)
lst <- lapply(1:5000, function(x){
normdat <- rnorm(25, mean = 30, sd = sqrt(9))
data.table(id = x,
dataset = normdat,
mean = mean(normdat),
sd = sd(normdat),
normopl = qnorm(1-(400/1000), mean=mean(normdat), sd=sd(normdat)))
})
## lst is now a list of data.tables, so we can 'rbind' them together
dt <- rbindlist(lst)
## now we have one data.table, and the 'id' column indicates
## which dataset each row belongs to
dt
# id dataset mean sd normopl
# 1: 1 24.09486 29.46829 3.261638 30.29462
# 2: 1 26.30732 29.46829 3.261638 30.29462
# 3: 1 31.42603 29.46829 3.261638 30.29462
# 4: 1 29.69081 29.46829 3.261638 30.29462
# 5: 1 30.01235 29.46829 3.261638 30.29462
# ---
# 124996: 5000 28.13584 30.39716 2.591752 31.05377
# 124997: 5000 27.44665 30.39716 2.591752 31.05377
# 124998: 5000 29.79728 30.39716 2.591752 31.05377
# 124999: 5000 28.73398 30.39716 2.591752 31.05377
# 125000: 5000 27.83779 30.39716 2.591752 31.05377
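If you would rather not depend on data.table, the same long-format result can be sketched in base R with data.frame and do.call(rbind, ...). This is an alternative I am adding for illustration, not the code used above, and it will be slower for 5000 datasets; 3 datasets are used here to keep it quick.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
lst <- lapply(1:3, function(x) {
  normdat <- rnorm(25, mean = 30, sd = sqrt(9))
  # scalars (id, mean, sd, normopl) are recycled across the 25 rows
  data.frame(id = x,
             dataset = normdat,
             mean = mean(normdat),
             sd = sd(normdat),
             normopl = qnorm(1 - (400/1000), mean = mean(normdat), sd = sd(normdat)))
})

# Stack the per-dataset data.frames into one
df <- do.call(rbind, lst)
dim(df)  # 75 rows (3 datasets x 25 values), 5 columns

# Recover the raw values of one dataset through its id
dataset2 <- df$dataset[df$id == 2]
```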