How to add new values to an existing dataset so that only the range changes but the mean stays the same in R?
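Hello, I'm a student studying statistics. My textbook doesn't include much R code, mostly basic hand calculations, so I'd like to ask: is there a way in R to add extra numbers to an existing generated set so that the new set has a specific mean and range?
1(a) Use R to simulate a set of 100 numbers with mean 20 and standard deviation 2. List the set.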
> x <- rnorm(100,20,2)
> print(x)
[1] 20.59256 20.66069 12.68841 21.13575 24.09587 21.69535 20.18661 21.71236 20.92864 19.63182 22.12583 19.06238
[13] 18.73813 22.59813 17.30012 16.98957 20.74050 21.28319 19.75426 20.62065 20.20814 18.16406 22.24261 22.05673
[25] 21.27086 18.78538 21.86479 18.03242 21.00538 20.27731 22.59440 23.24389 20.20846 19.73281 19.50040 20.51712
[37] 20.16493 23.56715 21.25884 18.37542 19.84470 19.81911 16.94701 19.06637 17.74580 18.03151 19.57144 16.45314
[49] 20.89975 21.86249 17.42996 23.52514 21.17759 20.20160 18.11839 21.69716 16.93685 20.62335 20.37935 22.46131
[61] 17.78489 19.90424 17.67674 20.20571 21.60567 20.41897 20.25134 22.44366 19.06513 20.62692 24.04101 24.03634
[73] 20.15566 20.33157 20.22881 20.54014 19.49401 17.34388 19.94099 18.71450 19.24386 19.91813 18.71863 20.94027
[85] 17.55676 17.18079 24.96868 24.09565 19.87488 20.06114 19.21374 18.39874 21.01435 18.38329 20.91788 21.45158
[97] 20.43168 21.80438 20.50405 23.07149
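A side note on 1(a), offered as an optional variant rather than what the exercise requires: `rnorm(100, 20, 2)` samples from a population with mean 20 and sd 2, so the *sample* mean and sd of the 100 numbers only approximate 20 and 2. If the set should have exactly mean 20 and sd 2 (which makes the "(same) mean of 20" in part (b) literal), one common approach is to standardize the draws with `scale()` and then rescale them. The seed below is an arbitrary choice for reproducibility.

```r
set.seed(42)                        # arbitrary seed, only for reproducibility
z <- rnorm(100)                     # 100 standard normal draws
# scale() centres z to sample mean 0 and rescales to sample sd 1 (it returns a matrix,
# so as.vector() turns it back into a plain numeric vector)
x <- 20 + 2 * as.vector(scale(z))

mean(x)   # 20, up to floating-point rounding
sd(x)     # 2, up to floating-point rounding
```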
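(b) Add 2 more numbers to the set simulated in question 1(a) so that the new set has the (same) mean of 20 but a range of 200. List the set of numbers.
Since you need a range of 200, each of the two added values should be current_range ± desired_range/2.
Solution in code: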
> x <- rnorm(100,20,2)
>
> x
[1] 17.84671 19.02797 23.83426 21.28975 20.35738 19.35365 22.57753 15.09991 18.18989 21.61537 20.97786 20.74412 20.95964
[14] 20.00677 13.79552 16.65435 23.48840 19.50842 25.10979 21.10134 19.15891 22.58312 23.65634 17.89358 17.98529 22.33547
[27] 20.84291 21.28044 22.37447 16.89740 19.95510 17.67625 19.64634 18.07762 21.50655 18.62182 18.59671 15.53542 12.85074
[40] 19.06638 19.90743 18.64610 20.71322 22.78706 22.33449 22.30899 17.09384 21.57055 19.88208 18.85795 18.52198 23.70028
[53] 22.91794 20.24993 20.63627 19.01672 19.34706 17.42375 21.88536 20.91214 21.16099 23.54738 21.40821 21.06485 23.95725
[66] 21.09893 16.15641 21.28983 19.27113 17.89774 23.24801 23.23136 22.67976 23.21619 20.17257 21.09512 16.83565 22.17975
[79] 20.50282 23.86079 14.97483 16.91109 18.66540 21.79649 21.01789 18.81188 19.77038 25.04698 17.69211 20.04085 17.29910
[92] 18.98335 16.37297 19.78979 18.83341 16.60093 19.41327 17.85721 22.55003 16.67850
>
> mean(x)
[1] 19.99774
>
> sd(x)
[1] 2.494173
>
> range <- range(x)[2]-range(x)[1]
>
> range
[1] 12.25905
>
> x <- c(x,range+100,range-100)
>
> mean(x)
[1] 19.846
>
> sd(x)
[1] 14.3276
>
> range <- range(x)[2]-range(x)[1]
>
> range
[1] 200
>
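The transcript shows the range reaching 200 while the mean moves from 19.99774 to 19.846. The arithmetic below, a sketch that uses only the figures printed in the transcript, shows why: appending two values that sum to s to n values with mean m gives a new mean of (n * m + s) / (n + 2), which equals m only when s = 2 * m. The values `range+100` and `range-100` sum to 2 * 12.25905 rather than 2 * 19.99774.

```r
# Figures copied from the transcript above
n <- 100         # number of original values
m <- 19.99774    # mean(x) before adding anything
r <- 12.25905    # original range of x

# Adding r + 100 and r - 100 (sum = 2 * r) shifts the mean
new_mean <- (n * m + (r + 100) + (r - 100)) / (n + 2)
new_mean         # 19.846, matching the transcript

# Centring the two values on the mean instead (sum = 2 * m) leaves it unchanged
kept_mean <- (n * m + (m + 100) + (m - 100)) / (n + 2)
kept_mean        # 19.99774
```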
First, create reproducible data:
set.seed(42)
x <- rnorm(100,20,2)
mean(x)
# [1] 20.06503
range(x)
# [1] 14.01382 24.57329
(x2 <- mean(x) + c(-100, 100))
# [1] -79.93497 120.06503
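To keep the mean unchanged, we add one value 100 above the mean and one value 100 below it. Their deviations from the mean cancel, so the sum grows by exactly 2 × mean and the mean stays the same. Fortunately, both values fall outside the original range (14.01382 to 24.57329), so they become the new minimum and maximum, and the range becomes 120.06503 - (-79.93497) = 200.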
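The same reasoning written as a short derivation, assuming the n = 100 original values have mean m and the two added values satisfy a < min(x) and b > max(x), so that they become the new extremes:

```latex
\bar{x}_{\text{new}} = \frac{n m + a + b}{n + 2} = m
  \iff a + b = 2m,
\qquad
b - a = 200
\;\Longrightarrow\;
a = m - 100,\quad b = m + 100.
```

With `set.seed(42)`, m = 20.06503, which gives a = -79.93497 and b = 120.06503, the values printed for `x2`.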
mean(c(x, x2))
# [1] 20.06503
diff(range(c(x, x2)))
# [1] 200
The mean is unchanged, and the range is now 200.
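A minimal sketch that wraps the same idea in a reusable function. The name `add_two_keep_mean()` is hypothetical, and it only implements the symmetric construction used in this answer: if `mean(x) ± target_range / 2` does not fall outside the original range, the new range will not equal `target_range`, so the function stops with an error instead of returning wrong values.

```r
# Hypothetical helper: returns the two values to append to x so that the mean is
# unchanged and the range of the combined set equals target_range.
add_two_keep_mean <- function(x, target_range) {
  m <- mean(x)
  # Symmetric around the mean, so the two values sum to 2 * m
  new_vals <- m + c(-1, 1) * target_range / 2
  # They only set the new range if both lie outside the original range
  if (new_vals[1] > min(x) || new_vals[2] < max(x)) {
    stop("target_range is too small for the symmetric construction")
  }
  new_vals
}

set.seed(42)
x  <- rnorm(100, 20, 2)
x2 <- add_two_keep_mean(x, 200)
y  <- c(x, x2)

all.equal(mean(y), mean(x))   # TRUE
diff(range(y))                # 200
```

Asking for a range smaller than the gap the data already need (for example `add_two_keep_mean(x, 5)`) triggers the error, because `mean(x) - 2.5` would sit inside the original data.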