Adding "noise" to a graph

Question

I am using the R programming language. I made the following plot with the ggplot2 library:

#load library
library(RSSL)
library(ggplot2)

#generate first data
d <- generateCrescentMoon(1000,2,1)
d$c = ifelse(d$Class == "+", "1","0")
d$Class = NULL

ggplot(d, aes(x=X1, y=X2, color=c, shape=c)) +  geom_point()

Now I am trying to add some "noise" to different regions of this plot. I did the following:

#noise the first region (x1: -5 to 0 AND x2: -10 to 10)

c <- sample(0:1, 1000, TRUE)

X1 <- runif(100, min=-5, max=0)
X2 <- runif(100, min=-10, max=10)

a = data.frame(X1,X2,c)
a$c = as.factor(a$c)

g = rbind(a,d)

This adds noise to the desired region.

Now I am trying to add "noise" to the corner regions:

Region 1: (x1: -10 to -5 AND x2: -10 to -5)
Region 2: (x1: 5 to 10 AND x2: 5 to 10)

I did this by overwriting the existing objects and binding them together:

#Add noise to Region2
c <- sample( 0:1, replace=TRUE, prob=c(0.5, 0.5) )
X1 <- runif(100, min=5, max=10)
X2 <- runif(100, min=5, max=10)
f = data.frame(c,X1,X2)
f$c = as.factor(f$c)

#Append
gg = rbind(g,f)

#Add noise to Region1
c <- sample( 0:1, replace=TRUE, prob=c(0.5, 0.5) )
X1 <- runif(100, min=-10, max=-5)
X2 <- runif(100, min=-10, max=-5)
f = data.frame(c,X1,X2)
f$c = as.factor(f$c)

#Append ("g" is the final file)
g= rbind(gg,f)

But when I plot this, the noise does not appear in "Region 2":

#plot
ggplot(g, aes(x=X1, y=X2, color=c, shape=c)) +  geom_point()

Does anyone know why this happens? Is it because of the random number generation, or is there a mistake in my code?

Thanks

Answer 1

The error is in this line:

c <- sample( 0:1, replace=TRUE, prob=c(0.5, 0.5) )

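No size is given, so this draws only two samples from 0 and 1. Half the time you will draw two different values (1 and 0, or 0 and 1), and half the time you will draw two matching values. When the two values match, you get unexpected results.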

From the help for sample:

For sample the default for size is the number of items inferred from the first argument, so that sample(x) generates a random permutation of the elements of x (or 1:x).
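As a quick check of that default, the sketch below calls sample() the same way the question does and confirms that only two values come back. The seed and the number of repetitions are arbitrary choices for illustration, not part of the original code.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# With no size argument, sample() draws length(x) values: here, 2.
s <- sample(0:1, replace = TRUE, prob = c(0.5, 0.5))
length(s)

# Repeat the draw many times and measure how often both values match.
# replicate() returns a 2 x 10000 matrix, one column per draw.
draws <- replicate(10000, sample(0:1, replace = TRUE, prob = c(0.5, 0.5)))
match_rate <- mean(draws[1, ] == draws[2, ])
match_rate  # should be close to 0.5
```

The match rate sits near one half, which agrees with the "half the time" statement above.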

When you use the two-element vector c as a column in the data frame, R recycles it 50 times to match the length of X1 and X2.
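The recycling can be seen directly with a small sketch. The name labels2 is a hypothetical stand-in for the two-element result of sample(), fixed here to the "matching" case so the outcome does not depend on the random draw.

```r
set.seed(1)  # arbitrary seed for reproducibility
X1 <- runif(100, min = 5, max = 10)
X2 <- runif(100, min = 5, max = 10)

# Hypothetical stand-in for a two-element draw where both values matched.
labels2 <- c(0L, 0L)

# data.frame() recycles the length-2 vector to fill all 100 rows.
f <- data.frame(c = labels2, X1, X2)
nrow(f)
table(f$c)  # every row carries the same label

# If the two draws differ instead, the labels strictly alternate.
f_alt <- data.frame(c = c(0L, 1L), X1, X2)
head(f_alt$c)
```

In both cases the labels are not 100 independent draws: they are either constant or a fixed alternating pattern.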

You should probably use

c <- sample( 0:1, size = 100, replace=TRUE, prob=c(0.5, 0.5))

This draws a vector of length 100, which will essentially never be all 0s or all 1s.
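One way to avoid repeating the mistake for each region is to wrap the corrected call in a small function. This is only a sketch: make_noise is a hypothetical helper, not something from the question, and it assumes the labels should be stored as character "0"/"1" so they line up with the c column built by ifelse() in the question.

```r
# Hypothetical helper: n uniformly placed points in a rectangle,
# each with an independently drawn 0/1 label.
make_noise <- function(n, x1_range, x2_range) {
  data.frame(
    X1 = runif(n, min = x1_range[1], max = x1_range[2]),
    X2 = runif(n, min = x2_range[1], max = x2_range[2]),
    # size = n is the fix: one label per point, no recycling.
    c  = as.character(sample(0:1, size = n, replace = TRUE))
  )
}

set.seed(42)  # arbitrary seed for reproducibility
region1 <- make_noise(100, c(-10, -5), c(-10, -5))
region2 <- make_noise(100, c(5, 10), c(5, 10))
noise <- rbind(region1, region2)

table(noise$c)     # both labels should appear
range(region2$X1)  # stays within 5 to 10
```

The result can then be combined with the original data using rbind(noise, d), since rbind matches data frame columns by name.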
