How to compare two histograms in R?

Question

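I want to compare two histograms in a single plot in R, but I can't work out how to do it. My histograms are based on two subsets of a data frame, which is split by type (Action, Adventure, Family). My first histogram is: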

split_action <- split(df, df$type)
dataset_action <- split_action$Action
hist(dataset_action$year)

split_adventure <- split(df, df$type)
dataset_adventure <- split_adventure$Adventure
hist(dataset_adventure$year)

I want to see how much the two overlap by comparing them by year in the same histogram. Thanks in advance.

Answer 1

Using the iris dataset, suppose you want a histogram of sepal length for each species. First, use subset() to create three data frames, one per species.

irissetosa<-subset(iris,Species=='setosa',select=c('Sepal.Length','Species'))
irisversi<-subset(iris,Species=='versicolor',select=c('Sepal.Length','Species'))
irisvirgin<-subset(iris,Species=='virginica',select=c('Sepal.Length','Species'))

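Then draw a histogram for each of the three data frames. Don't forget to set the "add" argument to TRUE for the second and third histograms, since you want them drawn on the same plot.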

hist(irissetosa$Sepal.Length,col='red')
hist(irisversi$Sepal.Length,col='blue',add=TRUE)
hist(irisvirgin$Sepal.Length,col='green',add=TRUE)

This gives you the three histograms overlaid on one plot.
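One caveat with the calls above: the first hist() call fixes the axis ranges, and each call picks its own bins, so the later histograms can be clipped and the bars may not line up. A minimal sketch of one way around this, assuming base R only; the names `breaks`, `h_set`, `h_ver`, `h_vir` and the 0.25 bin width are illustrative choices, not part of the original answer:

```r
# Shared bin edges covering all three species (bin width 0.25 is an arbitrary choice)
breaks <- seq(4, 8, by = 0.25)

# Compute the histograms without drawing, so the tallest bar can set ylim
h_set <- hist(iris$Sepal.Length[iris$Species == "setosa"],     breaks = breaks, plot = FALSE)
h_ver <- hist(iris$Sepal.Length[iris$Species == "versicolor"], breaks = breaks, plot = FALSE)
h_vir <- hist(iris$Sepal.Length[iris$Species == "virginica"],  breaks = breaks, plot = FALSE)
ymax <- max(h_set$counts, h_ver$counts, h_vir$counts)

# Semi-transparent fills keep overlapping bars visible
fills <- c(rgb(1, 0, 0, 0.4), rgb(0, 0, 1, 0.4), rgb(0, 1, 0, 0.4))
plot(h_set, col = fills[1], ylim = c(0, ymax),
     main = "Sepal length by species", xlab = "Sepal.Length")
plot(h_ver, col = fills[2], add = TRUE)
plot(h_vir, col = fills[3], add = TRUE)
legend("topright", legend = levels(iris$Species), fill = fills)
```

Because all three histograms share the same breaks, the bars in each bin can be compared directly.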

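From the overlaid histograms you can see which parts overlap, though I admit it does not look great. Another way to see which parts overlap is to use the density function.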

plot(density(irissetosa$Sepal.Length),col='red')
lines(density(irisversi$Sepal.Length),col='blue')
lines(density(irisvirgin$Sepal.Length),col='green')

This gives you the three density curves on one plot.
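As with the histograms, the first plot() call sets the axis ranges, so curves for the other species can run off the plot. A hedged sketch that estimates the densities first and then sets the limits from all three; the variable names (`d_set`, `d_ver`, `d_vir`, `x_rng`, `y_max`) are illustrative only:

```r
# Estimate each density first so the axis limits can cover all three curves
d_set <- density(iris$Sepal.Length[iris$Species == "setosa"])
d_ver <- density(iris$Sepal.Length[iris$Species == "versicolor"])
d_vir <- density(iris$Sepal.Length[iris$Species == "virginica"])

x_rng <- range(d_set$x, d_ver$x, d_vir$x)
y_max <- max(d_set$y, d_ver$y, d_vir$y)

# Draw the first curve with the shared limits, then add the others
plot(d_set, col = "red", xlim = x_rng, ylim = c(0, y_max),
     main = "Sepal length density by species", xlab = "Sepal.Length")
lines(d_ver, col = "blue")
lines(d_vir, col = "green")
legend("topright", legend = levels(iris$Species),
       col = c("red", "blue", "green"), lty = 1)
```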

Hope this helps!

Answer 2

If you use ggplot, you don't need to split the data. The key is to use transparency ("alpha") and to set the "position" argument to "identity", because the default is "stack".

Using the iris dataset:

library(ggplot2)
ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
  geom_histogram(binwidth=0.2, alpha=0.5, position="identity") +
  theme_minimal()

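The overlap is not easy to see here, so if that is the main objective, a density plot may be a better choice. Again, use transparency so that the overlapping curves don't hide each other.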

ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
  geom_density(alpha=0.5) +
  xlim(3.9,8.5) +
  theme_minimal()

So for your data, the command would look something like this:

ggplot(data=df, aes(x=year, fill=type)) +
  geom_histogram(alpha=0.5, position="identity")
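The question also asks how much overlap there is. One rough way to put a number on it is sketched below, under assumptions: the data frame is made up to stand in for the asker's `df` (columns `year` and `type`), and the 5-year bins and the "shared count" measure are illustrative choices rather than anything taken from the answers above.

```r
# Hypothetical stand-in for the asker's data: one row per item, with a year and a type
set.seed(1)
df <- data.frame(
  year = c(sample(1980:2010, 200, replace = TRUE),
           sample(1995:2020, 150, replace = TRUE)),
  type = rep(c("Action", "Adventure"), times = c(200, 150))
)

# Same bin edges for both groups so the counts are comparable bin by bin
breaks <- seq(1980, 2020, by = 5)
h_action    <- hist(df$year[df$type == "Action"],    breaks = breaks, plot = FALSE)
h_adventure <- hist(df$year[df$type == "Adventure"], breaks = breaks, plot = FALSE)

# Per-bin overlap: the count that both groups have in common
shared <- pmin(h_action$counts, h_adventure$counts)

# Share of the smaller group's observations that fall inside the overlap
overlap_share <- sum(shared) / min(sum(h_action$counts), sum(h_adventure$counts))

print(data.frame(from = head(breaks, -1), to = tail(breaks, -1),
                 action = h_action$counts, adventure = h_adventure$counts,
                 shared = shared))
print(overlap_share)
```

The result depends on the bin width, so it is best read alongside the plot rather than as a standalone statistic.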
