How to show the id of outliers on a boxplot
How can I see the IDs of the outliers in a boxplot?
x <- structure(list(pot = c(1L, 2L, 3L, 4L, 21L, 22L, 23L, 24L, 5L,
6L, 7L, 8L, 25L, 26L, 27L, 28L, 9L, 10L, 11L, 12L, 29L, 30L,
31L, 32L, 13L, 14L, 15L, 16L, 33L, 34L, 35L, 36L, 17L, 18L, 19L,
20L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 61L, 62L, 63L, 64L,
45L, 46L, 47L, 48L, 65L, 66L, 67L, 68L, 49L, 50L, 51L, 52L, 69L,
70L, 71L, 72L, 53L, 54L, 55L, 56L, 73L, 74L, 75L, 76L, 57L, 58L,
59L, 60L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 101L, 102L,
103L, 104L, 85L, 86L, 87L, 88L, 105L, 106L, 107L, 108L, 89L,
90L, 91L, 92L, 109L, 110L, 111L, 112L, 93L, 94L, 95L, 96L, 113L,
114L, 115L, 116L, 97L, 98L, 99L, 100L, 117L, 118L, 119L, 120L,
121L, 122L, 123L, 124L, 141L, 142L, 143L, 144L, 125L, 126L, 127L,
128L, 145L, 146L, 147L, 148L, 129L, 130L, 131L, 132L, 149L, 150L,
151L, 152L, 133L, 134L, 135L, 136L, 153L, 154L, 155L, 156L, 137L,
138L, 139L, 140L, 157L, 158L, 159L, 160L), rep = c(1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L,
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), cultivar = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Dinninup",
"Riverina", "Seaton Park", "Yarloop"), class = "factor"), Waterlogging = structure(c(2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Non-waterlogged",
"Waterlogged"), class = "factor"), P = c(12.1, 12.1, 12.1, 12.1,
12.1, 12.1, 12.1, 12.1, 15.17, 15.17, 15.17, 15.17, 15.17, 15.17,
15.17, 15.17, 18.24, 18.24, 18.24, 18.24, 18.24, 18.24, 18.24,
18.24, 24.39, 24.39, 24.39, 24.39, 24.39, 24.39, 24.39, 24.39,
48.35, 48.35, 48.35, 48.35, 48.35, 48.35, 48.35, 48.35, 12.1,
12.1, 12.1, 12.1, 12.1, 12.1, 12.1, 12.1, 15.17, 15.17, 15.17,
15.17, 15.17, 15.17, 15.17, 15.17, 18.24, 18.24, 18.24, 18.24,
18.24, 18.24, 18.24, 18.24, 24.39, 24.39, 24.39, 24.39, 24.39,
24.39, 24.39, 24.39, 48.35, 48.35, 48.35, 48.35, 48.35, 48.35,
48.35, 48.35, 12.1, 12.1, 12.1, 12.1, 12.1, 12.1, 12.1, 12.1,
15.17, 15.17, 15.17, 15.17, 15.17, 15.17, 15.17, 15.17, 18.24,
18.24, 18.24, 18.24, 18.24, 18.24, 18.24, 18.24, 24.39, 24.39,
24.39, 24.39, 24.39, 24.39, 24.39, 24.39, 48.35, 48.35, 48.35,
48.35, 48.35, 48.35, 48.35, 48.35, 12.1, 12.1, 12.1, 12.1, 12.1,
12.1, 12.1, 12.1, 15.17, 15.17, 15.17, 15.17, 15.17, 15.17, 15.17,
15.17, 18.24, 18.24, 18.24, 18.24, 18.24, 18.24, 18.24, 18.24,
24.39, 24.39, 24.39, 24.39, 24.39, 24.39, 24.39, 24.39, 48.35,
48.35, 48.35, 48.35, 48.35, 48.35, 48.35, 48.35), total = c(3.66,
2.02, 1.59, 1.67, 2.12, 2.46, 1.79, 2.09, 2.03, 2.13, 1.83, 2.34,
2.66, 2.2, 1.79, 1.97, 2.17, 2.44, 1.49, 2.19, 2.92, 2.43, 1.58,
2.07, 2.48, 2.49, 1.69, 2.1, 2.38, 2.52, 2.41, 2.46, 2.22, 2.07,
1.97, 2.3, 2.48, 3.16, 1.76, 2.38, 2.81, 2.64, 2.59, 3.28, 3.18,
2.57, 2.9, 3, 2.38, 2.72, 2.58, 2.73, 3.06, 3.01, 3.01, 2.77,
2.95, 2.36, 2.91, 2.38, 3.33, 3.19, 3.17, 3.16, 3.16, 3.2, 2.58,
3.71, 3.11, 2.7, 2.92, 1.93, 2.95, 2.57, 2.68, 2.48, 3.34, 2.75,
2.52, 1.88, 1.19, 0.57, 0.64, 0.66, 1.13, 1.28, 0.85, 0.96, 1.34,
2.14, 0.63, 1.27, 1.13, 0.64, 1.21, 1.95, 1.11, 0.91, 0.75, 0.63,
1.06, 1.07, 1.05, 0.8, 1.41, 1.13, 0.75, 0.89, 1.98, 1.27, 1.01,
1, 1.16, 0.64, 0.64, 1.02, 1.03, 1.13, 0.79, 0.6, 3.88, 2.79,
2.73, 2.77, 3.54, 2.05, 1.51, 1.88, 3.86, 3.13, 1.97, 3.46, 3.98,
3.6, 2.12, 2.86, 2.95, 1.65, 1.94, 2.53, 2.21, 1.94, 2.05, 2.22,
3, 3.28, 1.55, 3.85, 2.4, 2.1, 1.98, 1.81, 2.48, 1.66, 2.06,
1.23, 3.75, 1.99, 1.67, 1.93)), class = "data.frame", row.names = c(NA,
-160L))
boxplot(total~cultivar*as.factor(P),data=x)
This is what I want....
I tried the following example, but it did not work....
boxplot(total~cultivar*as.factor(P),data=x,id=list(n=Inf))
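Identifying the outliers in the plot would make it easier to remove them from the analysis. For some reason, this is not as straightforward as I expected. The post form asks me to add more details, but I think this is enough.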
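Unfortunately, while boxplot does return a list structure that provides the values of the outliers (e.g., boxplot(..., plot=FALSE)$out), that does not help here, because equal values occur in other groups where they are not outliers. (In fact, I find using $out always a bit risky unless there is only one group.)
But you can use $stats to get the whisker limits and work everything out yourself. Unfortunately, it is not a one-liner.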
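To make the risk with $out concrete, here is a minimal sketch on a made-up two-group data set (the data frame d and its columns g and y are hypothetical, not the question's data). The value 10 is an outlier in group "a" but an ordinary value in group "b", so matching on $out alone flags both rows. The list returned by boxplot also has a $group component giving the group index of each outlier, which can be paired with $names as an alternative to comparing against the whisker ends in $stats.

```r
# Hypothetical toy data: the value 10 appears in both groups
d <- data.frame(
  g = rep(c("a", "b"), each = 6),
  y = c(1, 2, 3, 4, 5, 10,   8, 9, 10, 11, 12, 13)
)

bp <- boxplot(y ~ g, data = d, plot = FALSE)

bp$stats[c(1, 5), ]  # whisker ends: row 1 is the lower, row 2 the upper, one column per group
bp$out               # outlier values, pooled across groups
bp$group             # group index (into bp$names) of each outlier

# Naive match on value only: also picks up the non-outlying 10 in group "b"
flagged_naive <- d[d$y %in% bp$out, ]

# Match on group and value together
out_tbl <- unique(data.frame(g = bp$names[bp$group], y = bp$out))
flagged <- merge(d, out_tbl)  # joins on the shared columns g and y
```

Here flagged_naive has two rows while flagged keeps only the row from group "a".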
First, though, since I don't know what you mean by "id", I'll add one to the data:
x$id <- seq_len(nrow(x))
base R
bp <- boxplot(total ~ cultivar * as.factor(P), data = x)
lims <- data.frame(nm = bp$names, t(bp$stats[c(1,5),]))
tmpx <- merge(transform(x, nm = paste(cultivar, as.factor(P), sep = ".")), lims, by = "nm", all.x = TRUE)
tmpx <- subset(tmpx, total < X1 | total > X2)
tmpx$xval <- match(tmpx$nm, bp$names)
text(total ~ xval, id, data = tmpx, adj = c(-0.5, 0.5))
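Text overlapping the boxes may be a problem for you; you can play with shifting the labels and/or flipping the coordinates to control it. Clipping (not shown here: it occurs when text labels run off the plot region) may also be a problem, so you may need to set the plot limits manually.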
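As a rough sketch of controlling the limits by hand, the following uses made-up data (the data frame d and the row picked as the outlier are assumptions for illustration). It pads ylim, leaves extra room on the right via xlim, places the label to the right of the point with pos = 4, and sets xpd = NA so a label that still overshoots is not clipped.

```r
# Hypothetical toy data with one obvious outlier (id 6)
d <- data.frame(
  id = 1:12,
  g  = rep(c("a", "b"), each = 6),
  y  = c(1, 2, 3, 4, 5, 10,   8, 9, 10, 11, 12, 13)
)

bp <- boxplot(y ~ g, data = d, plot = FALSE)

# Pad the y range by 10% and leave half a box width of room on the right
pad  <- 0.1 * diff(range(d$y))
ylim <- range(d$y) + c(-pad, pad)
xlim <- c(0.5, length(bp$names) + 1)

boxplot(y ~ g, data = d, xlim = xlim, ylim = ylim)

# Assumption: the outlying row was identified beforehand
out <- d[d$id == 6, ]
text(match(out$g, bp$names), out$y, labels = out$id,
     pos = 4,    # draw the label to the right of the point
     xpd = NA)   # do not clip the label at the plot region
```

The padding amounts are arbitrary; adjust them to the length of your labels.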
dplyr
If you prefer the tidyverse way of handling data, here is an alternative that produces the same plot.
library(dplyr)
bp <- boxplot(total ~ cultivar * as.factor(P), data = x)
x %>%
mutate( nm = paste(cultivar, as.factor(P), sep = ".") ) %>%
left_join(data.frame(nm = bp$names, t(bp$stats[c(1,5),]), stringsAsFactors = FALSE),
by = "nm") %>%
filter(total < X1 | total > X2) %>%
mutate(xval = match(nm, bp$names)) %>%
text(data = ., total ~ xval, as.character(id), adj = c(-0.5, 0.5))
(Same plot.)
dplyr and ggplot2
library(dplyr)
library(ggplot2)
bp <- boxplot(total ~ cultivar * as.factor(P), data = x, plot = FALSE)
x %>%
mutate( nm = paste(cultivar, as.factor(P), sep = ".") ) %>%
left_join(data.frame(nm = bp$names, t(bp$stats[c(1,5),]), stringsAsFactors = FALSE),
by = "nm") %>%
mutate(outlier = total < X1 | total > X2) %>%
ggplot(aes(interaction(cultivar, P), total)) +
geom_boxplot() +
geom_text(aes(label = id), hjust = -0.5, data = ~ filter(., outlier)) +
coord_flip()
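I chose to flip the coordinates so that the labels are all contained and visible, but that is not required for this method. One trick I use is that the data= argument of ggplot2 layer functions can take an expression (I believe a tilde function, i.e. a one-sided formula), which allows subsetting the main data set in place. Here I use dplyr::filter, but if you are not otherwise using dplyr, subset (base R) works just as easily.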
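For instance, a sketch without dplyr might look like the following. The data frame d and its precomputed outlier column are assumptions made for the example; the layer's data argument is given a plain function, which ggplot2 applies to the plot's main data set, so only the flagged rows are labelled.

```r
library(ggplot2)

# Hypothetical toy data; the outlier flag is assumed to be computed beforehand
d <- data.frame(
  id = 1:12,
  g  = rep(c("a", "b"), each = 6),
  y  = c(1, 2, 3, 4, 5, 10,   8, 9, 10, 11, 12, 13)
)
d$outlier <- d$id == 6

p <- ggplot(d, aes(g, y)) +
  geom_boxplot() +
  # The function receives the plot data and returns the subset to label
  geom_text(aes(label = id), hjust = -0.5,
            data = function(df) subset(df, outlier)) +
  coord_flip()

p
```

A one-sided formula such as ~ subset(., outlier) is shorthand for the same kind of function.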
You can use the car package:
library(car)
Boxplot(total ~ cultivar*as.factor(P), id.method="y", data = x)
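Update:
Is it possible to flip the coordinates in car::Boxplot?
For the challenge, I tried a somewhat hacky approach. In the end I was able to rotate the plot, but not as cleanly as with ggplot2::coord_flip. Here I am merely rotating the plot, so the labels keep their previous alignment. We could go a step further and remove the labels and redraw them, but that would defeat the whole purpose of this solution, which is simplicity.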
library(car)
library(gridGraphics)
p <- Boxplot(total ~ cultivar*as.factor(P), id.method="y", data = x)
grab_grob <- function(){
grid.echo()
grid.grab()
}
g <- grab_grob()
grid.newpage()
pushViewport(viewport(width=0.5,angle=90))
grid.draw(g)