How to plot multiple columns of a data frame to see where data exists in each column?
I have the following data frame:
Index | ColA | ColB | ColC | ColD
1 | NA | NA | 0 | NA
2 | NA | 0 | 1 | 0
3 | NA | NA | 2 | 1
4 | 1 | 0 | 2 | 2
5 | NA | NA | 2 | NA
6 | NA | 1 | 1 | 1
7 | 0 | 1 | 0 | 2
8 | NA | 2 | 0 | 2
9 | NA | 0 | NA | 1
10 | 2 | 1 | 0 | 0
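Now I want to plot this data in R, with the Index column on the X axis and the remaining column names (ColA, ColB, ColC and ColD) on the Y axis. Each x-y point in the plot should show whether the value is NA or non-NA. Something like this (for the data frame above):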
ColD     -  -  -     -  -  -  -  -
ColC  -  -  -  -  -  -  -  -     -
ColB     -     -     -  -  -  -  -
ColA           -        -        -
      1  2  3  4  5  6  7  8  9  10
Thanks in advance for your help!
Here is one approach using plot.
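# df holds only the four data columns (ColA..ColD), with no Index column
# x values: the row index where a value is present, 0 where it is NA
xVals <- as.integer(!is.na(df)) * 1:10
# y values: the column number (1 = ColA, ..., 4 = ColD), repeated for each row
yVals <- rep(1:4, each=10)
# turn the 0 placeholders into NA so that those points are not drawn
is.na(xVals) <- xVals == 0
is.na(yVals) <- is.na(xVals)
# plot the results
plot(xVals, yVals)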
Data
set.seed(1234)
df <- data.frame(ColA=sample(c(0:2,NA), size=10, replace=T, prob=c(.2,.2,.2,.4)),
ColB=sample(c(0:2,NA), size=10, replace=T, prob=c(.2,.2,.2,.4)),
ColC=sample(c(0:2,NA), size=10, replace=T, prob=c(.2,.2,.2,.4)),
ColD=sample(c(0:2,NA), size=10, replace=T, prob=c(.2,.2,.2,.4)))
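As a variant of the base-graphics approach above, the positions of the non-NA cells can also be read off directly with which(..., arr.ind = TRUE), which avoids the recycling arithmetic and makes it easy to label the Y axis with the column names. This is only a sketch: it assumes the data frame from the question with the Index column left out, so the row number stands in for the index, and the name `present` is made up for illustration.

```r
# Data from the question, without the Index column (row number = index)
df <- data.frame(
  ColA = c(NA, NA, NA, 1, NA, NA, 0, NA, NA, 2),
  ColB = c(NA, 0, NA, 0, NA, 1, 1, 2, 0, 1),
  ColC = c(0, 1, 2, 2, 2, 1, 0, 0, NA, 0),
  ColD = c(NA, 0, 1, 2, NA, 1, 2, 2, 1, 0)
)

# Matrix with one line per non-NA cell; columns are named "row" and "col"
present <- which(!is.na(df), arr.ind = TRUE)

# Draw one point per non-NA cell and suppress the default axes
plot(present[, "row"], present[, "col"],
     xlim = c(1, nrow(df)), ylim = c(0.5, ncol(df) + 0.5),
     pch = 19, xaxt = "n", yaxt = "n",
     xlab = "Index", ylab = "")

# Label the X axis with the row index and the Y axis with the column names
axis(1, at = seq_len(nrow(df)))
axis(2, at = seq_len(ncol(df)), labels = names(df), las = 1)
```

Because only the non-NA cells are plotted, the gaps in each row of points correspond to the NA values.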
Here is a plot using ggplot:
Data
df <- structure(list(Index = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
ColA = c(NA, NA, NA, 1, NA, NA, 0, NA, NA, 2),
ColB = c(NA, 0, NA, 0, NA, 1, 1, 2, 0, 1),
ColC = c(0, 1, 2, 2, 2, 1, 0, 0, NA, 0),
ColD = c(NA, 0, 1, 2, NA, 1, 2, 2, 1, 0)),
.Names = c("Index", "ColA", "ColB", "ColC", "ColD"),
row.names = c(NA, -10L), class = "data.frame")
Plot
library(ggplot2)
library(reshape2)
ggplot(melt(df, "Index"), aes(x=as.factor(Index), y=variable, alpha=!is.na(value))) +
geom_point() +
labs(x="Index", y="Variable") +
scale_alpha_discrete("", breaks=c(TRUE, FALSE), labels=c("Not NA", "NA"))
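The same idea can also be sketched as a tile map, reshaping with tidyr::pivot_longer instead of reshape2::melt. This is an alternative, not what the answer above uses: it assumes the tidyr package is installed, and the `present` column, the object names `long` and `p`, and the colours are arbitrary choices for illustration.

```r
library(ggplot2)
library(tidyr)

# Data from the question, including the Index column
df <- data.frame(
  Index = 1:10,
  ColA = c(NA, NA, NA, 1, NA, NA, 0, NA, NA, 2),
  ColB = c(NA, 0, NA, 0, NA, 1, 1, 2, 0, 1),
  ColC = c(0, 1, 2, 2, 2, 1, 0, 0, NA, 0),
  ColD = c(NA, 0, 1, 2, NA, 1, 2, 2, 1, 0)
)

# Long format: one row per (Index, column) pair
long <- pivot_longer(df, cols = -Index,
                     names_to = "variable", values_to = "value")

# Explicit label for each cell, used for the fill legend
long$present <- ifelse(is.na(long$value), "NA", "Not NA")

# One tile per cell, coloured by whether a value is present
p <- ggplot(long, aes(x = factor(Index), y = variable, fill = present)) +
  geom_tile(colour = "white") +
  scale_fill_manual(values = c("Not NA" = "steelblue", "NA" = "grey90"),
                    name = NULL) +
  labs(x = "Index", y = "Variable")
print(p)
```

Mapping presence to fill rather than alpha keeps the NA cells visible as light tiles, which can be easier to read than faded points when there are many rows.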