How to plot dichotomous presence/absence data to complement timeseries data

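I have daily logger data for several variables. For this example I use 3 variables: precipitation, distance, and a dichotomous variable (1/0) that is 1 if the precipitation fell as rain and 0 if there was no precipitation or it fell as snow. Here is a sample of the data: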

date <- as.Date(c('2010-11-1','2010-11-2','2010-11-3','2010-11-4','2010-11-5','2010-11-6','2010-11-7','2010-11-8','2010-11-9','2010-11-10'))
distance <- c(5,4,4,7,9,7,NA,5,6,4)
precipitation <- c(11,15,NA,0,3,0,2,2,9,10)
dicht <- c(1,1,NA,0,1,0,1,0,0,1)

df <- data.frame(date, distance, precipitation, dicht)
df
#>          date distance precipitation dicht
#> 1  2010-11-01        5            11     1
#> 2  2010-11-02        4            15     1
#> 3  2010-11-03        4            NA    NA
#> 4  2010-11-04        7             0     0
#> 5  2010-11-05        9             3     1
#> 6  2010-11-06        7             0     0
#> 7  2010-11-07       NA             2     1
#> 8  2010-11-08        5             2     0
#> 9  2010-11-09        6             9     0
#> 10 2010-11-10        4            10     1
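Because 0 lumps snow and "no precipitation" together, storing the coding once as a labelled factor can keep every legend consistent. A minimal sketch; the name `precip_type` and the label text are illustrative choices, not part of the original data:

```r
# dicht as in the sample data above
dicht <- c(1, 1, NA, 0, 1, 0, 1, 0, 0, 1)

# Hypothetical labelled version of the 0/1 coding; NA stays NA
precip_type <- factor(dicht, levels = c(0, 1),
                      labels = c("Snow / no precipitation", "Rain"))

# Counts per type, including the missing day
table(precip_type, useNA = "ifany")
```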

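The distance in this example (snow gliding distance) is the dependent variable. It depends on the amount of precipitation (and on a number of other factors that are not relevant to this question). My assumption is that precipitation in the form of rain leads to increased gliding. The "precipitation" variable by itself does not distinguish between rain and snow. That is why I have the dichotomous variable, which I created from several conditions and a few other variables not included here. My goal is to visualize the whole thing somehow. This is the plot I have so far (for the sample data shown here):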

library(ggplot2)
library(scales)
library(patchwork)
library(dplyr)

#### Set the locale (English month names) and the start and end dates.

# "english" is a Windows locale name; on Linux/macOS use e.g. "en_US.UTF-8".
Sys.setlocale(category = "LC_TIME", locale = "english")

startTime <- as.Date("2010-11-01")
endTime <- as.Date("2010-11-10")
start_end <- c(startTime,endTime)

#### Plotting

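# Factor that maps the precipitation range onto the distance range (undone by sec_axis below)
scale <- max(df$distance, na.rm = TRUE) / max(df$precipitation, na.rm = TRUE)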

ggplot()+
  geom_line(data = df, aes(x = date, y = distance, color = "Average daily gliding distance"),na.rm= TRUE, linewidth = 1)+
  geom_line(data = df, aes(x = date, y = precipitation*scale ,color = "Daily precipitation amount"),na.rm = TRUE, linewidth = 1) +
  ggtitle("Daily precipitation & Average Gliding Distance") +
  labs(color = "")+
  xlab("2010")+
  ylab("Accumulated Distance [mm]")+
  scale_x_date(limits=start_end,breaks=date_breaks("1 day"),labels=date_format("%d %b"))+
  scale_y_continuous(sec.axis = sec_axis(~./scale,name = "Daily precipitation amount"),limits = c(0, 15))+
  scale_color_manual("", guide = "legend",
                     values = c("Average daily gliding distance"= "darkorange2",
                                "Daily precipitation amount" = "black"))+
  theme(legend.position="bottom",
        #legend.title = element_blank(),
        axis.text.x = element_text(angle = 50, size = 10 , vjust = 0.5),
        axis.text.y = element_text(size = 10, vjust = 0.5), 
        panel.background = element_rect(fill = "gray100"),
        plot.background = element_rect(fill = "gray100"),
        panel.grid.major = element_line(colour = "lightblue"),
        plot.margin = unit(c(1, 1, 1, 1), "cm"),
        plot.title = element_text(hjust = 0.5, size = 22))

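On the x-axis I have the date. As you can see, I set up a dual y-axis for the two variables and scaled precipitation for better visualization. I included NAs in the data to better represent the real data I have. My problem now is that I would like to fit in the dichotomous data somehow. I need to indicate on the plot whether the precipitation fell as rain [df$dicht == 1] or as snow [df$dicht == 0]. Is there a way to mark each data point (time point) of the precipitation line according to the dichotomous value? I know it might get too crowded, but in the worst case I at least need to indicate when [df$dicht == 1]. Alternatively, if you think another way of showing when precipitation fell as rain would be more appropriate than putting symbols on the precipitation line, I'd be happy to look into it. In general, if you have a completely different statistical/visualization approach, feel free to suggest it. Let me know if you need more explanation/data/examples, etc. Thanks in advance.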
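The dual-axis setup rests on a single factor: precipitation is multiplied by `scale` before plotting, and `sec_axis(~./scale)` divides it back out for the right-hand labels. A quick check with the sample data (variable names as in the question's code):

```r
distance      <- c(5, 4, 4, 7, 9, 7, NA, 5, 6, 4)
precipitation <- c(11, 15, NA, 0, 3, 0, 2, 2, 9, 10)

# Same factor as in the question: max distance / max precipitation = 9 / 15
scale <- max(distance, na.rm = TRUE) / max(precipitation, na.rm = TRUE)
scale  # 0.6

# Values actually drawn against the primary (left) axis
plotted <- precipitation * scale

# sec_axis(~./scale) undoes the multiplication for the secondary axis labels
back <- plotted / scale

# The primary limit of 15 therefore shows up as 25 on the secondary axis
15 / scale
```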

佐林

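In this case you have at least two options:

  1. Set a different line type depending on the value of the dicht variable
  2. Use facet_wrap() to draw one plot for each value of dicht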

I removed the NAs (the rows where dicht is missing) to improve the visualization.
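Note that the filter `df[!is.na(df$dicht), ]` only drops rows whose `dicht` is missing; other columns can still contain NA. A quick base-R look at what it keeps with the sample data:

```r
df <- data.frame(
  date          = as.Date("2010-11-01") + 0:9,
  distance      = c(5, 4, 4, 7, 9, 7, NA, 5, 6, 4),
  precipitation = c(11, 15, NA, 0, 3, 0, 2, 2, 9, 10),
  dicht         = c(1, 1, NA, 0, 1, 0, 1, 0, 0, 1)
)

# Same filter as in the plots: keep only rows where dicht is known
kept <- df[!is.na(df$dicht), ]

nrow(kept)                    # 9: a single day is dropped
df$date[is.na(df$dicht)]      # the dropped day (2010-11-03)
kept[is.na(kept$distance), ]  # 2010-11-07 stays, with distance still NA
```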

Using different line types:

ggplot()+
  geom_line(data = df[!is.na(df$dicht),], aes(x = date, y = distance, color = "Average daily gliding distance", linetype=as.factor(dicht)),na.rm= TRUE, linewidth = 1)+
  geom_line(data = df[!is.na(df$dicht),], aes(x = date, y = precipitation*scale ,color = "Daily precipitation amount", linetype=as.factor(dicht)),na.rm = TRUE, linewidth = 1) +
  ggtitle("Daily precipitation & Average Gliding Distance") +
  labs(color = "")+
  xlab("2010")+
  ylab("Accumulated Distance [mm]")+
  scale_x_date(limits=start_end,breaks=date_breaks("1 day"),labels=date_format("%d %b"))+
  scale_y_continuous(sec.axis = sec_axis(~./scale,name = "Daily precipitation amount"),limits = c(0, 15))+
  scale_color_manual("", guide = "legend",
                     values = c("Average daily gliding distance"= "darkorange2",
                                "Daily precipitation amount" = "black"))+
  theme(legend.position="bottom",
        #legend.title = element_blank(),
        axis.text.x = element_text(angle = 50, size = 10 , vjust = 0.5),
        axis.text.y = element_text(size = 10, vjust = 0.5), 
        panel.background = element_rect(fill = "gray100"),
        plot.background = element_rect(fill = "gray100"),
        panel.grid.major = element_line(colour = "lightblue"),
        plot.margin = unit(c(1, 1, 1, 1), "cm"),
        plot.title = element_text(hjust = 0.5, size = 22)) +
  scale_linetype_manual("Precipitation type",values=c(1, 2), labels=c("Snow / no precipitation", "Rain"))
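One caveat with mapping `linetype` inside `geom_line()`: ggplot2 uses the discrete aesthetic to split the data into groups, so each line type is drawn as its own line that joins only days of the same type (the dashed rain line, for example, jumps from 2 Nov to 5 Nov). The sketch below checks the grouping with `ggplot_build()` and then shows one possible alternative, not taken from the answer above: one `geom_segment()` per pair of consecutive kept rows, styled by the type of the day it starts on. The helper columns `date_next` and `precip_next` are hypothetical names introduced here.

```r
library(ggplot2)

df <- data.frame(
  date          = as.Date("2010-11-01") + 0:9,
  precipitation = c(11, 15, NA, 0, 3, 0, 2, 2, 9, 10),
  dicht         = c(1, 1, NA, 0, 1, 0, 1, 0, 0, 1)
)
kept <- df[!is.na(df$dicht), ]

# linetype mapped to a factor -> one group (a separate line) per level of dicht
p_line <- ggplot(kept, aes(x = date, y = precipitation,
                           linetype = as.factor(dicht))) +
  geom_line()
built <- ggplot_build(p_line)$data[[1]]
table(built$group)

# Alternative sketch: pair every kept row with the next kept row
n <- nrow(kept)
seg <- kept
seg$date_next   <- kept$date[c(2:n, NA)]           # hypothetical helper column
seg$precip_next <- kept$precipitation[c(2:n, NA)]  # hypothetical helper column
seg <- seg[!is.na(seg$date_next), ]                # last row has no successor

# Each segment takes the line type of the day it starts on
p_seg <- ggplot(seg) +
  geom_segment(aes(x = date, xend = date_next,
                   y = precipitation, yend = precip_next,
                   linetype = as.factor(dicht))) +
  scale_linetype_manual("Precipitation type", values = c(1, 2),
                        labels = c("Snow / no precipitation", "Rain"))
p_seg
```

The segments still bridge the dropped 3 Nov row, so this only addresses the grouping, not the missing day.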

Using facet_wrap():

ggplot()+
  geom_line(data = df[!is.na(df$dicht),], aes(x = date, y = distance, color = "Average daily gliding distance"),na.rm= TRUE, linewidth = 1)+
  geom_line(data = df[!is.na(df$dicht),], aes(x = date, y = precipitation*scale ,color = "Daily precipitation amount"),na.rm = TRUE, linewidth = 1) +
  ggtitle("Daily precipitation & Average Gliding Distance") +
  labs(color = "")+
  xlab("2010")+
  ylab("Accumulated Distance [mm]")+
  scale_x_date(limits=start_end,breaks=date_breaks("1 day"),labels=date_format("%d %b"))+
  scale_y_continuous(sec.axis = sec_axis(~./scale,name = "Daily precipitation amount"),limits = c(0, 15))+
  scale_color_manual("", guide = "legend",
                     values = c("Average daily gliding distance"= "darkorange2",
                                "Daily precipitation amount" = "black"))+
  theme(legend.position="bottom",
        #legend.title = element_blank(),
        axis.text.x = element_text(angle = 50, size = 10 , vjust = 0.5),
        axis.text.y = element_text(size = 10, vjust = 0.5), 
        panel.background = element_rect(fill = "gray100"),
        plot.background = element_rect(fill = "gray100"),
        panel.grid.major = element_line(colour = "lightblue"),
        plot.margin = unit(c(1, 1, 1, 1), "cm"),
        plot.title = element_text(hjust = 0.5, size = 22)) +
  facet_wrap(~dicht, labeller = as_labeller(c("0"="Snow / no precipitation", "1"="Rain")))
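A side effect of `facet_wrap(~dicht)` worth knowing: each panel contains only the days of one type, so the line inside a panel connects days that are not adjacent in time. Splitting the filtered dates by `dicht` shows the gaps (in days) that each panel's line bridges:

```r
df <- data.frame(
  date  = as.Date("2010-11-01") + 0:9,
  dicht = c(1, 1, NA, 0, 1, 0, 1, 0, 0, 1)
)
kept <- df[!is.na(df$dicht), ]

# Dates that end up in each facet panel ("0" and "1")
panels <- split(kept$date, kept$dicht)

# Spacing in days between consecutive points within each panel
gaps <- lapply(panels, function(d) as.numeric(diff(d)))
gaps
```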

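The first approach is more appropriate, because each day has only one value: either it rains or it doesn't. However, both approaches visually "interpolate" missing values, stretching the line from the last non-missing value to the next one even though no data is available in between (the row for 3 Nov was dropped, so 2 Nov and 4 Nov are joined directly). I therefore suggest using geom_point() instead: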

ggplot()+
  geom_point(data = df[!is.na(df$dicht),], aes(x = date, y = distance, color = "Average daily gliding distance", shape=as.factor(dicht)),na.rm= TRUE, size = 5)+
  geom_point(data = df[!is.na(df$dicht),], aes(x = date, y = precipitation*scale ,color = "Daily precipitation amount", shape=as.factor(dicht)),na.rm = TRUE, size = 5) +
  ggtitle("Daily precipitation & Average Gliding Distance") +
  labs(color = "")+
  xlab("2010")+
  ylab("Accumulated Distance [mm]")+
  scale_x_date(limits=start_end,breaks=date_breaks("1 day"),labels=date_format("%d %b"))+
  scale_y_continuous(sec.axis = sec_axis(~./scale,name = "Daily precipitation amount"),limits = c(0, 15))+
  scale_color_manual("", guide = "legend",
                     values = c("Average daily gliding distance"= "darkorange2",
                                "Daily precipitation amount" = "black"))+
  theme(legend.position="bottom",
        #legend.title = element_blank(),
        axis.text.x = element_text(angle = 50, size = 10 , vjust = 0.5),
        axis.text.y = element_text(size = 10, vjust = 0.5), 
        panel.background = element_rect(fill = "gray100"),
        plot.background = element_rect(fill = "gray100"),
        panel.grid.major = element_line(colour = "lightblue"),
        plot.margin = unit(c(1, 1, 1, 1), "cm"),
        plot.title = element_text(hjust = 0.5, size = 22)) +
  scale_shape_manual("Precipitation type",values=c(1, 18), labels=c("Snow / no precipitation", "Rain"))

Building on @Paulo Schau Guerra's third solution, I added lines between the points:

ggplot()+
  geom_line(data = df, aes(x = date, y = distance, color = "Average daily gliding distance"),na.rm= TRUE, linewidth = 1)+
  geom_line(data = df, aes(x = date, y = precipitation*scale ,color = "Daily precipitation amount"),na.rm = TRUE, linewidth = 1) +
  geom_point(data = df[!is.na(df$dicht),], aes(x = date, y = distance, color = "Average daily gliding distance", shape=as.factor(dicht)),na.rm= TRUE, size = 5)+
  geom_point(data = df[!is.na(df$dicht),], aes(x = date, y = precipitation*scale ,color = "Daily precipitation amount", shape=as.factor(dicht)),na.rm = TRUE, size = 5) +
  ggtitle("Daily precipitation & Average Gliding Distance") +
  labs(color = "")+
  xlab("2010")+
  ylab("Accumulated Distance [mm]")+
  scale_x_date(limits=start_end,breaks=date_breaks("1 day"),labels=date_format("%d %b"))+
  scale_y_continuous(sec.axis = sec_axis(~./scale,name = "Daily precipitation amount"),limits = c(0, 15))+
  scale_color_manual("", guide = "legend",
                     values = c("Average daily gliding distance"= "darkorange2",
                                "Daily precipitation amount" = "black"))+
  theme(legend.position="bottom",
        #legend.title = element_blank(),
        axis.text.x = element_text(angle = 50, size = 10 , vjust = 0.5),
        axis.text.y = element_text(size = 10, vjust = 0.5), 
        panel.background = element_rect(fill = "gray100"),
        plot.background = element_rect(fill = "gray100"),
        panel.grid.major = element_line(colour = "lightblue"),
        plot.margin = unit(c(1, 1, 1, 1), "cm"),
        plot.title = element_text(hjust = 0.5, size = 22))+
  scale_shape_manual("Precipitation type",values=c(1, 18), labels=c("Snow / no precipitation", "Rain"))
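If, as the question allows as a minimum, only rain days need to be flagged, a variation on the same idea keeps the full-data lines and overlays markers from a rain-only subset. A sketch on the sample data; the marker shape (17, a filled triangle), the name `rain_days` and the trimmed-down theme are arbitrary choices:

```r
library(ggplot2)

df <- data.frame(
  date          = as.Date("2010-11-01") + 0:9,
  distance      = c(5, 4, 4, 7, 9, 7, NA, 5, 6, 4),
  precipitation = c(11, 15, NA, 0, 3, 0, 2, 2, 9, 10),
  dicht         = c(1, 1, NA, 0, 1, 0, 1, 0, 0, 1)
)
scale <- max(df$distance, na.rm = TRUE) / max(df$precipitation, na.rm = TRUE)

# Days coded as rain only; which() also drops the row where dicht is NA
rain_days <- df[which(df$dicht == 1), ]

p <- ggplot(df, aes(x = date)) +
  # Full data for the lines, so they break at NA instead of bridging it
  geom_line(aes(y = distance, colour = "Average daily gliding distance"),
            na.rm = TRUE) +
  geom_line(aes(y = precipitation * scale, colour = "Daily precipitation amount"),
            na.rm = TRUE) +
  # Markers on the precipitation line for rain days only
  geom_point(data = rain_days, aes(y = precipitation * scale),
             shape = 17, size = 3) +
  scale_y_continuous(sec.axis = sec_axis(~ . / scale,
                                         name = "Daily precipitation amount")) +
  scale_colour_manual("", values = c("Average daily gliding distance" = "darkorange2",
                                     "Daily precipitation amount" = "black")) +
  theme(legend.position = "bottom")
p
```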