Plot time series of different years together

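I'm trying to compare a variable across different years, but I can't manage to plot them together. The time series is a temperature series; the file temp.csv is in https://github.com/gonzalodqa/timeseries. I'd like to plot something like the image, but I'm finding it hard to subset the months of each year and then combine the lines in the same plot, lined up under the same months.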

If anyone could give me some advice or point me in the right direction, I'd be grateful.

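You can try it like this.

The first chart shows all available temperatures; the second aggregates them by month.

In the first chart we force every observation into the same year, so that ggplot draws all years aligned on one time axis, and we separate the lines by colour.

For the second, we simply use month as the x variable and year as the colour variable.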
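A minimal sketch of the "force a common year" trick on its own, using two made-up timestamps instead of the temp.csv data and assuming only lubridate. Month, day and time of day are kept; only the year is overwritten with 2020:

```r
library(lubridate)

# Made-up sample: the same calendar moment observed in two different years
ts <- make_datetime(c(2018, 2019), 3, 15, 12)

# Overwrite the year with 2020 (a leap year, so 29 Feb would stay valid),
# keeping month, day and time of day unchanged
aligned <- make_datetime(2020, month(ts), day(ts), hour(ts), minute(ts), second(ts))

aligned  # both values are now 2020-03-15 12:00:00 UTC
```

Because both points now share one x position, ggplot draws them on top of each other, and `colour = factor(year)` (taken from the original `year` column) keeps the lines apart.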

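Notes:

  • with scale_x_datetime we can hide the year, so nobody can see that we forced 2020 onto every observation
  • with scale_x_continuous we can show month names instead of numbers

[Try running the charts with and without the scale_x_... line to see what I mean.]

month.abb is a handy built-in constant with the abbreviated month names.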
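For example (base R only, illustrative values): indexing month.abb with a month number gives its label, and reindexing it with a custom order gives labels for an axis that starts in July. month.abb is always English, regardless of the system locale:

```r
# Base R constant: "Jan", "Feb", ..., "Dec"
month.abb[c(1, 7, 12)]  # "Jan" "Jul" "Dec"

# Custom order starting in July
months_order <- c(7:12, 1:6)
labs_july_first <- month.abb[months_order]

# A factor with this level order puts July first on a discrete ggplot axis
m <- factor(c(1, 7, 12), levels = months_order, labels = labs_july_first)
levels(m)[1:3]   # "Jul" "Aug" "Sep"
as.character(m)  # "Jan" "Jul" "Dec"
```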

# read data
df <- readr::read_csv2("https://raw.githubusercontent.com/gonzalodqa/timeseries/main/temp.csv")


# libraries
library(ggplot2)
library(dplyr)


# line chart by datetime
df %>% 
  # make datetime: force the same year (2020, a leap year, so 29 Feb stays valid) on every row
  mutate(datetime = lubridate::make_datetime(2020, month, day, hour, minute, second)) %>% 
  
  ggplot() +
  geom_line(aes(x = datetime, y = T42, colour = factor(year))) +
  scale_x_datetime(breaks = lubridate::make_datetime(2020,1:12), labels = month.abb) +
  labs(title = "Temperature by Datetime", colour = "Year")

# line chart by month
df %>% 
  
  # average by year-month
  group_by(year, month) %>% 
  summarise(T42 = mean(T42, na.rm = TRUE), .groups = "drop") %>% 
  
  ggplot() +
  geom_line(aes(x = month, y = T42, colour = factor(year))) +
  scale_x_continuous(breaks = 1:12, labels = month.abb, minor_breaks = NULL) +
  labs(title = "Average Temperature by Month", colour = "Year")


If you want your chart to start in July, you can use this code instead:

months_order <- c(7:12,1:6)

# line chart by month
df %>% 
  
  # average by year-month
  group_by(year, month) %>% 
  summarise(T42 = mean(T42, na.rm = TRUE), .groups = "drop") %>% 
    
  # create new groups starting from each July
  group_by(neworder = cumsum(month == 7)) %>% 
    
  # keep only complete years
  filter(n() == 12) %>% 
    
  # give new names to groups
  mutate(years = paste(unique(year), collapse = " / ")) %>% 
  ungroup() %>% 
  
  # reorder months
  mutate(month = factor(month, levels = months_order, labels = month.abb[months_order], ordered = TRUE)) %>% 
      
  # plot
  ggplot() +
  geom_line(aes(x = month, y = T42, colour = years, group = years)) +
  labs(title = "Average Temperature by Month", colour = "Year")
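The July-first grouping in the code above relies on `cumsum(month == 7)` over rows sorted by year and month. A minimal sketch of the same idea on a made-up monthly grid (Jan 2018 to Dec 2020, no temperatures) shows which groups it creates and why `filter(n() == 12)` drops the incomplete ones:

```r
library(dplyr)

# Made-up grid: one row per year-month, Jan 2018 to Dec 2020
monthly <- expand.grid(month = 1:12, year = 2018:2020) %>%
  arrange(year, month)

seasons <- monthly %>%
  # the counter increases by 1 at every July, so each group runs July -> June
  mutate(neworder = cumsum(month == 7)) %>%
  group_by(neworder) %>%
  # drop the partial groups at both ends (Jan-Jun 2018 and Jul-Dec 2020)
  filter(n() == 12) %>%
  summarise(years = paste(unique(year), collapse = " / "), n = n())

seasons  # two complete seasons: "2018 / 2019" and "2019 / 2020"
```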


EDIT

To get something like the first plot, but starting in July, you can use the following code:

# libraries
library(ggplot2)
library(dplyr)
library(lubridate)


# custom months order
months_order <- c(7:12,1:6)

# fake dates for plot: July-December in year 3, January-June in year 4
# note: year 4 is chosen because it is a leap year, so 29 Feb exists
dates <- make_datetime(c(rep(3,6), rep(4,6)), months_order)

# line chart by datetime
df %>%
  
  # create date time
  mutate(datetime = make_datetime(year, month, day, hour, minute, second)) %>%
  
  # filter years of interest
  filter(datetime >= make_datetime(2018,7), datetime < make_datetime(2020,7)) %>%
  
  # create increasing group after each July
  group_by(year, month) %>%
  mutate(dummy = month(datetime) == 7 & datetime == min(datetime)) %>%
  ungroup() %>%
  mutate(dummy = cumsum(dummy)) %>%
  
  # force unique years and create custom name
  group_by(dummy) %>%
  mutate(datetime = datetime - years(year - 4) - years(month>=7),
         years = paste(unique(year), collapse = " / ")) %>%
  ungroup() %>%
  
  # plot
  ggplot() +
  geom_line(aes(x = datetime, y = T42, colour = years)) +
  scale_x_datetime(breaks = dates, labels = month.abb[months_order]) +
  labs(title = "Temperature by Datetime", colour = "Year")
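In the code above, `datetime - years(year - 4) - years(month >= 7)` moves July to December onto fake year 3 and January to June onto fake year 4, which is what the breaks in `dates` expect. A sketch of the same mapping written directly with `make_datetime` (an alternative formulation, not the answer's original code; the timestamps are made up):

```r
library(lubridate)

# Made-up timestamps: Jul 2018, Jan 2019 and 29 Feb 2020
ts <- make_datetime(c(2018, 2019, 2020), c(7, 1, 2), c(15, 10, 29), 6)

# July-December -> fake year 3, January-June -> fake year 4.
# Year 4 is a leap year, so 29 Feb from a leap year such as 2020 stays valid.
fake_year <- ifelse(month(ts) >= 7, 3, 4)
fake <- make_datetime(fake_year, month(ts), day(ts), hour(ts), minute(ts), second(ts))

fake  # 0003-07-15, 0004-01-10, 0004-02-29 (all at 06:00 UTC)
```

Since year 3 comes before year 4, every July-June season runs left to right on one shared axis.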

To order the months differently and summarise the values over several years, you need to process the data a bit before plotting it:

library(dplyr)     # work data
library(ggplot2)   # plots
library(lubridate) # date
library(readr)     # fetch data

# your data
df <- read_csv2("https://raw.githubusercontent.com/gonzalodqa/timeseries/main/temp.csv")


df %>%
  mutate(date = make_date(year, month, day)) %>%
  # reorder months so the axis starts in July (month.abb is always English,
  # so this does not depend on, or change, the system locale)
  group_by(month_2 = factor(month.abb[month],
                            levels = month.abb[c(7:12, 1:6)]),
           # group years as you like
           year_2  = ifelse(year(date) %in% 2018:2019, '2018/2019', '2020/2021')) %>%
  # you can put whatever aggregation function you need
  summarise(val = mean(T42, na.rm = TRUE), .groups = "drop") %>%
  # plot it!
  ggplot(aes(x = month_2, y = val, color = year_2, group = year_2)) + 
  geom_line()   + 
  ylab('T42')   +
  xlab('month') + 
  theme_light()
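The `ifelse()` above groups by calendar year, so for example July 2018 and July 2019 both fall into '2018/2019' and are averaged together. If each line should instead be one July to June season, the label can be computed from year and month. A sketch on made-up rows (`season_start` is a hypothetical helper column):

```r
library(dplyr)

# Made-up rows spanning two July-June seasons
obs <- data.frame(year  = c(2018, 2018, 2019, 2019, 2020),
                  month = c(7, 12, 1, 7, 3))

obs <- obs %>%
  mutate(
    # January-June belong to the season that started the previous July
    season_start = year - (month < 7),
    year_2 = paste(season_start, season_start + 1, sep = "/")
  )

obs$year_2  # "2018/2019" "2018/2019" "2018/2019" "2019/2020" "2019/2020"
```

`year_2` can then replace the hard-coded `ifelse()` in `group_by()`.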