Concatenate specific character strings in a column

I have a data frame like this:

df <- data.frame("region" = c("Spain", "Barcelona", "Madrid",
                          "France", "Paris", "Lyon", 
                          "Belgium", "Bruges", "Brussels"), 
             "2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6))

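I want to concatenate each country name with its city names. Every row that holds a country name has an NA, and each country's cities come directly after the country row.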

The data frame I want looks like this:

desired_df <- data.frame("region" = c("Spain_Spain", "Spain_Barcelona", "Spain_Madrid",
                          "France_France", "France_Paris", "France_Lyon",
                          "Belgium_Belgium", "Belgium_Bruges", "Belgium_Brussels"), 
             "2010" = 1:9, "2011" = c(NA, 1, 2, NA, 3, 4, NA, 5, 6))

It is fine if the country_country rows are missing. Any help would be much appreciated.

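We can create a grouping variable based on where the country names occur, then paste the first element of 'region' in each group onto every element of 'region' in that group to update the 'region' column: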

library(dplyr)
library(stringr)
df %>%
   group_by(grp = cumsum(region %in% c("Spain", "France", "Belgium"))) %>%
   mutate(region = str_c(first(region), region, sep="_")) %>%
   ungroup %>% 
   select(-grp)
# A tibble: 9 x 3
#  region           X2010 X2011
#  <chr>            <int> <dbl>
#1 Spain_Spain          1    NA
#2 Spain_Barcelona      2     1
#3 Spain_Madrid         3     2
#4 France_France        4    NA
#5 France_Paris         5     3
#6 France_Lyon          6     4
#7 Belgium_Belgium      7    NA
#8 Belgium_Bruges       8     5
#9 Belgium_Brussels     9     6
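
To make the grouping step easier to follow, here is a minimal base R sketch of the same idea without dplyr. The names `is_country`, `grp` and `country` are illustrative and not part of the answer above, and the sketch assumes the first row is a country row so that the counter starts at 1.

```r
region <- c("Spain", "Barcelona", "Madrid",
            "France", "Paris", "Lyon",
            "Belgium", "Bruges", "Brussels")

# TRUE on country rows, FALSE on city rows
is_country <- region %in% c("Spain", "France", "Belgium")

# Running count of country rows: every row gets the index of its country
grp <- cumsum(is_country)
grp
# 1 1 1 2 2 2 3 3 3

# Look up the country for each row by its group index (assumes row 1 is a country)
country <- region[is_country][grp]

result <- paste(country, region, sep = "_")
result
```

Because `cumsum()` only increases on country rows, all cities that follow a country share its group number until the next country appears.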

Or, as @akash87 mentioned, if the pattern should be based on 'X2011':

df %>%
   group_by(grp = cumsum(is.na(X2011))) %>%
   mutate(region = str_c(first(region), region, sep="_")) %>%
   ungroup %>% 
   select(-grp)

A general solution with the tidyverse requires filtering the countries out of the rest of the data and joining them back in:

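library(dplyr)

# Group counter: increments at every country row (NA in X2011)
df_gr <- df %>% mutate(gr = cumsum(is.na(X2011)))

# `countries` was not defined in the original; assumed here to be the country rows
countries <- df_gr %>% filter(is.na(X2011))

df_gr %>% 
  filter(!is.na(X2011)) %>% 
  left_join(countries %>% 
              select(region, gr) %>% 
              rename("country" = "region"), by = "gr") %>% 
  mutate(new_region = paste(country, region, sep = "_")) %>% 
  select(-gr)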
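
Another option with tidyr: copy the country name into its own column, fill it downwards, and unite it with 'region':

library(dplyr)
library(tidyr)
df %>% 
  # Keep the name on country rows only; if_else() needs a typed NA, not NULL
  mutate(country = if_else(is.na(X2011), as.character(region), NA_character_)) %>% 
  # Carry each country name down over its city rows
  fill(country) %>% 
  # Join country and region with "_" (the default separator)
  unite("region", c(country, region))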