Merge Alternating Rows

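I extracted data from a PDF but ran into a small problem. Some cells are split across two rows, because the data has NA in some cells and values in others. I would simply like to merge these values into the cell above.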

Interestingly, every "real" row, into which I want to merge the following row, starts with the same symbol, namely "§".

I have about 1000 observations, so an automated solution would be great.

first <- c("§", "3", "4")
second <- c(NA, "2", NA)
third <- c("§", "2", "5")
fourth <- c(NA, "2", "3")
# ... and so on

df <- as.data.frame(rbind(first, second, third, fourth))

expected output: 
first_e <- c("§", "32", "4")
second_e <- c("§", "22", "53")

df_e <- as.data.frame(rbind(first_e, second_e))

It would be great if someone had an idea (:

Best regards from Berlin

If there is always a second row, one possible solution is:

library(dplyr)


df %>% 
  # use the row number as a column
  dplyr::mutate(ID = dplyr::row_number()) %>% 
  # subtract 1 from every even row number to build groups
  dplyr::mutate(ID = ifelse(ID %% 2 == 0, ID - 1, ID)) %>% 
  # group by the new ID
  dplyr::group_by(ID) %>% 
  # convert all NAs to "" (empty string)
  dplyr::mutate_all(~ ifelse(is.na(.), "", .)) %>% 
  # concatenate all strings per group
  dplyr::mutate_all( ~ paste(., collapse = "")) %>% 
  # keep only distinct rows (this eliminates the "second" rows, as they are now identical to the "first" rows)
  dplyr::distinct()


  V1    V2    V3       ID
  <chr> <chr> <chr> <dbl>
1 4     32    4         1

I left the created ID number in the result, but you can drop/delete it after the computation if you like.
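Dropping the ID could look roughly like the sketch below. It assumes the sample data from the question, built with stringsAsFactors = FALSE so the columns stay character, and the name `result` is only illustrative. Because the data is still grouped by ID after distinct(), the sketch calls ungroup() first; otherwise select() would add the grouping column back.

```r
library(dplyr)

# Sample data from the question; stringsAsFactors = FALSE keeps the columns as character
df <- as.data.frame(
  rbind(c("§", "3", "4"), c(NA, "2", NA), c("§", "2", "5"), c(NA, "2", "3")),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(ID = row_number()) %>%
  mutate(ID = ifelse(ID %% 2 == 0, ID - 1, ID)) %>%
  group_by(ID) %>%
  mutate_all(~ ifelse(is.na(.), "", .)) %>%
  mutate_all(~ paste(., collapse = "")) %>%
  distinct() %>%
  # remove the grouping first, then drop the helper column
  ungroup() %>%
  select(-ID)
```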

Simply paste the odd elements of each column onto the even elements:

# vectors of TRUEs in odd or even positions
odd <- rep(c(TRUE, FALSE), length.out=nrow(df))
evn <- rep(c(FALSE, TRUE), length.out=nrow(df))

# for each column...
result <- lapply(df, function(col) {
    paste0(ifelse(is.na(col[odd]), '', col[odd]),
           ifelse(is.na(col[evn]), '', col[evn]))  
})
as.data.frame(result)
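
The odd/even approach only works if "§" rows and continuation rows alternate strictly. Before relying on it for ~1000 rows, a quick check along these lines may help. This is just a sketch: `strictly_alternates` is a hypothetical helper name, and it assumes the marker sits in the first column.

```r
# Hypothetical helper: TRUE only if rows 1, 3, 5, ... start with the marker
# and rows 2, 4, 6, ... do not
strictly_alternates <- function(df, marker = "§") {
  # NA in the first column counts as "not a marker row"
  is_start <- !is.na(df[[1]]) & df[[1]] == marker
  expected <- rep(c(TRUE, FALSE), length.out = nrow(df))
  nrow(df) %% 2 == 0 && identical(is_start, expected)
}

# Sample data from the question
df <- as.data.frame(
  rbind(c("§", "3", "4"), c(NA, "2", NA), c("§", "2", "5"), c(NA, "2", "3")),
  stringsAsFactors = FALSE
)

ok <- strictly_alternates(df)            # pattern holds for the sample data
broken <- strictly_alternates(df[-2, ])  # removing a continuation row breaks it
```

If the check fails, a grouping-based approach that does not depend on row parity is the safer choice.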

Consider using aggregate with paste, after generating a grouping field by flagging the § column with ifelse + cumsum:

# BUILD DATA FRAME
df <- setNames(rbind.data.frame(first, second, third, fourth, stringsAsFactors=FALSE),
               c("col1", "col2", "col3"))

# CONVERT ALL NAs TO EMPTY STRING
df[is.na(df)] <- ""

# GENERATE GROUPING COLUMN
df$section <- cumsum(ifelse(df$col1 == "§", 1, 0))
df
#   col1 col2 col3 section
# 1    §    3    4       1
# 2         2            1
# 3    §    2    5       2
# 4         2    3       2

# AGGREGATE BY GROUPING COLUMNS
clean_df <- aggregate(. ~ section, df, paste, collapse="")[-1]
clean_df
#   col1 col2 col3
# 1    §   32    4
# 2    §   22   53
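
Since the grouping comes from cumsum rather than row parity, the same idea should also cover a "§" row followed by zero or several continuation rows. A minimal base R sketch of that, wrapped in a function: `merge_by_marker` and the `raw` example data are made up for illustration, and the marker is assumed to be in the first column.

```r
# Hypothetical helper: merge every row into the closest preceding marker row
merge_by_marker <- function(df, marker = "§") {
  # the group id increases by one at each row whose first column equals the marker
  # (rows before the first marker, if any, end up together in group 0)
  group <- cumsum(!is.na(df[[1]]) & df[[1]] == marker)
  merged <- lapply(df, function(col) {
    # treat NA as an empty string so it vanishes when pasted
    col <- ifelse(is.na(col), "", as.character(col))
    # concatenate the values within each group, in row order
    as.vector(unname(tapply(col, group, paste, collapse = "")))
  })
  as.data.frame(merged, stringsAsFactors = FALSE)
}

# Illustrative data: the first "§" row has two continuation rows, the second has none
raw <- data.frame(
  col1 = c("§", NA, NA, "§"),
  col2 = c("3", "2", "1", "7"),
  col3 = c("4", NA, "5", NA),
  stringsAsFactors = FALSE
)

merged <- merge_by_marker(raw)
```

Note that the marker column itself is pasted too, which is harmless here because continuation rows hold NA in that column.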
