Merge Alternating Rows
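I extracted data from a PDF and ran into a small problem: some cells were split across two rows, so the extra row has NA in some cells and values in others. I would simply like to merge these values into the cell above.
Interestingly, every "real" row that the other rows should be merged into starts with the same symbol, "§".
I have about 1000 observations, so an automated solution would be great.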
first <- c("§", "3", "4")
second <- c(NA, "2", NA)
third <- c("§", "2", 5)
fourth <- c(NA, "2", "3")
# ... and so on
df <- as.data.frame(rbind(first, second, third, fourth))
Expected output:
first_e <- c("§", "32", "4")
second_e <- c("§", "22", "53")
df_e <- as.data.frame(rbind(first_e, second_e))
It would be great if anyone has an idea (:
Best from Berlin
If there is always exactly one second row, one possible solution is:
library(dplyr)
df %>%
# use the row number as a column
dplyr::mutate(ID = dplyr::row_number()) %>%
# subtract 1 from every even row number to build groups (1, 1, 3, 3, ...)
dplyr::mutate(ID = ifelse(ID %% 2 == 0, ID - 1, ID)) %>%
# group by the new ID
dplyr::group_by(ID) %>%
# convert all NAs to "" (empty string)
dplyr::mutate_all(~ ifelse(is.na(.), "", .)) %>%
# concatenate all strings per group
dplyr::mutate_all(~ paste(., collapse = "")) %>%
# keep only distinct rows (this eliminates the "second" rows, as they are now identical to the "first" ones)
dplyr::distinct()
  V1    V2    V3       ID
  <chr> <chr> <chr> <dbl>
1 §     32    4         1
2 §     22    53        3
I left the created ID number in the result, but you can drop it after the calculation if you like.
Simply paste the odd-position elements of each column together with the even-position elements:
# vectors of TRUEs in odd or even positions
odd <- rep(c(TRUE, FALSE), length.out = nrow(df))
evn <- rep(c(FALSE, TRUE), length.out = nrow(df))
# for each column...
result <- lapply(df, function(col) {
paste0(ifelse(is.na(col[odd]), '', col[odd]),
ifelse(is.na(col[evn]), '', col[evn]))
})
as.data.frame(result)
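Like the dplyr answer, the odd/even pairing relies on every § row being followed by exactly one continuation row; if a record spans one or three lines, the paste silently combines the wrong rows. A minimal sketch of a guard that checks this assumption before pasting, assuming the marker sits in the first column and the columns are character (the default for as.data.frame in R >= 4.0). The helper name check_pairs is hypothetical:

```r
# Hypothetical helper: TRUE only if rows strictly alternate
# "marker row" / "continuation row", which the odd/even paste assumes
check_pairs <- function(df, marker = "§") {
  first_col <- df[[1]]
  odd <- rep(c(TRUE, FALSE), length.out = nrow(df))
  # even row count, a marker on every odd row, and no marker on even rows
  # (NA %in% marker is FALSE, so NA continuation cells pass)
  nrow(df) %% 2 == 0 &&
    all(first_col[odd] %in% marker) &&
    !any(first_col[!odd] %in% marker)
}

first  <- c("§", "3", "4")
second <- c(NA, "2", NA)
third  <- c("§", "2", "5")
fourth <- c(NA, "2", "3")
df <- as.data.frame(rbind(first, second, third, fourth), stringsAsFactors = FALSE)

check_pairs(df)        # TRUE for the example data
check_pairs(df[1:3, ]) # FALSE: odd number of rows
```

If the check fails, the pairing cannot be trusted and grouping on the § marker itself is the safer route.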
Consider using ifelse + cumsum on the § marker column to generate a grouping field for aggregate, which then collapses each group with paste:
# BUILD DATA FRAME
df <- setNames(rbind.data.frame(first, second, third, fourth, stringsAsFactors=FALSE),
c("col1", "col2", "col3"))
# CONVERT ALL NAs TO EMPTY STRING
df[is.na(df)] <- ""
# GENERATE GROUPING COLUMN
df$section <- cumsum(ifelse(df$col1 == "§", 1, 0))
df
#   col1 col2 col3 section
# 1    §    3    4       1
# 2         2            1
# 3    §    2    5       2
# 4         2    3       2
# AGGREGATE BY GROUPING COLUMNS
clean_df <- aggregate(. ~ section, df, paste, collapse="")[-1]
clean_df
#   col1 col2 col3
# 1    §   32    4
# 2    §   22   53
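Because the cumsum grouping keys on the § marker rather than on row position, it also copes with records that spill over onto zero, one or several continuation rows. A minimal base-R sketch wrapping that idea into a reusable function, assuming the marker appears only in the first column; the name merge_marked_rows is hypothetical, and split + vapply stand in for aggregate here:

```r
# Hypothetical helper: collapse each "§" row together with all rows below it
# up to the next "§" row
merge_marked_rows <- function(df, marker = "§") {
  df[] <- lapply(df, as.character)  # turn any factor columns into character
  df[is.na(df)] <- ""               # NAs become empty strings before pasting
  # running count of marker rows = record number of every row
  section <- cumsum(df[[1]] == marker)
  merged <- lapply(df, function(col) {
    # paste the cells of each record together, in their original order
    vapply(split(col, section), paste, character(1), collapse = "")
  })
  out <- as.data.frame(merged, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

# Example where the first record spans three lines
df <- data.frame(
  col1 = c("§", NA, NA, "§", NA),
  col2 = c("3", "2", "1", "2", "2"),
  col3 = c("4", NA, "7", "5", "3"),
  stringsAsFactors = FALSE
)
res <- merge_marked_rows(df)
res  # two rows: ("§", "321", "47") and ("§", "22", "53")
```

Note that any rows before the first § get section 0 and are collapsed into a separate leading record rather than dropped.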