Merge immediate next rows together - R
This is what my df looks like:
Region Dummy value1 value2
Mangonui NA NA NA
Sales NA 9 6
Kaitaia NA NA NA
Sales NA 16 1
Whangaroa NA NA NA
Sales NA 2 2
Code to reproduce it:
df <- structure(list(Region = c("Mangonui", "Sales", "Kaitaia",
"Sales", "Whangaroa", "Sales"), Dummy = c(NA,
NA, NA, NA, NA, NA), Dweling_values = c(NA, "9", NA, "16", NA,
"2"), Section_values = c(NA, "6", NA, "1", NA, "2")), .Names = c("Region",
"Dummy", "value1", "value2"), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
How can I merge each pair of rows so that the sales figures end up on the row with the region name? The output should look like this:
Region Dummy value1 value2
Mangonui NA 9 6
Kaitaia NA 16 1
Whangaroa NA 2 2
A simple rearrangement in base R:
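# region names come from the odd rows, the values from the even rows
myNew <- cbind(df$Region[seq.int(1,nrow(df),2)], df[seq.int(2,nrow(df),2), 2:4])
# cbind() does not keep the name "Region" for the first column, so copy the original names back
names(myNew) <- names(df)
myNew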
Region Dummy value1 value2
2 Mangonui NA 9 6
4 Kaitaia NA 16 1
6 Whangaroa NA 2 2
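The rearrangement above silently assumes that the rows strictly alternate between a region name and a "Sales" row. A minimal sketch of a wrapper that checks that assumption before pairing the rows; the function name `merge_pairs` and its `label` argument are hypothetical, not part of the original answer, and the example uses a plain data.frame rebuilt from the question's data:

```r
# Hypothetical helper (not from the original answer): pair each region row
# with the "Sales" row directly below it.
merge_pairs <- function(df, label = "Sales") {
  n <- nrow(df)
  # The approach only works when rows strictly alternate region / label
  stopifnot(n >= 2, n %% 2 == 0)
  odd  <- seq.int(1, n, by = 2)
  even <- seq.int(2, n, by = 2)
  # Fail loudly instead of silently mis-aligning names and values
  stopifnot(all(df$Region[even] == label))
  out <- df[even, , drop = FALSE]   # value columns from the even rows
  out$Region <- df$Region[odd]      # region names from the odd rows
  rownames(out) <- NULL
  out
}

df <- data.frame(
  Region = c("Mangonui", "Sales", "Kaitaia", "Sales", "Whangaroa", "Sales"),
  Dummy  = NA,
  value1 = c(NA, "9", NA, "16", NA, "2"),
  value2 = c(NA, "6", NA, "1", NA, "2"),
  stringsAsFactors = FALSE
)
res <- merge_pairs(df)
print(res)
```

If a "Sales" row is missing somewhere, `stopifnot()` raises an error rather than attaching the sales figures to the wrong region.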
Update
So far, the most elegant solution is the one given by @thelatemail:
cbind(df[1][c(TRUE,FALSE),,drop=FALSE], df[-1][c(FALSE,TRUE),])
Region Dummy value1 value2
1 Mangonui NA 9 6
3 Kaitaia NA 16 1
5 Whangaroa NA 2 2
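This one-liner relies on R recycling a short logical index: `c(TRUE, FALSE)` is repeated up to the number of rows, so it selects rows 1, 3, 5, ..., while `c(FALSE, TRUE)` selects rows 2, 4, 6, .... A small illustration on a plain character vector (the vector `x` is made up for the example):

```r
x <- c("Mangonui", "Sales", "Kaitaia", "Sales", "Whangaroa", "Sales")

odd_rows  <- x[c(TRUE, FALSE)]  # index recycled to TRUE, FALSE, TRUE, FALSE, ...
even_rows <- x[c(FALSE, TRUE)]  # index recycled to FALSE, TRUE, FALSE, TRUE, ...
print(odd_rows)   # "Mangonui" "Kaitaia" "Whangaroa"
print(even_rows)  # "Sales" "Sales" "Sales"

# rep_len() makes the recycling explicit; which() shows the selected positions
print(which(rep_len(c(TRUE, FALSE), length(x))))  # 1 3 5
```

The `drop = FALSE` in the first term matters for a plain data.frame: subsetting the rows of a one-column data frame would otherwise return a bare vector instead of a data frame.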
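First, track down whoever gave you the data in this format and give them a good telling-off. Tell them you won't be friends with them if they keep doing this. Then just use a few simple base R functions: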
# generate indices for the sales and region rows
sales_rows <- seq(2, nrow(df), by = 2)
region_rows <- seq(1, nrow(df), by = 2)
# subset to create the df you really want
sales_df <- df[sales_rows, ]
# use just the names from the region rows
regions <- df$Region[region_rows]  # a plain character vector, even when df is a tibble
sales_df$Region <- regions
# > sales_df
# Region Dummy value1 value2
# 2 Mangonui NA 9 6
# 4 Kaitaia NA 16 1
# 6 Whangaroa NA 2 2
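All of the index-based answers assume that every region row is followed by exactly one "Sales" row. If that might not hold, one alternative technique (not taken from the answers above) is to number each block with `cumsum()` over the non-"Sales" rows and look up the region name for every "Sales" row. A minimal base R sketch, assuming the label is always exactly "Sales"; the names `is_region`, `block` and `region_names` are illustrative:

```r
df <- data.frame(
  Region = c("Mangonui", "Sales", "Kaitaia", "Sales", "Whangaroa", "Sales"),
  Dummy  = NA,
  value1 = c(NA, "9", NA, "16", NA, "2"),
  value2 = c(NA, "6", NA, "1", NA, "2"),
  stringsAsFactors = FALSE
)

is_region <- df$Region != "Sales"
# Block id: increases by one at every region row, so each "Sales" row
# inherits the id of the nearest region row above it
block <- cumsum(is_region)
region_names <- df$Region[is_region]

sales_df <- df[!is_region, , drop = FALSE]
sales_df$Region <- region_names[block[!is_region]]
rownames(sales_df) <- NULL
print(sales_df)
```

This also copes with a region that has several "Sales" rows (each one gets the region name), but it would fail if the very first row were a "Sales" row, because that row has no region above it.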
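A solution using dplyr and tidyr. The idea is to use recode to replace Sales with NA, use fill to impute those NAs from the preceding rows, and then use filter_at to keep the rows that have at least one non-NA value in the other columns.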
library(dplyr)
library(tidyr)
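dt2 <- df %>%
  # turn the "Sales" labels into NA ...
  mutate(Region = recode(Region, `Sales` = NA_character_)) %>%
  # ... fill each NA with the region name from the row above ...
  fill(Region) %>%
  # ... and keep only rows with at least one non-NA value outside Region
  filter_at(vars(-Region), any_vars(!is.na(.)))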
dt2
# # A tibble: 3 x 4
# Region Dummy value1 value2
# <chr> <lgl> <chr> <chr>
# 1 Mangonui NA 9 6
# 2 Kaitaia NA 16 1
# 3 Whangaroa NA 2 2
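In current dplyr releases `filter_at()` and `any_vars()` are superseded. A sketch of the same pipeline using `na_if()` and `if_any()`, assuming dplyr 1.0.4 or later (the version that introduced `if_any()`) is installed; the tibble is rebuilt from the question's data:

```r
library(dplyr)
library(tidyr)

df <- tibble(
  Region = c("Mangonui", "Sales", "Kaitaia", "Sales", "Whangaroa", "Sales"),
  Dummy  = NA,
  value1 = c(NA, "9", NA, "16", NA, "2"),
  value2 = c(NA, "6", NA, "1", NA, "2")
)

df2 <- df %>%
  mutate(Region = na_if(Region, "Sales")) %>%    # "Sales" labels become NA
  fill(Region, .direction = "down") %>%          # carry each region name down
  filter(if_any(-Region, ~ !is.na(.x)))          # drop rows with no values
print(df2)
```

The logic is unchanged from the answer above; only the superseded helpers are swapped for their current equivalents.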