Clean HTML table, add column from next row value, then delete that row
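I have scraped an HTML table into a data frame with rvest, but I need to clean it up to suit my needs. I'm not sure whether I should do this as part of the scraping or as a separate data-manipulation step afterwards.
What I need is to add a column to the first row that contains the value from the second row, and then delete the second row entirely. This should repeat for every odd/even pair of rows, if that makes sense.
This is what the scraped data looks like: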
n = c("Player 1", "Male", "Player 2", "Female")
s = c(1, "Male", 5, "Female")
b = c(1, "Male", 5, "Female")
df1 = data.frame(n, s, b)
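Because `c()` coerces mixed input to a single type, `s` and `b` in the scraped data are character vectors (`"1"`, `"Male"`, `"5"`, ...), not numbers, while the target data frame below has numeric `s` and `b`. A quick check; `stringsAsFactors = FALSE` is spelled out so the result is the same on R versions before 4.0, where strings defaulted to factors:

```r
# Rebuild the scraped table from the question
n <- c("Player 1", "Male", "Player 2", "Female")
s <- c(1, "Male", 5, "Female")  # c() coerces 1 and 5 to "1" and "5"
b <- c(1, "Male", 5, "Female")
df1 <- data.frame(n, s, b, stringsAsFactors = FALSE)

# Every column comes back as character, including s and b
col_classes <- vapply(df1, class, character(1))
print(col_classes)
```

So whichever reshaping approach is used, a final type conversion is needed if `s` and `b` should end up numeric.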
This is what I want it to look like:
n = c("Player 1", "Player 2")
s = c(1, 5)
b = c(1, 5)
v = c("Male", "Female")
df1 = data.frame(n, s, b, v)
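The transformation described above ("keep each odd row and attach the value from the row below it") can also be written as plain index arithmetic in base R. This is a minimal sketch, not taken from either answer below. It assumes the rows always come in strict player/label pairs, and the helper name `pair_rows()` is hypothetical. `type.convert(as.is = TRUE)` turns the `"1"`/`"5"` strings back into numbers so the result matches the target data frame.

```r
# Hypothetical helper: keep the odd rows and take column `from` of the
# following even row as the new column `v`
pair_rows <- function(df, from = "n") {
  stopifnot(nrow(df) %% 2 == 0)          # assumes strict two-row records
  odd <- seq(1, nrow(df), by = 2)        # player rows: 1, 3, 5, ...
  out <- df[odd, , drop = FALSE]
  out$v <- df[[from]][odd + 1]           # value from the row directly below
  rownames(out) <- NULL                  # renumber rows 1, 2, ...
  type.convert(out, as.is = TRUE)        # "1", "5" -> integer; text stays character
}

df1 <- data.frame(
  n = c("Player 1", "Male", "Player 2", "Female"),
  s = c(1, "Male", 5, "Female"),
  b = c(1, "Male", 5, "Female"),
  stringsAsFactors = FALSE
)
result <- pair_rows(df1)
print(result)
```

Index arithmetic keeps the pairing explicit: if the scrape ever produces an odd number of rows, the `stopifnot()` fails loudly instead of silently misaligning labels.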
Try this:
# First, split the data frame into odd and even rows.
# Then cbind the odd (TRUE) rows with the even (FALSE) rows;
# only the first column of each even row is needed, because all of its columns hold the same value.
with(
split(df1, seq_len(nrow(df1)) %% 2L == 1L),
as.data.frame(cbind(`TRUE`, v = `FALSE`[[1L]]))
)
Output:
n s b v
1 Player 1 1 1 Male
3 Player 2 5 5 Female
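The answer above relies on `split()` with a logical grouping vector: it returns a list whose elements are named `"FALSE"` (even rows) and `"TRUE"` (odd rows), and `with()` exposes those names as the backtick-quoted variables used in the `cbind()` call. A short sketch that inspects the intermediate list step by step, using the question's `df1` (the names `is_odd`, `parts` and `labels` are just for illustration):

```r
df1 <- data.frame(
  n = c("Player 1", "Male", "Player 2", "Female"),
  s = c(1, "Male", 5, "Female"),
  b = c(1, "Male", 5, "Female"),
  stringsAsFactors = FALSE
)

is_odd <- seq_len(nrow(df1)) %% 2L == 1L   # TRUE, FALSE, TRUE, FALSE
parts <- split(df1, is_odd)                # $`FALSE`: rows 2, 4; $`TRUE`: rows 1, 3
print(names(parts))                        # factor levels sort FALSE before TRUE

# The even rows hold the label in every column, so column 1 is enough
labels <- parts[["FALSE"]][[1L]]
result <- cbind(parts[["TRUE"]], v = labels)
print(result)
```

This also explains the row names `1` and `3` in the output: `cbind()` keeps the row names of the odd-row data frame.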
Does this work:
> library(dplyr)
> df1 %>% mutate(v = lead(b)) %>% filter(row_number() %in% seq(1,nrow(df1), 2))
n s b v
1 Player 1 1 1 Male
2 Player 2 5 5 Female
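A small variant of the dplyr answer, assuming dplyr is installed. This is not the answerer's own code. `row_number() %% 2 == 1` selects the same rows as `row_number() %in% seq(1, nrow(df1), 2)` without referring back to `df1` inside the pipeline, and a trailing `type.convert()` makes `s` and `b` numeric as in the target data frame. `mutate()` must come before `filter()`, because `lead()` needs the even rows to still be present.

```r
library(dplyr)

df1 <- data.frame(
  n = c("Player 1", "Male", "Player 2", "Female"),
  s = c(1, "Male", 5, "Female"),
  b = c(1, "Male", 5, "Female"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  mutate(v = lead(b)) %>%              # value from the next row (the label row)
  filter(row_number() %% 2 == 1) %>%   # keep rows 1, 3, 5, ...
  type.convert(as.is = TRUE)           # "1", "5" -> integer, as in the target frame
print(result)
```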