Make values of row (different names) into column -- complicated spread?
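I have this data frame in which each question has two subparts (q1, q1_p2). I want to move the answer for the second subpart onto the same row as the first.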
question answer
q1 bleh
q1_p2 bah
q2 meh
q2_p2 bleh
Basically, into something like this:
question answer p2
q1 bleh bah
q2 meh bleh
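For something like this I would normally use spread or similar, but I don't know how to account for the fact that the value to spread on is different for every question.
Any ideas?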
Well, this isn't as neat as I'd hoped, but it works.
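library(data.table)
dt = data.table("question" = c("q1", "q1_p2", "q2", "q2_p2"), "answer" = c("bleh","bah","meh","bleh"))
# Base question name: everything before the first underscore
dt$q = sapply(dt$question, function(x) strsplit(x, "_")[[1]][1])
dt[ , "Row" := 1:.N]
# Target column: "answer" for the main question, otherwise the "pN" suffix.
# The digit-count check assumes single-digit question numbers (q1 to q9).
dt[ , "New" := ifelse(nchar(gsub("\\D","",question)) == 1, "answer", gsub("(.+(?=p\\d+))", "",question, perl = TRUE)), by = .(Row)]
dt = dcast(dt, q ~ New, value.var = "answer")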
> dt
q answer p2
1: q1 bleh bah
2: q2 meh bleh
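A somewhat shorter data.table variant of the same idea, as a rough sketch only: it assumes every subpart is written as `<question>_<part>` with a single underscore, and that at least one row actually has a subpart. The column name `part` and the result name `wide` are made up for illustration.

```r
library(data.table)

dt <- data.table(
  question = c("q1", "q1_p2", "q2", "q2_p2"),
  answer   = c("bleh", "bah", "meh", "bleh")
)

# Split "q1_p2" into "q1" and "p2"; rows without "_" get NA in `part`
dt[, c("q", "part") := tstrsplit(question, "_", fixed = TRUE)]

# Rows with no subpart hold the main answer
dt[is.na(part), part := "answer"]

# One row per question, one column per part
wide <- dcast(dt, q ~ part, value.var = "answer")
```

Here `wide` should have the columns `q`, `answer` and `p2`. Unlike the digit-counting check, this does not depend on how many digits the question number has.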
If your full dataset follows the structure of the example, then this should be enough:
library(dplyr)
library(tidyr)
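df %>%
  # Group by the base question name (everything before the underscore)
  group_by(question = sub('_.*', '', question)) %>%
  # Number the rows within each question: 1 = main answer, 2 = second part
  mutate(new = row_number()) %>%
  spread(new, answer) %>%
  rename(answer = `1`, p2 = `2`) %>%
  ungroup()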
# A tibble: 2 × 3
# question answer p2
#* <chr> <fctr> <fctr>
#1 q1 bleh bah
#2 q2 meh bleh
Here is another option with the tidyverse:
library(tidyverse)
# fill = "right" pads questions without a "_p2" suffix with NA instead of warning
separate(df1, question, into = c("question", "value"), fill = "right") %>%
  mutate(value = replace(value, is.na(value), "answer")) %>%
  spread(value, answer)
# question answer p2
#1 q1 bleh bah
#2 q2 meh bleh
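For what it's worth, `spread()` has been superseded by `pivot_wider()` in tidyr 1.0.0 and later, so the same approach can be sketched with the newer function. This is only a sketch under the assumption that subparts are always written as `<question>_<part>`; the sample `df`, the column name `part` and the result name `wide` are illustrative, not from the original post.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  question = c("q1", "q1_p2", "q2", "q2_p2"),
  answer   = c("bleh", "bah", "meh", "bleh"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # Split on "_"; rows without a subpart get NA in `part`
  separate(question, into = c("question", "part"), sep = "_", fill = "right") %>%
  # Label the main answer so it becomes its own column
  mutate(part = coalesce(part, "answer")) %>%
  # One row per question, one column per part
  pivot_wider(names_from = part, values_from = answer)
```

The result should have one row per question with the columns `question`, `answer` and `p2`, and any further parts such as a hypothetical `q1_p3` would simply become extra columns.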