根据另一个变量 r 中值的变化生成一个新变量
Generate a new variable based on values change in another variable r
我问了一些非常相似的问题 [在此处输入 link 描述][1] 但我现在对我的问题有了更好的理解。我会尽量问清楚的。
我有一个示例数据集如下所示:
id <- c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item.id <- c(1,1,2, 1,1,1 ,1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2)
score <- c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)
data <- data.frame("id"=id, "item.id"=item.id, "sequence"=sequence, "score"=score)
> data
id item.id sequence score
1 1 1 1 0
2 1 1 2 0
3 1 2 1 0
4 2 1 1 0
5 2 1 2 0
6 2 1 3 1
7 3 1 1 2
8 3 1 2 0
9 4 1 1 1
10 4 2 1 1
11 5 1 1 1
12 5 2 1 0
13 5 2 2 1
14 5 2 3 1
15 6 1 1 0
16 6 1 2 0
17 6 1 3 0
18 7 1 1 1
19 8 1 1 0
20 8 2 1 2
21 9 1 1 1
22 9 1 2 2
23 10 1 1 2
24 10 1 2 1
id
代表每个学生,item.id
代表学生做题,sequence
是每个item.id
的尝试次数,score
是每次尝试的分数,取 0,1 或 2。学生可以更改他们的答案。
对于每个 id
中的 item.id
,我通过查看最后两个序列(更改)创建了一个变量 (status
):这里的重新编码规则是针对 status
:
1-If there is only one attempt for each question:
a) assign "BTW" (Blank to Wrong) if the item score is 0.
b) assign "BTW" (Blank to Right) if the item score is 1.
2-If there are multiple attempts for each question:
a) assign "BTW" (Blank to Wrong) if the first item attempt score is 0.
b) assign "BTW" (Blank to Right) if the first item attempt score is 1.
c) assign "WW" for those who changed from wrong to wrong (0 to 0),
d) assign "WR" for those who changed to increasing score (0 to 1, or 1 to 2),
e) assign "RW" for those who changed to decreasing score (2 to 1, 2 to 0, or 1 to 0 ), and
f) assign "RR" for those who changed from right to right (1 to 1, 2 to 2).
分数从 0 到 1 或 0 到 2 或 1 到 2 的变化被认为是正确的(正确的)变化,同时,
分数从 1 到 0 或 2 到 0 或 2 到 1 的变化被认为是不正确的(错误的)变化。
如果 item.id
只有一次尝试 id
=7
,那么 status
应该是 "BTR"
。如果 score
是 0
,那么它应该是 "BTW"
。逻辑应该是如果分数增加,就应该是WR,如果分数减少,就应该是RW。
a) from 1 to 2 as WR, instead, they were coded as RR,
b) from 2 to 1 as RW, instead, they were coded as WW.
我使用了这个代码。有些事情没有解决,例如 id=1。状态应该是 {BTW, WW}。
library(dplyr)
data %>% group_by(id,item.id) %>%
mutate(diff = c(0, diff(score)),
status = case_when(
n() == 1 & score == 0 ~ "BTW",
n() == 1 & score == 1 ~ "BTR",
diff == 0 & score == 0 ~ "WW",
diff == 0 & score > 0 ~ "RR",
diff > 0 ~ "WR",
diff < 0 ~ "RW",
TRUE ~ "oops"))
> data
id item.id sequence score diff status
1 1 1 1 0 0 WW
2 1 1 2 0 0 WW
3 1 2 1 0 0 BTW
4 2 1 1 0 0 WW
5 2 1 2 0 0 WW
6 2 1 3 1 1 WR
7 3 1 1 2 0 RR
8 3 1 2 0 -2 RW
9 4 1 1 1 0 BTR
10 4 2 1 1 0 BTR
11 5 1 1 1 0 BTR
12 5 2 1 0 0 WW
13 5 2 2 1 1 WR
14 5 2 3 1 0 RR
15 6 1 1 0 0 WW
16 6 1 2 0 0 WW
17 6 1 3 0 0 WW
18 7 1 1 1 0 BTR
19 8 1 1 0 0 BTW
20 8 2 1 2 0 RR
21 9 1 1 1 0 RR
22 9 1 2 2 1 WR
23 10 1 1 2 0 RR
24 10 1 2 1 -1 RW
所需的输出将包含案例:
> desired
id item.id sequence score status
1 1 1 1 0 BTW
2 1 1 2 0 WW
3 1 2 1 0 BTW
4 2 1 1 0 BTW
5 2 1 2 0 WW
6 2 1 3 1 WR
7 3 1 1 2 BTR
8 3 1 2 0 RW
9 4 1 1 1 BTR
10 4 2 1 1 BTR
11 5 1 1 1 BTR
12 5 2 1 0 BTW
13 5 2 2 1 WR
14 5 2 3 1 RR
15 6 1 1 0 BTW
16 6 1 2 0 WW
17 6 1 3 0 WW
18 7 1 1 1 BTR
19 8 1 1 0 BTW
20 8 2 1 2 BTR
21 9 1 1 1 BTR
22 9 1 2 2 RR
23 10 1 1 2 BTR
24 10 1 2 1 RW
有什么意见吗?
谢谢!
为了解决这个问题,我把问题分成了两步。首先确定 Blank to answer 行。然后,一旦确定了第一次尝试,然后将答案的更改分配给剩余的行。
#rows that are not the first answer are assigned a "NA"
test<-data %>% group_by(id,item.id) %>%
mutate(status = case_when(
sequence == 1 & score == 0 ~ "BTW",
sequence == 1 & score >0 ~ "BTR",
TRUE ~ "NA"))
answer<- test %>% ungroup() %>% group_by(id, item.id) %>%
transmute(sequence, score,
status = case_when(score == 0 & score==lag(score) & status=="NA" ~ "WW",
score >= 1 & score == lag(score) & status=="NA"~ "RR",
score > 0 & score > lag(score) & status=="NA"~ "WR",
score < lag(score) & status=="NA"~ "RW",
TRUE ~ status))
head(answer, 20)
tail(answer, 4)
除第 20 行外,所有行的状态列都与您的示例数据匹配,请仔细检查计算结果。
我问了一些非常相似的问题 [在此处输入 link 描述][1] 但我现在对我的问题有了更好的理解。我会尽量问清楚的。
我有一个示例数据集如下所示:
id <- c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item.id <- c(1,1,2, 1,1,1 ,1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2)
score <- c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)
data <- data.frame("id"=id, "item.id"=item.id, "sequence"=sequence, "score"=score)
> data
id item.id sequence score
1 1 1 1 0
2 1 1 2 0
3 1 2 1 0
4 2 1 1 0
5 2 1 2 0
6 2 1 3 1
7 3 1 1 2
8 3 1 2 0
9 4 1 1 1
10 4 2 1 1
11 5 1 1 1
12 5 2 1 0
13 5 2 2 1
14 5 2 3 1
15 6 1 1 0
16 6 1 2 0
17 6 1 3 0
18 7 1 1 1
19 8 1 1 0
20 8 2 1 2
21 9 1 1 1
22 9 1 2 2
23 10 1 1 2
24 10 1 2 1
id
代表每个学生,item.id
代表学生做题,sequence
是每个item.id
的尝试次数,score
是每次尝试的分数,取 0,1 或 2。学生可以更改他们的答案。
对于每个 id
中的 item.id
,我通过查看最后两个序列(更改)创建了一个变量 (status
):这里的重新编码规则是针对 status
:
1-If there is only one attempt for each question:
a) assign "BTW" (Blank to Wrong) if the item score is 0.
b) assign "BTW" (Blank to Right) if the item score is 1.
2-If there are multiple attempts for each question:
a) assign "BTW" (Blank to Wrong) if the first item attempt score is 0.
b) assign "BTW" (Blank to Right) if the first item attempt score is 1.
c) assign "WW" for those who changed from wrong to wrong (0 to 0),
d) assign "WR" for those who changed to increasing score (0 to 1, or 1 to 2),
e) assign "RW" for those who changed to decreasing score (2 to 1, 2 to 0, or 1 to 0 ), and
f) assign "RR" for those who changed from right to right (1 to 1, 2 to 2).
分数从 0 到 1 或 0 到 2 或 1 到 2 的变化被认为是正确的(正确的)变化,同时, 分数从 1 到 0 或 2 到 0 或 2 到 1 的变化被认为是不正确的(错误的)变化。
如果 item.id
只有一次尝试 id
=7
,那么 status
应该是 "BTR"
。如果 score
是 0
,那么它应该是 "BTW"
。逻辑应该是如果分数增加,就应该是WR,如果分数减少,就应该是RW。
a) from 1 to 2 as WR, instead, they were coded as RR,
b) from 2 to 1 as RW, instead, they were coded as WW.
我使用了这个代码。有些事情没有解决,例如 id=1。状态应该是 {BTW, WW}。
library(dplyr)
data %>% group_by(id,item.id) %>%
mutate(diff = c(0, diff(score)),
status = case_when(
n() == 1 & score == 0 ~ "BTW",
n() == 1 & score == 1 ~ "BTR",
diff == 0 & score == 0 ~ "WW",
diff == 0 & score > 0 ~ "RR",
diff > 0 ~ "WR",
diff < 0 ~ "RW",
TRUE ~ "oops"))
> data
id item.id sequence score diff status
1 1 1 1 0 0 WW
2 1 1 2 0 0 WW
3 1 2 1 0 0 BTW
4 2 1 1 0 0 WW
5 2 1 2 0 0 WW
6 2 1 3 1 1 WR
7 3 1 1 2 0 RR
8 3 1 2 0 -2 RW
9 4 1 1 1 0 BTR
10 4 2 1 1 0 BTR
11 5 1 1 1 0 BTR
12 5 2 1 0 0 WW
13 5 2 2 1 1 WR
14 5 2 3 1 0 RR
15 6 1 1 0 0 WW
16 6 1 2 0 0 WW
17 6 1 3 0 0 WW
18 7 1 1 1 0 BTR
19 8 1 1 0 0 BTW
20 8 2 1 2 0 RR
21 9 1 1 1 0 RR
22 9 1 2 2 1 WR
23 10 1 1 2 0 RR
24 10 1 2 1 -1 RW
所需的输出将包含案例:
> desired
id item.id sequence score status
1 1 1 1 0 BTW
2 1 1 2 0 WW
3 1 2 1 0 BTW
4 2 1 1 0 BTW
5 2 1 2 0 WW
6 2 1 3 1 WR
7 3 1 1 2 BTR
8 3 1 2 0 RW
9 4 1 1 1 BTR
10 4 2 1 1 BTR
11 5 1 1 1 BTR
12 5 2 1 0 BTW
13 5 2 2 1 WR
14 5 2 3 1 RR
15 6 1 1 0 BTW
16 6 1 2 0 WW
17 6 1 3 0 WW
18 7 1 1 1 BTR
19 8 1 1 0 BTW
20 8 2 1 2 BTR
21 9 1 1 1 BTR
22 9 1 2 2 RR
23 10 1 1 2 BTR
24 10 1 2 1 RW
有什么意见吗? 谢谢!
为了解决这个问题,我把问题分成了两步。首先确定 Blank to answer 行。然后,一旦确定了第一次尝试,然后将答案的更改分配给剩余的行。
#rows that are not the first answer are assigned a "NA"
test<-data %>% group_by(id,item.id) %>%
mutate(status = case_when(
sequence == 1 & score == 0 ~ "BTW",
sequence == 1 & score >0 ~ "BTR",
TRUE ~ "NA"))
answer<- test %>% ungroup() %>% group_by(id, item.id) %>%
transmute(sequence, score,
status = case_when(score == 0 & score==lag(score) & status=="NA" ~ "WW",
score >= 1 & score == lag(score) & status=="NA"~ "RR",
score > 0 & score > lag(score) & status=="NA"~ "WR",
score < lag(score) & status=="NA"~ "RW",
TRUE ~ status))
head(answer, 20)
tail(answer, 4)
除第 20 行外,所有行的状态列都与您的示例数据匹配,请仔细检查计算结果。