Convert one factor column to multiple dichotomous columns in R
I have a dataset containing patient IDs and their diagnoses, as follows:
Id Diagnoses
1 Nerve conditions (e.g., Multiple sclerosis, myasthenia gravis, Guillain-Barre syndrome, demyelinating polyneuropathy)
2 Gastrointestinal conditions (e.g., irritable bowl disease, ulcerative colitis, Chron's disease),Heart conditions,High blood pressure,Migraines/headaches
3 Heart conditions,Traumatic brain injury
4 Chronic pain,Heart conditions,Post-traumatic Stress Disorder (PTSD),Traumatic brain injury
5 Anxiety,Chronic pain,Depression,Sleep apnea
6 High blood pressure
7 High blood pressure
How can I split the Diagnoses column into one 0/1 column per diagnosis, as shown below:
Id Anxiety Depression Nerve conditions Sleep apnea Chronic Diseases AND SO ON....
1 0 0 0 1 1
2 1 1 1 1 1
3 1 1 1 1 0
4 0 0 1 1 1
5 1 0 0 0 1
6 1 1 1 0 1
7 1 1 0 1 0
I tried this code, but it did not give me that result:
df %>%
separate_rows(Diagnoses, sep=",") %>%
separate(Q2.3, into = c("Anxiety", "Depression, "THE REST OF CONDITIONS"), sep=":\s*") %>%
mutate(anxiety1 = str_c("Anxiety", Anxiety))
Thanks for your help.
Does this work:
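library(stringr)
library(dplyr)
library(tidyr)

df %>%
  # Strip the parenthetical examples; the greedy `.*` also removes everything after the first " ("
  mutate(Diagnoses = str_remove(Diagnoses, ' \\(.*\\)?')) %>%
  # One row per Id/diagnosis pair, then count each pair
  separate_rows(Diagnoses, sep = ',') %>%
  count(Id, Diagnoses, name = 'Cnt') %>%
  # One column per diagnosis, filling absent diagnoses with 0
  pivot_wider(id_cols = Id, names_from = Diagnoses, values_from = Cnt, values_fill = list(Cnt = 0))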
# A tibble: 7 x 11
Id `Nerve condition~ `Gastrointestina~ `Heart conditio~ `Traumatic brai~ `Chronic pain` `Post-traumatic ~ Anxiety Depression `Sleep apnea` `High blood pre~
<dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 0 0 0 0 0 0 0 0 0
2 2 0 1 0 0 0 0 0 0 0 0
3 3 0 0 1 1 0 0 0 0 0 0
4 4 0 0 1 0 1 1 0 0 0 0
5 5 0 0 0 0 1 0 1 1 1 0
6 6 0 0 0 0 0 0 0 0 0 1
7 7 0 0 0 0 0 0 0 0 0 1
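One thing to watch in the output above: Id 2 only has the gastrointestinal column set to 1, and Id 4 has no Traumatic brain injury, because the greedy `.*` in the pattern removes everything after the first opening parenthesis. A possible variation is sketched below. It assumes every parenthetical is closed and not nested, and it re-types the sample data from the question; the names `wide` and `present` are only illustrative. If `Diagnoses` is a factor, `str_remove_all()` coerces it to character.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Sample data re-typed from the question (Ids 1, 2 and 4 contain parentheticals).
df <- tibble(
  Id = 1:7,
  Diagnoses = c(
    "Nerve conditions (e.g., Multiple sclerosis, myasthenia gravis, Guillain-Barre syndrome, demyelinating polyneuropathy)",
    "Gastrointestinal conditions (e.g., irritable bowl disease, ulcerative colitis, Chron's disease),Heart conditions,High blood pressure,Migraines/headaches",
    "Heart conditions,Traumatic brain injury",
    "Chronic pain,Heart conditions,Post-traumatic Stress Disorder (PTSD),Traumatic brain injury",
    "Anxiety,Chronic pain,Depression,Sleep apnea",
    "High blood pressure",
    "High blood pressure"
  )
)

wide <- df %>%
  # Remove each "(...)" group on its own; [^)]* cannot run past the closing parenthesis.
  mutate(Diagnoses = str_remove_all(Diagnoses, "\\s*\\([^)]*\\)")) %>%
  # One row per Id/diagnosis pair; commas inside parentheticals are already gone.
  separate_rows(Diagnoses, sep = ",\\s*") %>%
  # Drop repeated pairs so each cell ends up as 0 or 1 rather than a count.
  distinct(Id, Diagnoses) %>%
  mutate(present = 1L) %>%
  # One column per diagnosis; diagnoses a patient does not have become 0.
  pivot_wider(names_from = Diagnoses, values_from = present,
              values_fill = list(present = 0L))

wide
```

With this pattern, Id 2 should keep Heart conditions, High blood pressure and Migraines/headaches, and Id 4 should keep Traumatic brain injury.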