Create a dummy for missing values in a numeric variable in R
I have the following data:
PassengerId Survived Pclass Sex Age SibSp Parch Fare Embarked
1 1 0 3 male 22 1 0 7.2500 S
2 2 1 1 female 38 1 0 71.2833 C
3 3 1 3 female 26 0 0 7.9250 S
4 4 1 1 female 35 1 0 53.1000 S
5 5 0 3 male 35 0 0 8.0500 S
6 6 0 3 male NA 0 0 8.4583 Q
Now, when I use dummy or dummy.data.frame, I can successfully convert the factors (here Sex and Embarked) into dummy variables, like this:
PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch Fare Embarked EmbarkedC EmbarkedQ EmbarkedS
1 1 0 3 0 1 22 1 0 7.2500 0 0 0 1
2 2 1 1 1 0 38 1 0 71.2833 0 1 0 0
3 3 1 3 1 0 26 0 0 7.9250 0 0 0 1
4 4 1 1 1 0 35 1 0 53.1000 0 0 0 1
5 5 0 3 0 1 35 0 0 8.0500 0 0 0 1
6 6 0 3 0 1 NA 0 0 8.4583 0 0 1 0
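However, if I apply it to the Age column, it creates more than 100 dummy variables: one for each unique age value, plus one for NA. I want the output to look like this: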
Age Age.NA
22 0
38 0
......
35 0
0 1
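For factors, it automatically treats missing values as a separate level and creates a variable for it. I want to achieve the same for a numeric variable without disturbing the values already in the column. Please help.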
Use an ifelse() statement to check for NA:
Age.NA <- ifelse(is.na(Age), 1, 0)
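As a self-contained sketch of the same idea, the snippet below builds the indicator on a toy vector that mirrors the Age column from the question. The name Age.filled is made up for this illustration, and replacing the missing ages with 0 is only an assumption based on the desired output shown in the question (where the NA row has Age 0); skip that step if you want to keep NA in the column.

```r
# Toy vector mirroring the Age column from the question
Age <- c(22, 38, 26, 35, 35, NA)

# 1 where Age is missing, 0 otherwise (equivalent to the ifelse() call above,
# but returns an integer vector)
Age.NA <- as.integer(is.na(Age))

# Optional (assumption): put 0 in place of the missing ages,
# as in the desired output from the question
Age.filled <- ifelse(is.na(Age), 0, Age)

data.frame(Age = Age.filled, Age.NA = Age.NA)
```

Because the indicator is a separate column, the non-missing ages are left untouched.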
You can just use:
df$Age.NA <- ifelse(is.na(df$Age), 1, 0)
Then:
library(dummies)
dummy.data.frame(df)
Output:
PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch Fare EmbarkedC EmbarkedQ EmbarkedS Age.NA
1 1 0 3 0 1 22 1 0 7.2500 0 0 1 0
2 2 1 1 1 0 38 1 0 71.2833 1 0 0 0
3 3 1 3 1 0 26 0 0 7.9250 0 0 1 0
4 4 1 1 1 0 35 1 0 53.1000 0 0 1 0
5 5 0 3 0 1 35 0 0 8.0500 0 0 1 0
6 6 0 3 0 1 NA 0 0 8.4583 0 1 0 1
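If several numeric columns can contain missing values, the same step could be wrapped in a small helper. This is only a sketch: add_na_indicators is a hypothetical function written for this example, not part of the dummies package, and the toy data frame is invented for illustration.

```r
# Hypothetical helper (not from the dummies package): append a 0/1
# "<name>.NA" column for every numeric column that contains missing values.
add_na_indicators <- function(data) {
  # names(data) is evaluated once, so the newly added columns are not revisited
  for (col in names(data)) {
    x <- data[[col]]
    if (is.numeric(x) && anyNA(x)) {
      data[[paste0(col, ".NA")]] <- as.integer(is.na(x))
    }
  }
  data
}

# Toy data: Age has a missing value, Fare does not
toy <- data.frame(Age = c(22, 38, NA), Fare = c(7.25, 71.2833, 7.925))
res <- add_na_indicators(toy)
res
```

Only Age gets an indicator column here, since Fare has no missing values. Factor columns are skipped, so they can still be handed to dummy.data.frame afterwards.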
Data:
df <- structure(list(PassengerId = 1:6, Survived = c(0L, 1L, 1L, 1L,
0L, 0L), Pclass = c(3L, 1L, 3L, 1L, 3L, 3L), Sex = structure(c(2L,
1L, 1L, 1L, 2L, 2L), .Label = c("female", "male"), class = "factor"),
Age = c(22L, 38L, 26L, 35L, 35L, NA), SibSp = c(1L, 1L, 0L,
1L, 0L, 0L), Parch = c(0L, 0L, 0L, 0L, 0L, 0L), Fare = c(7.25,
71.2833, 7.925, 53.1, 8.05, 8.4583), Embarked = structure(c(3L,
1L, 3L, 3L, 3L, 2L), .Label = c("C", "Q", "S"), class = "factor"),
Age.NA = c(0, 0, 0, 0, 0, 1)), .Names = c("PassengerId",
"Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare",
"Embarked", "Age.NA"), row.names = c("1", "2", "3", "4", "5",
"6"), class = "data.frame")