Limit transformation levels from int to factor
I have an int column containing integer values. I want to convert it to a factor with a predefined number of buckets/levels/subranges.
Here is an example:
library(data.table)  # provides fread()
dat1 <- fread('https://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data', stringsAsFactors = TRUE)
# convert every column to a factor
dat1 <- data.frame(lapply(dat1, as.factor))
> str(dat1)
'data.frame': 306 obs. of 4 variables:
$ V1: Factor w/ 49 levels "30","31","33",..: 1 1 1 2 2 3 3 4 4 4 ...
$ V2: Factor w/ 12 levels "58","59","60",..: 7 5 8 2 8 1 3 2 9 1 ...
$ V3: Factor w/ 31 levels "0","1","2","3",..: 2 4 1 3 5 11 1 1 10 28 ...
$ V4: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 2 2 1 ...
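I want to divide the source column dat1$V3 into, let's say, a few ranges (each one a level). Each source value belongs to exactly one of these categories.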
Using cut():
# note: as.numeric() on a factor returns its integer level codes
dat1$V3_cut <- cut(as.numeric(dat1$V3), 5)
Output:
  V1 V2 V3 V4   V3_cut
1 30 64  1  1 (0.97,7]
2 30 62  3  1 (0.97,7]
3 30 65  0  1 (0.97,7]
4 31 59  2  1 (0.97,7]
5 31 65  4  1 (0.97,7]
6 33 58 10  1   (7,13]
or
dat1$V3_cut <- cut(as.numeric(dat1$V3), c(0, 3, 5, 11))
Output:
  V1 V2 V3 V4 V3_cut
1 30 64  1  1  (0,3]
2 30 62  3  1  (3,5]
3 30 65  0  1  (0,3]
4 31 59  2  1  (0,3]
5 31 65  4  1  (3,5]
6 33 58 10  1 (5,11]
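One caveat about the calls above: V3 is a factor at this point, and as.numeric() on a factor returns its integer level codes rather than the original numbers, which is why the row with V3 = 0 ends up in (0,3]. Below is a minimal sketch of binning the original values instead, assuming the factor labels are all numeric strings. The toy vector `v3` is a hypothetical stand-in that mirrors the six rows shown, and the lower break of -1 is only an illustrative choice so that 0 is covered.

```r
# Toy factor mirroring the first six V3 values shown above
# (hypothetical stand-in for dat1$V3, not the real column)
v3 <- factor(c(1, 3, 0, 2, 4, 10))

# Level codes: the position of each value in levels(v3)
codes <- as.numeric(v3)

# Original values: go through character first
values <- as.numeric(as.character(v3))

# Bin the real values; -1 as the lowest break is an assumption so that 0 is included
v3_cut <- cut(values, breaks = c(-1, 3, 5, 11))

print(data.frame(v3, codes, values, v3_cut))
```

With the real data the same idea would be cut(as.numeric(as.character(dat1$V3)), ...), with breaks chosen to span the actual range of V3.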
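You can either specify the number of intervals to cut into or supply a vector of class boundaries. By default right = TRUE and include.lowest = FALSE, so each interval is open on the left and closed on the right, which is what the (a,b] notation in the labels indicates. A value equal to the lowest boundary is therefore not assigned to any interval and becomes NA.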
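To make the two calling styles and the boundary behaviour concrete, here is a small sketch in base R. The vector `x` is made up for illustration and is not the Haberman data.

```r
# Made-up values for illustration only
x <- c(0, 1, 3, 4, 5, 10, 11)

# A single number asks for that many equal-width intervals over range(x)
by_count <- cut(x, 5)

# A vector of breaks defines the intervals explicitly. The default is (a,b],
# so a value equal to the lowest break (0 here) is left out and becomes NA
by_breaks <- cut(x, breaks = c(0, 3, 5, 11))

# include.lowest = TRUE closes the first interval on the left: [0,3]
by_breaks_incl <- cut(x, breaks = c(0, 3, 5, 11), include.lowest = TRUE)

print(data.frame(x, by_count, by_breaks, by_breaks_incl))
```

If the data can contain the lowest boundary itself, either set include.lowest = TRUE or start the breaks below the minimum.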
Edit
Thanks @Rui, the levels can also be given custom labels:
dat1$V3_cut <- cut(as.numeric(dat1$V3), c(0,3,5,11), labels=1:3)
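A short sketch of what labels = 1:3 does, again on a made-up vector `x` rather than the real data: the interval names are replaced by the supplied labels, but the result is still a factor, so it needs converting if plain integers are wanted.

```r
# Made-up values for illustration only
x <- c(1, 2, 4, 5, 10)

# Replace the default "(a,b]" interval names with custom labels
bucket <- cut(x, breaks = c(0, 3, 5, 11), labels = 1:3)
print(levels(bucket))

# The result is a factor; convert via character to get the labels as integers
bucket_int <- as.integer(as.character(bucket))
print(bucket_int)

# labels = FALSE skips the factor and returns the integer bin codes directly
bucket_code <- cut(x, breaks = c(0, 3, 5, 11), labels = FALSE)
print(bucket_code)
```

Which form to prefer depends on the next step: a labelled factor is convenient for modelling and tables, while labels = FALSE is simpler when only the bin index matters.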