转换为数字类型后值发生变化
Values changed after transforming into numeric type
原始数据
chr pos ref alt region gene co1l col2 col3 col4 col5 col6
chr11 10000117 G A intronic SBF2 0.28 0.2813 . 0.008683 0.0157 2.091
chr11 100001537 T C intronic CNTN5 . . . . -0.1877 1.202
chr11 100002012 A G intronic CNTN5 1.0 0.7227 . 0.764062 -0.4256 1.584
chr11 10000210 G C intronic SBF2 0.28 0.2813 . 0.222606 -0.3470 0.179
当我尝试时
filter <- filter(data,data$col1>=0.5 | t$col1 == ".")
弹出警告消息
In Ops.factor(t$col1, 0.5) : ‘>=’ not meaningful for factors
然后我尝试将多列转换为数字
cols.num <- c("col1","col2","col3","col4")
data[cols.num] <- sapply(data[cols.num],as.numeric)
很糟糕,所有值都更改如下
chr11 10000117 G A intronic SBF2 2 2 1 2 0.0157 2.091
chr11 100001537 T C intronic CNTN5 1 1 1 1 -0.1877 1.202
chr11 100002012 A G intronic CNTN5 3 3 1 4 -0.4256 1.584
chr11 10000210 G C intronic SBF2 2 2 1 3 -0.3470 0.179
我不知道为什么值都变了,谁能帮忙解决问题?
谢谢!
首先将值从 "."
转换为 NA
df[cols.num][df[cols.num] == "."] <- NA
然后将因子列更改为字符,然后更改为数字
df[cols.num] <- lapply(df[cols.num], function(x) as.numeric(as.character(x)))
df
#chr pos ref alt region gene col1 col2 col3 col4 col5 col6
#1 chr11 10000117 G A intronic SBF2 0.28 0.2813 NA 0.008683 0.0157 2.091
#2 chr11 100001537 T C intronic CNTN5 NA NA NA NA -0.1877 1.202
#3 chr11 100002012 A G intronic CNTN5 1.00 0.7227 NA 0.764062 -0.4256 1.584
#4 chr11 10000210 G C intronic SBF2 0.28 0.2813 NA 0.222606 -0.3470 0.179
str(df[cols.num])
#'data.frame': 4 obs. of 4 variables:
# $ col1: num 0.28 NA 1 0.28
# $ col2: num 0.281 NA 0.723 0.281
# $ col3: num NA NA NA NA
# $ col4: num 0.00868 NA 0.76406 0.22261
现在您可以对数据应用任何您喜欢的转换。
df[df$col1 > 0.5 & !is.na(df$col1), ]
# chr pos ref alt region gene col1 col2 col3 col4 col5 col6
#3 chr11 100002012 A G intronic CNTN5 1 0.7227 NA 0.764062 -0.4256 1.584
数据
df <- structure(list(chr = structure(c(1L, 1L, 1L, 1L), .Label = "chr11",
class = "factor"), pos = c(10000117L, 100001537L, 100002012L, 10000210L),
ref = structure(c(2L, 3L, 1L, 2L), .Label = c("A", "G", "T"), class = "factor"),
alt = structure(c(1L, 2L, 3L, 2L), .Label = c("A", "C", "G"
), class = "factor"), region = structure(c(1L, 1L, 1L, 1L
), .Label = "intronic", class = "factor"), gene = structure(c(2L,
1L, 1L, 2L), .Label = c("CNTN5", "SBF2"), class = "factor"),
col1 = structure(c(2L, 1L, 3L, 2L), .Label = c(".", "0.28",
"1.0"), class = "factor"), col2 = structure(c(2L, 1L, 3L,
2L), .Label = c(".", "0.2813", "0.7227"), class = "factor"),
col3 = structure(c(1L, 1L, 1L, 1L), .Label = ".", class = "factor"),
col4 = structure(c(2L, 1L, 4L, 3L), .Label = c(".", "0.008683",
"0.222606", "0.764062"), class = "factor"), col5 = c(0.0157,
-0.1877, -0.4256, -0.347), col6 = c(2.091, 1.202, 1.584,
0.179)), class = "data.frame", row.names = c(NA, -4L))
原始数据
chr pos ref alt region gene co1l col2 col3 col4 col5 col6
chr11 10000117 G A intronic SBF2 0.28 0.2813 . 0.008683 0.0157 2.091
chr11 100001537 T C intronic CNTN5 . . . . -0.1877 1.202
chr11 100002012 A G intronic CNTN5 1.0 0.7227 . 0.764062 -0.4256 1.584
chr11 10000210 G C intronic SBF2 0.28 0.2813 . 0.222606 -0.3470 0.179
当我尝试时
filter <- filter(data,data$col1>=0.5 | t$col1 == ".")
弹出警告消息
In Ops.factor(t$col1, 0.5) : ‘>=’ not meaningful for factors
然后我尝试将多列转换为数字
cols.num <- c("col1","col2","col3","col4")
data[cols.num] <- sapply(data[cols.num],as.numeric)
很糟糕,所有值都更改如下
chr11 10000117 G A intronic SBF2 2 2 1 2 0.0157 2.091
chr11 100001537 T C intronic CNTN5 1 1 1 1 -0.1877 1.202
chr11 100002012 A G intronic CNTN5 3 3 1 4 -0.4256 1.584
chr11 10000210 G C intronic SBF2 2 2 1 3 -0.3470 0.179
我不知道为什么值都变了,谁能帮忙解决问题?
谢谢!
首先将值从 "."
转换为 NA
df[cols.num][df[cols.num] == "."] <- NA
然后将因子列更改为字符,然后更改为数字
df[cols.num] <- lapply(df[cols.num], function(x) as.numeric(as.character(x)))
df
#chr pos ref alt region gene col1 col2 col3 col4 col5 col6
#1 chr11 10000117 G A intronic SBF2 0.28 0.2813 NA 0.008683 0.0157 2.091
#2 chr11 100001537 T C intronic CNTN5 NA NA NA NA -0.1877 1.202
#3 chr11 100002012 A G intronic CNTN5 1.00 0.7227 NA 0.764062 -0.4256 1.584
#4 chr11 10000210 G C intronic SBF2 0.28 0.2813 NA 0.222606 -0.3470 0.179
str(df[cols.num])
#'data.frame': 4 obs. of 4 variables:
# $ col1: num 0.28 NA 1 0.28
# $ col2: num 0.281 NA 0.723 0.281
# $ col3: num NA NA NA NA
# $ col4: num 0.00868 NA 0.76406 0.22261
现在您可以对数据应用任何您喜欢的转换。
df[df$col1 > 0.5 & !is.na(df$col1), ]
# chr pos ref alt region gene col1 col2 col3 col4 col5 col6
#3 chr11 100002012 A G intronic CNTN5 1 0.7227 NA 0.764062 -0.4256 1.584
数据
df <- structure(list(chr = structure(c(1L, 1L, 1L, 1L), .Label = "chr11",
class = "factor"), pos = c(10000117L, 100001537L, 100002012L, 10000210L),
ref = structure(c(2L, 3L, 1L, 2L), .Label = c("A", "G", "T"), class = "factor"),
alt = structure(c(1L, 2L, 3L, 2L), .Label = c("A", "C", "G"
), class = "factor"), region = structure(c(1L, 1L, 1L, 1L
), .Label = "intronic", class = "factor"), gene = structure(c(2L,
1L, 1L, 2L), .Label = c("CNTN5", "SBF2"), class = "factor"),
col1 = structure(c(2L, 1L, 3L, 2L), .Label = c(".", "0.28",
"1.0"), class = "factor"), col2 = structure(c(2L, 1L, 3L,
2L), .Label = c(".", "0.2813", "0.7227"), class = "factor"),
col3 = structure(c(1L, 1L, 1L, 1L), .Label = ".", class = "factor"),
col4 = structure(c(2L, 1L, 4L, 3L), .Label = c(".", "0.008683",
"0.222606", "0.764062"), class = "factor"), col5 = c(0.0157,
-0.1877, -0.4256, -0.347), col6 = c(2.091, 1.202, 1.584,
0.179)), class = "data.frame", row.names = c(NA, -4L))