R and Data Selection
I have a data.table, dt, that looks like this:
structure(list(IM = c(0.830088495575221, 0.681436210847976, 0.498810939357907,
0.47265400115141, 0.527908540685945, 0.580763582966226, 0.408069043807859,
0.467368671545006, 0.44662887412295, 0.0331974034502217, 0.0368210899219588,
0.0333698233772947, 0.0294312465832275, 0.578743426515361, 0.566950053134963,
0.808756701221038, 0.585507838980771, 0.61507839619537, 0.586388329979879,
0.794196637085474), CM = c(0.876991150442478, 0.996180290297937,
0.651605231866825, 0.824409902130109, 0.94418291862811, 0.961820851688693,
0.943861532396347, 1.10137922144883, 1.1524325077831, 0.128868067469359,
0.155932251596297, 0.159414951213752, 0.196968075413411, 1.19678937171326,
0.901168969181722, 3.42528220866977, 2.4377239516641, 2.0040870054458,
1.86099597585513, 1.51928615911568), RM = c(0.601769911504425,
0.495034377387319, 0.405469678953627, 0.368451352907311, 0.361802286482851,
0.320851688693098, 0.791548118347242, 0.816050925099649, 0.786622368849031,
0.545805622636092, 0.594370732740163, 0.594771872860171, 0.536043514857356,
0.617215610296153, 0.619287991498406, 0.602602774009141, 0.634069706132375,
0.596543561108693, 0.582203219315895, 0.695985131558462)), .Names = c("IM", "CM", "RM"),
class = c("data.table", "data.frame"), row.names = c(NA, -20L),
.internal.selfref = <pointer: 0x00000000003f0788>)
I wrote the following function:
DSanity.markWinsorize <- function(dt, colnames)
{
  PERnames <- unlist(lapply(colnames, function(x) paste0("PER", x)))
  print(dt[, colnames])
  if (length(colnames) > 1) {
    dt[, PERnames] <- sapply(dt[, colnames], Num.calPtile)
  } else {
    dt[, PERnames] <- Num.calPtile(dt[, colnames])
  }
  return(dt)
}

## Calculate the percentile score of each element of a data vector
Num.calPtile <- function(x)
{
  return((ecdf(x))(x))
}
The function is meant to create new columns that hold the percentile of every data point in the columns passed to markWinsorize.
Here is how I try to run markWinsorize:
colnames <- c('CM','AM','BM')
DSanity.markWinsorize(dt,colnames)
I get the following error:
> sdc1 <- DSanity.markWinsorize(sdc,colnames)
[1] "CM" "AM" "BM"
Error in approxfun(vals, cumsum(tabulate(match(x, vals)))/n, method = "constant", :
  zero non-NA points
In addition: Warning message:
In xy.coords(x, y) : NAs introduced by coercion
It would be great if some of you could help me. Thanks.
Your approach is clumsy. I recommend an entirely different one.
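library(dplyr)
colnames <- c("CM", "AM", "BM")
# select_() and mutate_each() are deprecated; all_of() and across()
# (dplyr >= 1.0.0) are the current equivalents
dt %>%
  select(all_of(colnames)) %>%
  mutate(across(everything(), ~ ntile(.x, 100)))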
I think this does what you need (possibly with %>% bind_cols(dt) added).
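For anyone who would rather stay in data.table: judging from the printed [1] "CM" "AM" "BM", a likely cause of the error is that dt[, colnames] evaluated the variable colnames itself instead of selecting columns, so ecdf() was handed strings and found no numeric points. Note also that the sample data only contains IM, CM and RM, so AM and BM would have to exist in the real table. Below is a minimal sketch of one way around this, using .SDcols and := on a copy. The function name markPercentiles, the column check and the small made-up table are assumptions for illustration, not part of the original question.

```r
library(data.table)

# Small made-up table for illustration (not the asker's data)
dt <- data.table(IM = c(0.83, 0.68, 0.50, 0.47),
                 CM = c(0.88, 1.00, 0.65, 0.82),
                 RM = c(0.60, 0.50, 0.41, 0.37))

# Percentile score of each element of a numeric vector (as in the question)
Num.calPtile <- function(x) ecdf(x)(x)

# Hypothetical rewrite of the question's function
markPercentiles <- function(dt, cols) {
  # Fail early with a readable message if a requested column is absent
  missing_cols <- setdiff(cols, names(dt))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  new_names <- paste0("PER", cols)
  out <- copy(dt)  # work on a copy so the caller's table is not modified by reference
  # .SD holds only the columns named in .SDcols; := adds one new column per input column
  out[, (new_names) := lapply(.SD, Num.calPtile), .SDcols = cols]
  out[]
}

res <- markPercentiles(dt, c("CM", "RM"))
print(res)
```

Unlike ntile(x, 100), which returns integer bucket numbers, this keeps the ecdf-based fractions from the question, so the new PER columns contain values in (0, 1].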