skimr: how to get the top 3 and bottom 3 values?
Consider this simple example:
> tibble(value = c(1,2,3,4,5,5,6,7,8,9,10,11,12)) %>%
+ skim()
Skim summary statistics
n obs: 13
n variables: 1
-- Variable type:numeric -------------------------------------------------------
variable missing complete n mean sd p0 p25 p50 p75 p100 hist
value 0 13 13 6.38 3.48 1 4 6 9 12 ▅▂▇▂▂▅▂▅
I would simply like to add two columns to the skimr output, top and bottom, that show the top 3 and bottom 3 values, separated by commas.
Something like:
top bottom
12,11,10 1,2,3
How can I do that?
Thanks!
Updated answer:
# Drop the percentile and histogram columns to make room for the new ones
skim_with(numeric = list(p0 = NULL, p25 = NULL, p50 = NULL, p75 = NULL, p100 = NULL, hist = NULL))
# Six helpers: the 1st, 2nd and 3rd smallest values (h1, h2, h3)
# and the 3rd, 2nd and 1st largest values (t3, t2, t1)
h1 <- function(x) { head(sort(x))[1] }
h2 <- function(x) { head(sort(x))[2] }
h3 <- function(x) { head(sort(x))[3] }
t3 <- function(x) { tail(sort(x), 3)[1] }
t2 <- function(x) { tail(sort(x), 2)[1] }
t1 <- function(x) { tail(sort(x), 1)[1] }
# Register the helpers for numeric columns (do the same for integer and other types as needed)
skim_with(numeric = list(h1 = h1, h2 = h2, h3 = h3, t3 = t3, t2 = t2, t1 = t1))
skim(iris$Sepal.Length)
Skim summary statistics
── Variable type:numeric ────────────────────────────────────────────────
variable missing complete n mean sd h1 h2 h3 t3 t2 t1
iris$Sepal.Length 0 150 150 5.84 0.83 4.3 4.4 4.4 7.7 7.7 7.9
OK, I got it working.
For future reference:
get_top <- function(df) {
  df %>% as_tibble() %>%
    top_n(3) %>%
    pull() %>%
    paste(collapse = ',')
}
skim_with(numeric = list(top = get_top), append = TRUE)
which gives:
> tibble(value = c(1,2,3,4,5,5,6,7,8,9,10,11,12)) %>%
+ skim()
Selecting by value
Skim summary statistics
n obs: 13
n variables: 1
-- Variable type:numeric -------------------------------------------------------
variable missing complete n mean sd p0 p25 p50 p75 p100 hist top
value 0 13 13 6.38 3.48 1 4 6 9 12 ▅▂▇▂▂▅▂▅ 10,11,12
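The output above covers only the top column, in ascending order, while the question asked for both top and bottom with the top values listed largest first. A minimal base-R sketch of two helpers that would produce those strings follows. The names top_values and bottom_values are hypothetical (they are not part of skimr), and the sketch assumes a plain numeric vector as input:

```r
# Hypothetical helpers, not part of skimr: format the n largest or n smallest
# values of a numeric vector as a comma-separated string.
# sort() drops NA values by default, so missing values are ignored.
top_values <- function(x, n = 3) {
  paste(head(sort(x, decreasing = TRUE), n), collapse = ",")
}

bottom_values <- function(x, n = 3) {
  paste(head(sort(x), n), collapse = ",")
}

x <- c(1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12)
top_values(x)     # "12,11,10"
bottom_values(x)  # "1,2,3"
```

With the skim_with() call style used in this thread, these could presumably be registered as skim_with(numeric = list(top = top_values, bottom = bottom_values), append = TRUE). Unlike top_n(), head() always returns at most n values, even when there are ties.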