Reshaping a "vector" from long to wide

I have a vector with rownames, so it can be thought of as a "matrix" with 2 columns (one for the file name, one for the Topic):

> res
                   Topic
jardine-1.docx.md      1
jardine-2.docx.md      1
jardine-a1.docx.md     1
jardine-a2.docx.md     1
jardine-a3.docx.md     1
jardine-a4.docx.md     3
jardine-a5.docx.md     1
jardine-a6.docx.md     3
jardine-a7.docx.md     3
jardine-a8.docx.md     1
...

These are the results from a great topic-modelling R package, aptly named topicmodels.

I would like to cast this "vector" into a wide format, for presentation purposes only.

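This of course violates the "tidy data" principle, where "each observation, or case, is in its own row" (see Data Transformation with dplyr, available here). Nonetheless, for presentation the wide format is neater than the long one: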

              Topic1       Topic2             Topic3
1  jardine-1.docx.md jk-1.docx.md jardine-a4.docx.md
2  jardine-2.docx.md jk-2.docx.md jardine-a6.docx.md
3 jardine-a1.docx.md jk-4.docx.md jardine-a7.docx.md
4 jardine-a2.docx.md jk-5.docx.md  singtel-1.docx.md
5 jardine-a3.docx.md jk-6.docx.md  singtel-2.docx.md
6 jardine-a5.docx.md         <NA>  singtel-3.docx.md
7 jardine-a8.docx.md         <NA>  singtel-4.docx.md
8       jk-3.docx.md         <NA>  singtel-5.docx.md
9       jk-7.docx.md         <NA>               <NA>

This can of course be done in a number of ways, one of which looks like this (warning: ugly):

# via cbind
T1=rownames(subset(res, Topic==1))
T2=rownames(subset(res, Topic==2))
T3=rownames(subset(res, Topic==3))
n=max(length(T1),length(T2),length(T3))
length(T1) <- n
length(T2) <- n
length(T3) <- n
cbind(T1,T2,T3)

My question:

Given that all of the code will be included in an R Markdown file for presentation, is there a better way to present this?

I would use the DT package to create an interactive table in the markdown document. Link to vignette

library(DT)

datatable(
  dataframe, class = 'cell-border stripe', extensions = c('Buttons', 'FixedColumns'), options = list(
    dom = 'Bfrtip', scrollX = TRUE, fixedColumns = TRUE,
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)

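Explore the vignette; it has many options, for example: formatting fields with colors and shapes, letting users add or remove columns interactively, scrolling in wide tables, and so on.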
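To make the snippet above concrete, here is a minimal sketch that passes a small wide table to `datatable()`. The `wide` data frame is a hypothetical stand-in built from a few file names in the question, and the option values are only illustrative. Note that the widget renders as an interactive table only in HTML output.

```r
library(DT)

# Hypothetical wide table, using a few file names from the question.
wide <- data.frame(
  Topic1 = c("jardine-1.docx.md", "jardine-2.docx.md", "jardine-a1.docx.md"),
  Topic3 = c("jardine-a4.docx.md", "jardine-a6.docx.md", NA),
  stringsAsFactors = FALSE
)

# rownames = FALSE hides the row-number column; scrollX lets wide tables scroll.
widget <- datatable(
  wide,
  rownames = FALSE,
  class = "cell-border stripe",
  options = list(pageLength = 10, scrollX = TRUE)
)
widget
```

In an R Markdown chunk, printing `widget` (or calling `datatable()` as the last expression) should be enough to embed the table.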

If you are just looking for cleaner code, perhaps this will do?

library(magrittr) # provides %>%
nmax <- max(table(res$Topic)) # size of the largest topic
ntopics <- 3 # or ntopics <- max(res$Topic) to be more general
build_col <- function(i) {
  rn <- rownames(subset(res, Topic == i))
  c(rn, rep(NA, nmax - length(rn))) # pad with NA; you may replace NA by "" here for it to look nicer
}
sapply(1:ntopics, build_col) %>% as.data.frame %>% setNames(paste0("Topic", 1:ntopics))

#               Topic1 Topic2             Topic3
# 1  jardine-1.docx.md   <NA> jardine-a4.docx.md
# 2  jardine-2.docx.md   <NA> jardine-a6.docx.md
# 3 jardine-a1.docx.md   <NA> jardine-a7.docx.md
# 4 jardine-a2.docx.md   <NA>               <NA>
# 5 jardine-a3.docx.md   <NA>               <NA>
# 6 jardine-a5.docx.md   <NA>               <NA>
# 7 jardine-a8.docx.md   <NA>               <NA>
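
As a possible variation on the same idea, the grouping and padding can also be sketched in base R with `split()` and `length<-`, which avoids the helper function. The toy `res` below is an assumption: it reproduces only the ten rows shown in the question, and the topic levels `1:3` are hard-coded for illustration.

```r
# Toy data copied from the first ten rows shown in the question.
res <- data.frame(
  Topic = c(1, 1, 1, 1, 1, 3, 1, 3, 3, 1),
  row.names = c("jardine-1.docx.md", "jardine-2.docx.md", "jardine-a1.docx.md",
                "jardine-a2.docx.md", "jardine-a3.docx.md", "jardine-a4.docx.md",
                "jardine-a5.docx.md", "jardine-a6.docx.md", "jardine-a7.docx.md",
                "jardine-a8.docx.md")
)

# Group file names by topic; fixing the levels keeps empty topics as columns.
by_topic <- split(rownames(res), factor(res$Topic, levels = 1:3))

# Pad every group with NA up to the size of the largest one.
n <- max(lengths(by_topic))
padded <- lapply(by_topic, `length<-`, n)

wide <- as.data.frame(padded, stringsAsFactors = FALSE)
names(wide) <- paste0("Topic", names(by_topic))
wide
```

On this toy input the result has 7 rows and 3 columns, with `Topic2` entirely `NA`, which matches the output shown above.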