Is there a neater or simpler way to write this data.table R code?

The STRATUM values in the OECD data are too long, so for simplicity I want to recode them into shorter, more precise labels, as in the code below.

pisaMas[,`:=`
             (SchoolType = c(ifelse(STRATUM == "National Secondary School", "Public", 
                                    ifelse(STRATUM == "Religious School", "Religious", 
                                           ifelse(STRATUM == "MOE Technical School", "Technical",0)))))]
pisaMas[,table(SchoolType)]

I would like to know whether there is a simpler way to do this with the data.table package.

This is what I came up with after some thought.

#' First I create a function (rname.SchType) that maps each old name to a new name using else if:

rname.SchType <- function(x){
  # Backslashes in the STRATUM labels must be escaped as \\ in R string literals.
  if (is.na(x)) NA_character_
  else if (x == "MYS - stratum 01: MOE National Secondary School\\Other States") "Public"
  else if (x == "MYS - stratum 02: MOE Religious School\\Other States") "Religious"
  else if (x == "MYS - stratum 03: MOE Technical School\\Other States") "Technical"
  else if (x == "MYS - stratum 04: MOE Fully Residential School") "SBP"
  else if (x == "MYS - stratum 05: non-MOE MARA Junior Science College\\Other States") "MARA"
  else if (x == "MYS - stratum 06: non-MOE Other Schools\\Other States") "Private"
  else if (x == "MYS - stratum 07: Perlis non-“MOE Fully Residential Schools”") "Perlis Fully Residential"
  else if (x == "MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools”") "Putrajaya Fully Residential"
  else if (x == "MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools”") "Labuan Fully Residential"
  else NA_character_  # any label not listed above
}

With the function I just created, I apply it inside data.table using base R's sapply in a single line, which avoids clutter and looks simpler:

pisaMalaysia[,`:=`(jenisSekolah = sapply(STRATUM,rname.SchType))]

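data.table has a function for exactly this situation, fcase (modeled on SQL CASE WHEN). When this answer was written it was only in the development version; it has since been released on CRAN in data.table 1.13.0: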

pisaMas[ , SchoolType := fcase(
  STRATUM == "National Secondary School", "Public", 
  STRATUM == "Religious School", "Religious", 
  STRATUM == "MOE Technical School", "Technical",
  default = ''
)]
pisaMas[ , table(SchoolType)]
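
As a minimal sketch of how fcase behaves, the example below runs it on a small toy table. The table name toy and its shortened STRATUM values are made up for illustration and are not the real OECD labels; it assumes data.table 1.13.0 or later is installed.

```r
library(data.table)

# Toy data; the STRATUM values are shortened stand-ins, not the real OECD labels.
toy <- data.table(STRATUM = c("National Secondary School",
                              "Religious School",
                              "MOE Technical School",
                              "Something Else"))

# Conditions are checked in order; rows matching none of them get `default`.
toy[, SchoolType := fcase(
  STRATUM == "National Secondary School", "Public",
  STRATUM == "Religious School",          "Religious",
  STRATUM == "MOE Technical School",      "Technical",
  default = NA_character_
)]

# useNA = "ifany" makes unmatched rows visible in the tabulation.
toy[, table(SchoolType, useNA = "ifany")]
```

Using NA_character_ rather than '' as the default is a design choice here: it keeps unmatched strata clearly distinguishable from real labels.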

To get the development version, try

install.packages(
  'data.table', type = 'source',repos = 'http://Rdatatable.github.io/data.table'
)

If the simple install does not work, see the Installation wiki for more details:

https://github.com/Rdatatable/data.table/wiki/Installation

You can also solve this with a lookup table; see this Q&A for details.

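I think I have finally found the answer to the question above! This answer overcomes the 'not vectorized' problem that @Roland pointed out, thank you sir! In my opinion it is much faster, although it took me a few weeks to understand the concept and to find the right question online!

First, I created a new data.table with 2 columns: one holds the original names and the other holds the desired school names.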

lookUpStratum <- data.table(STRATUM=c("MYS - stratum 01: MOE National Secondary School\\Other States",
                                      "MYS - stratum 02: MOE Religious School\\Other States",
                                      "MYS - stratum 03: MOE Technical School\\Other States",
                                      "MYS - stratum 04: MOE Fully Residential School",
                                      "MYS - stratum 05: non-MOE MARA Junior Science College\\Other States",
                                      "MYS - stratum 06: non-MOE Other Schools\\Other States",
                                      "MYS - stratum 07: Perlis non-“MOE Fully Residential Schools”",
                                      "MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools”",
                                      "MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools”"),
                            SCH.TYPE=c("Public",
                                       "Religious",
                                       "Technical",
                                       "SBP",
                                       "MARA",
                                       "Private",
                                       "Perlis Fully Residential",
                                       "Putrajaya Fully Residential",
                                       "Labuan Fully Residential"))

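The answer lies in setDT (which coerces lists and data.frames to data.table by reference).

The line of code below, which I found while reading, looks a bit long but it solved my problem! Honestly, I understood this one first, before I understood the shorter code further down.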

setDT(pisaMalaysia)[,SCH.TYPE := lookUpStratum$SCH.TYPE[match(pisaMalaysia$STRATUM,lookUpStratum$STRATUM)]]
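
To make the match step easier to follow, here is a small sketch on toy data. The names lookup, toy and pos, and the shortened stratum labels, are hypothetical and only serve the illustration.

```r
library(data.table)

# Hypothetical lookup table with shortened labels.
lookup <- data.table(STRATUM  = c("stratum 01", "stratum 02", "stratum 03"),
                     SCH.TYPE = c("Public", "Religious", "Technical"))

# Toy data, including one value that is missing from the lookup table.
toy <- data.table(STRATUM = c("stratum 02", "stratum 01", "stratum 03",
                              "stratum 01", "stratum 99"))

# match() gives, for each row of toy, the position of its STRATUM in lookup
# (NA when the value is absent from lookup).
pos <- match(toy$STRATUM, lookup$STRATUM)

# Indexing SCH.TYPE by those positions recodes the whole column in one
# vectorized step; an NA position yields an NA label.
toy[, SCH.TYPE := lookup$SCH.TYPE[pos]]
toy
```

Because match works on the whole column at once, no per-row function call such as sapply is needed.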

A few minutes later I finally managed to understand that code, and produced this one:

setDT(pisaMalaysia)[lookUpStratum,SCH.TYPE1 := i.SCH.TYPE, on = c(STRATUM = "STRATUM")]
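
The same idea can be sketched as an update join on toy data, followed by a direct comparison with the match version. As before, lookup, toy and the shortened labels are hypothetical names for illustration only.

```r
library(data.table)

lookup <- data.table(STRATUM  = c("stratum 01", "stratum 02", "stratum 03"),
                     SCH.TYPE = c("Public", "Religious", "Technical"))
toy <- data.table(STRATUM = c("stratum 02", "stratum 01", "stratum 03",
                              "stratum 01", "stratum 99"))

# Update join: for each row of lookup, find the matching rows of toy on
# STRATUM and assign lookup's SCH.TYPE (referred to as i.SCH.TYPE) to them
# by reference. Rows of toy with no match keep NA.
# on = "STRATUM" is shorthand for on = c(STRATUM = "STRATUM") when the
# column has the same name in both tables.
toy[lookup, SCH.TYPE1 := i.SCH.TYPE, on = "STRATUM"]

# The match-based recode, for comparison.
toy[, SCH.TYPE := lookup$SCH.TYPE[match(STRATUM, lookup$STRATUM)]]

# Compares the two columns element by element, including the NA.
identical(toy$SCH.TYPE, toy$SCH.TYPE1)
```

Comparing with identical is a stricter check than comparing two table() outputs, since it also verifies that each individual row received the same label.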

I got these answers from the same post.

Check that everything is the same:

table(pisaMalaysia$SCH.TYPE)
table(pisaMalaysia$SCH.TYPE1)
#' original data
pisaMalaysia[,table(STRATUM)]

Result:

> table(pisaMalaysia$SCH.TYPE)
   Labuan Fully Residential                        MARA    Perlis Fully Residential 
                         54                         122                          78 
                    Private                      Public Putrajaya Fully Residential 
                        385                        4929                          78 
                  Religious                         SBP                   Technical 
                        273                        2661                         281 

> table(pisaMalaysia$SCH.TYPE1)
   Labuan Fully Residential                        MARA    Perlis Fully Residential 
                         54                         122                          78 
                    Private                      Public Putrajaya Fully Residential 
                        385                        4929                          78 
                  Religious                         SBP                   Technical 
                        273                        2661                         281 

> pisaMalaysia[,table(STRATUM)]
STRATUM
                      MYS - stratum 01: MOE National Secondary School\Other States 
                                                                               4929 
                               MYS - stratum 02: MOE Religious School\Other States 
                                                                                273 
                               MYS - stratum 03: MOE Technical School\Other States 
                                                                                281 
                                     MYS - stratum 04: MOE Fully Residential School 
                                                                               2661 
                MYS - stratum 05: non-MOE MARA Junior Science College\Other States 
                                                                                122 
                              MYS - stratum 06: non-MOE Other Schools\Other States 
                                                                                385 
                       MYS - stratum 07: Perlis non-“MOE Fully Residential Schools” 
                                                                                 78 
MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools” 
                                                                                 78 
   MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools” 
                                                                                 54 

Thanks! I hope this helps others too.