Count Total Number of NAs per Column in R

I am currently trying to count the number of NAs found in each column of my data set.

I am running the following code:

apply(Total_HousingData, 2, function(x) {sum(is.na(x))})

This is my output:

        Id    MSSubClass      MSZoning   LotFrontage       LotArea        Street 
            0             0             0             0             0             0 
        Alley      LotShape   LandContour     Utilities     LotConfig     LandSlope 
            0             0             0             0             0             0 
 Neighborhood    Condition1    Condition2      BldgType    HouseStyle   OverallQual 
            0             0             0             0             0             0 
  OverallCond     YearBuilt  YearRemodAdd     RoofStyle      RoofMatl   Exterior1st 
            0             0             0             0             0             0 
  Exterior2nd    MasVnrType    MasVnrArea     ExterQual     ExterCond    Foundation 
            0             0             0             0             0             0 
     BsmtQual      BsmtCond  BsmtExposure  BsmtFinType1    BsmtFinSF1  BsmtFinType2 
            0             0             0             0             1             0 
   BsmtFinSF2     BsmtUnfSF   TotalBsmtSF       Heating     HeatingQC    CentralAir 
            1             1             1             0             0             0 
   Electrical      1stFlrSF      2ndFlrSF  LowQualFinSF     GrLivArea  BsmtFullBath 
            0             0             0             0             0             2 
 BsmtHalfBath      FullBath      HalfBath  BedroomAbvGr  KitchenAbvGr   KitchenQual 
            2             0             0             0             0             0 
 TotRmsAbvGrd    Functional    Fireplaces   FireplaceQu    GarageType   GarageYrBlt 
            0             0             0             0             0             0 
 GarageFinish    GarageCars    GarageArea    GarageQual    GarageCond    PavedDrive 
            0             1             1             0             0             0 
   WoodDeckSF   OpenPorchSF EnclosedPorch     3SsnPorch   ScreenPorch      PoolArea 
            0             0             0             0             0             0 
       PoolQC         Fence   MiscFeature       MiscVal        MoSold        YrSold 
            0             0             0             0             0             0 
     SaleType SaleCondition     SalePrice 
            0             0          1459

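For some reason, all of the NA counts are attributed to the SalePrice variable. When I look at the other variables, they contain many NAs. I tried converting the relevant variables to factors, but that did not solve the problem.

For example, "Alley" should read at least 1, but its NAs are not being picked up.

Here is a sample of the data: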

 Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities
  <dbl>      <dbl> <chr>    <chr>         <dbl> <chr>  <chr> <chr>    <chr>       <chr>    
1     1         60 RL       65             8450 Pave   NA    Reg      Lvl         AllPub   
2     2         20 RL       80             9600 Pave   NA    Reg      Lvl         AllPub   
3     3         60 RL       68            11250 Pave   NA    IR1      Lvl         AllPub   
4     4         70 RL       60             9550 Pave   NA    IR1      Lvl         AllPub   
5     5         60 RL       84            14260 Pave   NA    IR1      Lvl         AllPub   
6     6         50 RL       85            14115 Pave   NA    IR1      Lvl         AllPub   

Try sapply. This is the one-liner I use, with df as your data frame.

sapply(df, function(x) sum(is.na(x)))
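
To make the one-liner concrete, here is a minimal sketch on a made-up data frame. The data frame `df` and its columns `x`, `y`, and `z` are hypothetical and not taken from the housing data.

```r
# Hypothetical data frame, for illustration only
df <- data.frame(
  x = c(1, NA, 3),
  y = c("a", NA, NA),
  z = c(TRUE, FALSE, TRUE)
)

# sapply() visits each column; sum(is.na()) counts that column's missing values
na_counts <- sapply(df, function(col) sum(is.na(col)))
print(na_counts)
```

Because sapply() iterates over the columns of the data frame directly, each column keeps its own type, whereas apply() first coerces the whole data frame to a matrix.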

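Another solution uses colSums(). is.na(df) returns a logical matrix with the same dimensions as the data frame, in which each NA cell is TRUE. colSums() then sums the TRUE values in each column.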

Total_HousingData <- data.frame(A = c(1, 2, NA, NA, NA), B = c(1, NA, 3, 4, 5), C = c(NA, 2, 3, NA, 5))

colSums(is.na(Total_HousingData))
#> A B C 
#> 3 1 2

Created on 2021-02-20 by the reprex package (v1.0.0)
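
One possible reason for the zero counts in the question, offered only as an assumption because the import code is not shown: columns such as Alley and LotFrontage print as <chr>, which would be consistent with missing values having been read in as the literal string "NA". is.na() does not flag such strings. The sketch below uses a hypothetical data frame named `housing` to show the difference and one way to convert the strings to real missing values.

```r
# Hypothetical data: "NA" here is a two-character string, not a missing value
housing <- data.frame(
  Alley = c("NA", "Grvl", "NA"),
  LotFrontage = c("65", "NA", "80"),
  stringsAsFactors = FALSE
)

# is.na() sees no missing values, so every count is 0
before <- colSums(is.na(housing))

# Comparing against the string "NA" counts the literal strings instead
string_na <- colSums(housing == "NA")

# Replace the literal strings with real NA values, then recount
housing[housing == "NA"] <- NA
real_na <- colSums(is.na(housing))

print(before)
print(string_na)
print(real_na)
```

If this turns out to be the cause, it may be simpler to handle it at import time, for example through the `na.strings` argument of `read.csv()`.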