R: Calculate row sum (MERSQI score), adjusted for missing values / not applicable categories

Question

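I would like to calculate row sums, including an adjustment for missing data.

The row sum is the actual "MERSQI" score (it rates the quality of studies, one study per row). Each column is a question about quality, and each question has a specific maximum number of points. In some cases, however, a question is not applicable to a study, which produces a "missing value". The row sum should then be rescaled to the standard denominator of 18, the maximum score per row, where the maximum achievable points for a study is the sum of the maximum points of its applicable questions (columns):

MERSQI total score = row sum / maximum achievable points * 18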
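The formula above can be sketched as a small helper. This is only an illustration under assumptions: the name `mersqi_adjusted` and its arguments are made up here and do not come from any package, and NA is taken to mean "not applicable".

```r
# Hypothetical helper (not from any package): rescale one study's points
# so that the applicable questions map onto a common maximum (18 by default).
mersqi_adjusted <- function(points, max_points, scale = 18) {
  applicable <- !is.na(points)  # NA is assumed to mean "not applicable"
  sum(points[applicable]) / sum(max_points[applicable]) * scale
}

# Made-up three-question study; the second question is not applicable
example_score <- mersqi_adjusted(c(1, NA, 2), c(3, 1, 2))
print(example_score)
```

Dividing by the sum of the applicable maxima is the same as subtracting the maxima of the NA questions from the full total, so no separate branch for "no NAs" is needed.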

For example:

questions <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) #number of question or col number
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 3) #maximum of every single question
study1 <- c(1.5, 0.5, 1.5, 3, 0, 0, 0, 1, 3) #points for every single questions for study1
study2 <- c(1, 0.5, 0.5, 3, NA, NA, NA, 1, 1, 3) # for study2
study3 <- c(2, 1.5, NA, 3, NA, 1, NA, 1, 1, 3) #for study3
df <- rbind (questions, max_quest, study1, study2, study3)

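For study1 we would have a row sum, and therefore a score, of 10.5, because there are no missing values. For study2 the row sum is 10. There are three NAs, so the maximum achievable score for study2 is 15 (= 18 maximum points - 3*1 points for the NA questions), and the adjusted MERSQI score is 12 (= 10 * 18 / 15). For study3: row sum = 12.5, maximum achievable score = 14.5 (= 18 - (1.5 + 1 + 1)), adjusted MERSQI score = 15.52.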
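The arithmetic in this paragraph can be checked with a few lines of R. This sketch uses only the row sums and the NA questions' maxima named in the text, with 18 as the full maximum; the variable names are made up for illustration.

```r
# Row sums as stated in the text, one entry per study
row_sum <- c(study1 = 10.5, study2 = 10, study3 = 12.5)

# Maximum points of the questions that are NA for each study
na_max <- c(study1 = 0, study2 = 3 * 1, study3 = 1.5 + 1 + 1)

# Rescale each row sum to the standard denominator of 18
adjusted <- row_sum * 18 / (18 - na_max)
print(round(adjusted, 2))
```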

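Do you know how to calculate the row sums with an adjustment for missing values? Maybe by looping over every row, using a for loop and an if with is.na?

Thanks!

PS: Link / explanation of the MERSQI score: https://www.aliem.com/article-review-how-do-you-assess/ and https://pubmed.ncbi.nlm.nih.gov/26107881/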

Answer 1

There is an issue with the lengths of the vectors. I edited the data so that they all have length 9, but this should work:

apply(mat[, 3:5],
      2,
      FUN = function (x) {
        tot = sum(x, na.rm = TRUE)
        nas = which(is.na(x))
        total_max = sum(max_quest)
        if (!length(nas)) 
          return(tot)
        else
          return(tot * total_max / (total_max - sum(max_quest[nas])))
      })

Data:

questions <- c(1, 2, 3, 4, 5, 6, 7, 8, 9) #number of question or col number
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 3) #maximum of every single question
study1 <- c(1.5, 0.5, 1.5, 3, 0, 0, 0, 1, 3) #points for every single questions for study1
study2 <- c(1, 0.5, 0.5, 3, NA, NA, NA, 1, 1) # for study2
study3 <- c(2, 1.5, NA, 3, NA, 1, NA, 1, 1) #for study3

## rename mat because cbind(...) of vectors returns matrix.
mat <- cbind (questions, max_quest, study1, study2, study3)
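As a possible alternative to the apply() call, the same result can be sketched without an explicit if/else by using colSums(). This assumes the length-9 data from this answer; the names `scores`, `tot`, `lost` and `adjusted_scores` are made up for illustration.

```r
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 3)
study1 <- c(1.5, 0.5, 1.5, 3, 0, 0, 0, 1, 3)
study2 <- c(1, 0.5, 0.5, 3, NA, NA, NA, 1, 1)
study3 <- c(2, 1.5, NA, 3, NA, 1, NA, 1, 1)
scores <- cbind(study1, study2, study3)  # one column per study

total_max <- sum(max_quest)
tot <- colSums(scores, na.rm = TRUE)

# is.na(scores) is a logical matrix; multiplying by max_quest recycles the
# vector down each column, so each NA cell contributes its question's maximum.
lost <- colSums(is.na(scores) * max_quest)

adjusted_scores <- tot * total_max / (total_max - lost)
print(adjusted_scores)
```

When a study has no NAs, `lost` is 0 and the score is returned unchanged, which matches the early return in the function above.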

Answer 2

For each study column, take its sum, multiply it by the total of max_quest, and divide by that total minus the max_quest values of the NA entries.

library(dplyr)

val <- sum(df$max_quest)

df %>%
  summarise(across(starts_with('study'), 
            ~sum(., na.rm = TRUE)* val/(val - sum(max_quest[is.na(.)]))))

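Data

The shared data is incomplete because the vector lengths are incompatible. It also makes more sense for the values to be stored column-wise rather than row-wise.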

questions <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) 
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 3, 3)
study1 <- c(1.5, 0.5, 1.5, 3, 0, 0, 0, 1, 3, 0) 
study2 <- c(1, 0.5, 0.5, 3, NA, NA, NA, 1, 1, 3)
study3 <- c(2, 1.5, NA, 3, NA, 1, NA, 1, 1, 3)
df <- data.frame(questions, max_quest, study1, study2, study3)
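If dplyr is not available, the same calculation can be sketched in base R with sapply(). This assumes the column-wise data frame from this answer; `study_cols` and `adjusted` are names made up for illustration.

```r
questions <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 3, 3)
study1 <- c(1.5, 0.5, 1.5, 3, 0, 0, 0, 1, 3, 0)
study2 <- c(1, 0.5, 0.5, 3, NA, NA, NA, 1, 1, 3)
study3 <- c(2, 1.5, NA, 3, NA, 1, NA, 1, 1, 3)
df <- data.frame(questions, max_quest, study1, study2, study3)

val <- sum(df$max_quest)

# Select the study columns by name, then apply the same formula to each one
study_cols <- grep("^study", names(df))
adjusted <- sapply(df[study_cols], function(x) {
  sum(x, na.rm = TRUE) * val / (val - sum(df$max_quest[is.na(x)]))
})
print(adjusted)
```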

Answer 3

This can be done with vectorization.

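First compute the row sums and, for each row, the maximum points of the questions that are NA:

row_sums <- apply(df, 1, function(x) sum(x, na.rm = TRUE))

# df is the rbind() matrix from the question, so max_quest is one of its rows
na_points <- apply(df, 1, function(x) sum(df["max_quest", is.na(x)]))

Then extract the studies and the maximum score:

studies <- row_sums[3:length(row_sums)]

max_total <- row_sums[2]

Compute the MERSQI from the adjusted maximum, which depends on the NAs:

adjusted_max <- max_total - na_points[3:length(na_points)]

MERSQI <- studies * max_total / adjusted_max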
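For the row-wise layout, a fully vectorized variant can be sketched with rowSums() and a matrix product. It subtracts the maximum points of each NA question rather than the number of NAs, and it rescales to 18 as asked in the question. This assumes the length-9 data from Answer 1; `scores`, `lost` and `mersqi` are names made up for illustration.

```r
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 3)
study1 <- c(1.5, 0.5, 1.5, 3, 0, 0, 0, 1, 3)
study2 <- c(1, 0.5, 0.5, 3, NA, NA, NA, 1, 1)
study3 <- c(2, 1.5, NA, 3, NA, 1, NA, 1, 1)
scores <- rbind(study1, study2, study3)  # one row per study

row_sum <- rowSums(scores, na.rm = TRUE)

# Each NA cell counts as 1 in is.na(scores), so the matrix product adds up
# the maximum points of the questions that are NA in each row.
lost <- drop(is.na(scores) %*% max_quest)

# Rescale to the standard denominator of 18
mersqi <- row_sum / (sum(max_quest) - lost) * 18
print(round(mersqi, 2))
```

Counting NAs only gives the same result when every non-applicable question is worth exactly one point, which is not the case for study3 (question 3 is worth 1.5).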
