Intraclass Correlation in Python Module?
I'm looking to calculate the intraclass correlation (ICC) in Python. I haven't been able to find an existing module that has this feature. Is there an alternate name, or should I do it myself? I'm aware this question was asked a year ago on Cross Validated by another user, but it got no replies. I want to compare continuous scores between two raters.
You can find an implementation at ICC or Brain_Data.icc.
There are several implementations of the ICC in R. These can be used from Python via the rpy2 package. Example:
from rpy2.robjects import DataFrame, FloatVector, IntVector
from rpy2.robjects.packages import importr
from math import isclose
groups = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4,
          4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8]
values = [1, 2, 0, 1, 1, 3, 3, 2, 3, 8, 1, 4, 6, 4, 3,
          3, 6, 5, 5, 6, 7, 5, 6, 2, 8, 7, 7, 9, 9, 9, 9, 8]
r_icc = importr("ICC")
df = DataFrame({"groups": IntVector(groups),
                "values": FloatVector(values)})
icc_res = r_icc.ICCbare("groups", "values", data=df)
icc_val = icc_res[0] # icc_val now holds the icc value
# check whether icc value equals reference value
print(isclose(icc_val, 0.728, abs_tol=0.001))
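If you want a cross-check that does not need R, the one-way ICC can also be worked out by hand from the between-group and within-group mean squares. The sketch below assumes a balanced design (every group has the same number of values) and makes no attempt to handle unbalanced data. The helper name `oneway_icc` is made up for illustration and is not part of the ICC package.

```python
from math import isclose


def oneway_icc(groups, values):
    """One-way ICC from ANOVA mean squares, assuming equal group sizes."""
    by_group = {}
    for g, v in zip(groups, values):
        by_group.setdefault(g, []).append(v)

    sizes = {len(vals) for vals in by_group.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch only covers balanced designs")
    k = sizes.pop()        # number of values per group
    a = len(by_group)      # number of groups

    grand_mean = sum(values) / len(values)
    # Between-group and within-group sums of squares
    ss_between = sum(k * (sum(vals) / k - grand_mean) ** 2
                     for vals in by_group.values())
    ss_within = sum((v - sum(vals) / k) ** 2
                    for vals in by_group.values() for v in vals)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


groups = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4,
          4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8]
values = [1, 2, 0, 1, 1, 3, 3, 2, 3, 8, 1, 4, 6, 4, 3,
          3, 6, 5, 5, 6, 7, 5, 6, 2, 8, 7, 7, 9, 9, 9, 9, 8]

# Same reference check as in the rpy2 example
print(isclose(oneway_icc(groups, values), 0.728, abs_tol=0.001))
```

For this balanced data the hand computation agrees with the 0.728 reference value to within the same tolerance.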
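The R package psych has an implementation of the intraclass correlation (ICC) that computes many variants, including ICC(1,1), ICC(1,k), ICC(2,1), ICC(2,k), ICC(3,1) and ICC(3,k), plus other metrics.
This page has a good comparison of the different variants.
You can use the R ICC function via the rpy2 package.
Example:
- First install psych and lme4 in R:
install.packages("psych")
install.packages("lme4")
- Compute the ICC coefficients in Python using rpy2: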
import rpy2
from rpy2.robjects import IntVector, pandas2ri
from rpy2.robjects.packages import importr
psych = importr("psych")
values = rpy2.robjects.r.matrix(
    IntVector(
        [9, 2, 5, 8,
         6, 1, 3, 2,
         8, 4, 6, 8,
         7, 1, 2, 6,
         10, 5, 6, 9,
         6, 2, 4, 7]),
    ncol=4, byrow=True
)
icc = psych.ICC(values)
# Convert to Pandas DataFrame
icc_df = pandas2ri.rpy2py(icc[0])
Result:
type ICC F df1 df2 p lower bound upper bound
Single_raters_absolute ICC1 0.165783 1.794916 5.0 18.0 0.164720 -0.132910 0.722589
Single_random_raters ICC2 0.289790 11.026650 5.0 15.0 0.000135 0.018791 0.761107
Single_fixed_raters ICC3 0.714829 11.026650 5.0 15.0 0.000135 0.342447 0.945855
Average_raters_absolute ICC1k 0.442871 1.794916 5.0 18.0 0.164720 -0.884193 0.912427
Average_random_raters ICC2k 0.620080 11.026650 5.0 15.0 0.000135 0.071153 0.927240
Average_fixed_raters ICC3k 0.909311 11.026650 5.0 15.0 0.000135 0.675657 0.985891
The pingouin library computes the ICC in 6 different ways, along with the associated confidence intervals and p-values.
You can install it with
pip install pingouin
or
conda install -c conda-forge pingouin
import pingouin as pg
data = pg.read_dataset('icc')
icc = pg.intraclass_corr(data=data, targets='Wine', raters='Judge',
                         ratings='Scores')
data.head(10)
| | Wine | Judge | Scores |
|---:|-------:|:--------|---------:|
| 0 | 1 | A | 1 |
| 1 | 2 | A | 1 |
| 2 | 3 | A | 3 |
| 3 | 4 | A | 6 |
| 4 | 5 | A | 6 |
| 5 | 6 | A | 7 |
| 6 | 7 | A | 8 |
| 7 | 8 | A | 9 |
| 8 | 1 | B | 2 |
| 9 | 2 | B | 3 |
icc
| | Type | Description | ICC | F | df1 | df2 | pval | CI95% |
|---:|:-------|:------------------------|------:|-------:|------:|------:|------------:|:-------------|
| 0 | ICC1 | Single raters absolute | 0.773 | 11.199 | 5 | 12 | 0.000346492 | [0.39, 0.96] |
| 1 | ICC2 | Single random raters | 0.783 | 27.966 | 5 | 10 | 1.42573e-05 | [0.25, 0.96] |
| 2 | ICC3 | Single fixed raters | 0.9 | 27.966 | 5 | 10 | 1.42573e-05 | [0.65, 0.98] |
| 3 | ICC1k | Average raters absolute | 0.911 | 11.199 | 5 | 12 | 0.000346492 | [0.65, 0.99] |
| 4 | ICC2k | Average random raters | 0.915 | 27.966 | 5 | 10 | 1.42573e-05 | [0.5, 0.99] |
| 5 | ICC3k | Average fixed raters | 0.964 | 27.966 | 5 | 10 | 1.42573e-05 | [0.85, 0.99] |
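Since the question is about two raters with continuous scores: pingouin expects long-format data, one row per target/rater pair, as in the Wine/Judge/Scores table above. If your scores sit in one column per rater, they need reshaping first. Here is a minimal sketch in plain Python; the scores are made up, and the names `wide_to_long`, `subject`, `rater` and `score` are hypothetical, chosen only for this example.

```python
# Hypothetical wide-format data: one list of scores per rater,
# where position i in each list belongs to subject i + 1.
wide = {
    "rater_1": [7.5, 6.0, 8.2, 5.1, 9.0],
    "rater_2": [7.0, 6.4, 8.0, 5.5, 8.7],
}


def wide_to_long(scores_by_rater):
    """Return one record per (subject, rater) pair."""
    records = []
    for rater, scores in scores_by_rater.items():
        for subject, score in enumerate(scores, start=1):
            records.append({"subject": subject, "rater": rater, "score": score})
    return records


long_records = wide_to_long(wide)
print(long_records[0])

# With pandas and pingouin installed, the records could then be passed on as:
#   import pandas as pd
#   import pingouin as pg
#   pg.intraclass_corr(data=pd.DataFrame(long_records), targets="subject",
#                      raters="rater", ratings="score")
```

Which row of the resulting table to report depends on the design, for example whether the two raters are treated as a random sample of raters or as the only raters of interest.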
Based on Brain_Data, I modified the code to calculate the ICC(2,1), ICC(2,k), ICC(3,1) or ICC(3,k) coefficient for data entered as a table Y (subjects in rows and repeated measures in columns).
import numpy as np


def icc(Y, icc_type='ICC(2,1)'):
    ''' Calculate intraclass correlation coefficient

    ICC Formulas are based on:
    Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in
    assessing rater reliability. Psychological bulletin, 86(2), 420.
    icc1: x_ij = mu + beta_j + w_ij
    icc2/3: x_ij = mu + alpha_i + beta_j + (ab)_ij + epsilon_ij
    Code modified from nipype algorithms.icc
    https://github.com/nipy/nipype/blob/master/nipype/algorithms/icc.py

    Args:
        Y: The data Y are entered as a 'table', i.e. subjects are in rows and
            repeated measures in columns
        icc_type: type of ICC to calculate. (ICC(2,1), ICC(2,k), ICC(3,1), ICC(3,k))
    Returns:
        ICC: (np.array) intraclass correlation coefficient
    '''
    # Accept nested lists as well as arrays
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape

    # Degrees of Freedom
    dfc = k - 1
    dfe = (n - 1) * (k - 1)
    dfr = n - 1

    # Sum Square Total
    mean_Y = np.mean(Y)
    SST = ((Y - mean_Y) ** 2).sum()

    # create the design matrix for the different levels
    x = np.kron(np.eye(k), np.ones((n, 1)))  # sessions
    x0 = np.tile(np.eye(n), (k, 1))  # subjects
    X = np.hstack([x, x0])

    # Sum Square Error
    predicted_Y = np.dot(np.dot(np.dot(X, np.linalg.pinv(np.dot(X.T, X))),
                                X.T), Y.flatten('F'))
    residuals = Y.flatten('F') - predicted_Y
    SSE = (residuals ** 2).sum()
    MSE = SSE / dfe

    # Sum square column effect - between columns
    SSC = ((np.mean(Y, 0) - mean_Y) ** 2).sum() * n
    MSC = SSC / dfc  # / n (without n in SPSS results)

    # Sum Square subject effect - between rows/subjects
    SSR = SST - SSC - SSE
    MSR = SSR / dfr

    if icc_type == 'icc1':
        # ICC(1,1) = (mean square subject - mean square within) /
        # (mean square subject + (k-1)*mean square within)
        # ICC = (MSR - MSRW) / (MSR + (k-1) * MSRW)
        raise NotImplementedError("This method isn't implemented yet.")
    elif icc_type == 'ICC(2,1)' or icc_type == 'ICC(2,k)':
        # ICC(2,1) = (mean square subject - mean square error) /
        # (mean square subject + (k-1)*mean square error +
        # k*(mean square columns - mean square error)/n)
        if icc_type == 'ICC(2,k)':
            k = 1  # setting k to 1 reduces the formula to the average-rater form
        ICC = (MSR - MSE) / (MSR + (k-1) * MSE + k * (MSC - MSE) / n)
    elif icc_type == 'ICC(3,1)' or icc_type == 'ICC(3,k)':
        # ICC(3,1) = (mean square subject - mean square error) /
        # (mean square subject + (k-1)*mean square error)
        if icc_type == 'ICC(3,k)':
            k = 1  # setting k to 1 reduces the formula to the average-rater form
        ICC = (MSR - MSE) / (MSR + (k-1) * MSE)
    else:
        raise ValueError("Unknown icc_type: %s" % icc_type)
    return ICC
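For a complete table with no missing cells, the same sums of squares can be obtained from the row and column means alone, without building the design matrix. The following is only a sketch under that assumption, not the nltools or nipype code. `anova_icc` is a hypothetical helper, and the data are the 6 x 4 matrix from the psych example earlier in this thread, so the values should land close to the ICC2, ICC3, ICC2k and ICC3k rows that psych reports there.

```python
def anova_icc(table):
    """Two-way ICCs (Shrout & Fleiss) for a complete subjects x raters table."""
    n, k = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(col) / n for col in zip(*table)]

    sst = sum((x - grand) ** 2 for row in table for x in row)
    ssr = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ssc = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    sse = sst - ssr - ssc                               # residual

    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return {
        "ICC(2,1)": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "ICC(2,k)": (msr - mse) / (msr + (msc - mse) / n),
        "ICC(3,1)": (msr - mse) / (msr + (k - 1) * mse),
        "ICC(3,k)": (msr - mse) / msr,
    }


ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
for name, value in anova_icc(ratings).items():
    print(f"{name}: {value:.3f}")
```

The shortcut `sse = sst - ssr - ssc` only holds for a balanced table; with missing ratings a model-based fit would be needed instead.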