Covariance Matrix - R
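I have a data frame of player stats. What I'd like to do is create a covariance matrix between players for the MB stat, to see which players perform well together and which players tend to offset each other.
Note that not every player plays in every game.
I'd like something like the below, where 'x' is the relevant covariance value.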
    Player.Name       Damian Lillard  C.J. McCollum  Allen Crabbe  Noah Vonleh  etc, etc
1   Damian Lillard    x               x              x             x
2   C.J. McCollum     x               x              x             x
3   Allen Crabbe      x               x              x             x
4   Noah Vonleh       x               x              x             x
5   Ed Davis          x               x              x             x
6   Al-Farouq Aminu   x               x              x             x
7   Evan Turner       x               x              x             x
8   Maurice Harkless  x               x              x             x
9   Meyers Leonard    x               x              x             x
10  Mason Plumlee     x               x              x             x
11  Shabazz Napier    x               x              x             x
> df
         Player.Name  Tm   MB    DS     Game
1     Damian Lillard POR 54.8 59.50 20161025
11     C.J. McCollum POR 30.9 32.50 20161025
16      Allen Crabbe POR 24.1 28.25 20161025
19       Noah Vonleh POR 14.2 15.25 20161025
22          Ed Davis POR 17.9 18.00 20161025
26   Al-Farouq Aminu POR 16.3 18.25 20161025
34       Evan Turner POR 20.5 19.25 20161025
64  Maurice Harkless POR  4.7  5.25 20161025
65    Meyers Leonard POR  2.7  2.25 20161025
68     Mason Plumlee POR  4.7  4.00 20161025
290 Maurice Harkless POR 35.6 35.75 20161027
295    Mason Plumlee POR 36.6 36.75 20161027
299   Damian Lillard POR 41.5 44.25 20161027
309    C.J. McCollum POR 26.8 27.50 20161027
318     Allen Crabbe POR 17.2 16.25 20161027
349      Noah Vonleh POR  5.0  4.75 20161027
358      Evan Turner POR 10.7 10.50 20161027
359         Ed Davis POR  5.6  5.50 20161027
364   Shabazz Napier POR  0.0  0.00 20161027
369  Al-Farouq Aminu POR 13.6 13.25 20161027
545   Damian Lillard POR 56.5 58.25 20161029
557    C.J. McCollum POR 49.5 51.25 20161029
610    Mason Plumlee POR 22.9 22.50 20161029
611     Allen Crabbe POR 22.6 22.75 20161029
637      Evan Turner POR 15.6 16.75 20161029
649  Al-Farouq Aminu POR 27.9 28.25 20161029
673         Ed Davis POR  8.9  9.50 20161029
704      Noah Vonleh POR  4.8  5.00 20161029
719 Maurice Harkless POR  9.6 11.00 20161029
723   Meyers Leonard POR  6.2  6.25 20161029
728   Shabazz Napier POR  0.0  0.00 20161029
Data
structure(list(PlayerName = c("Damian Lillard", "C.J. McCollum",
"Allen Crabbe", "Noah Vonleh", "Ed Davis", "Al-Farouq Aminu",
"Evan Turner", "Maurice Harkless", "Meyers Leonard", "Mason Plumlee",
"Maurice Harkless", "Mason Plumlee", "Damian Lillard", "C.J. McCollum",
"Allen Crabbe", "Noah Vonleh", "Evan Turner", "Ed Davis", "Shabazz Napier",
"Al-Farouq Aminu", "Damian Lillard", "C.J. McCollum", "Mason Plumlee",
"Allen Crabbe", "Evan Turner", "Al-Farouq Aminu", "Ed Davis",
"Noah Vonleh", "Maurice Harkless", "Meyers Leonard", "Shabazz Napier"
), TM = c("POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR"), MB = c(54.8, 30.9, 24.1,
14.2, 17.9, 16.3, 20.5, 4.7, 2.7, 4.7, 35.6, 36.6, 41.5, 26.8,
17.2, 5, 10.7, 5.6, 0, 13.6, 56.5, 49.5, 22.9, 22.6, 15.6, 27.9,
8.9, 4.8, 9.6, 6.2, 0), DS = c(59.5, 32.5, 28.25, 15.25, 18,
18.25, 19.25, 5.25, 2.25, 4, 35.75, 36.75, 44.25, 27.5, 16.25,
4.75, 10.5, 5.5, 0, 13.25, 58.25, 51.25, 22.5, 22.75, 16.75,
28.25, 9.5, 5, 11, 6.25, 0), Game = c(20161025L, 20161025L, 20161025L,
20161025L, 20161025L, 20161025L, 20161025L, 20161025L, 20161025L,
20161025L, 20161027L, 20161027L, 20161027L, 20161027L, 20161027L,
20161027L, 20161027L, 20161027L, 20161027L, 20161027L, 20161029L,
20161029L, 20161029L, 20161029L, 20161029L, 20161029L, 20161029L,
20161029L, 20161029L, 20161029L, 20161029L)), .Names = c("PlayerName",
"TM", "MB", "DS", "Game"), row.names = c(NA, -31L), class = "data.frame")
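You can do this with the cov() function. For example, with the data frame stored in x:
# columns 3:4 are MB and DS; t() puts one data-frame row in each column
cov_mat <- cov(t(x[,3:4]))
# label rows and columns with the player name of each data-frame row
rownames(cov_mat) <- x$PlayerName
colnames(cov_mat) <- x$PlayerName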
> cov_mat[1:3,1:3]
               Damian Lillard C.J. McCollum Allen Crabbe
Damian Lillard        11.0450          3.76      9.75250
C.J. McCollum          3.7600          1.28      3.32000
Allen Crabbe           9.7525          3.32      8.61125
If you want correlations instead, just replace cov() with cor().
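As a quick check of what the snippet above computes, here is a minimal sketch that rebuilds only the first three rows of the question's data (the values are copied from it; the three-row `x` is an assumption for illustration). After `t()`, each player row becomes a column with just two values, MB and DS, so every entry is a covariance estimated from those two numbers rather than from a series of games.

```r
# First three rows of the question's data, MB and DS only
x <- data.frame(
  PlayerName = c("Damian Lillard", "C.J. McCollum", "Allen Crabbe"),
  MB = c(54.8, 30.9, 24.1),
  DS = c(59.5, 32.5, 28.25)
)

# t() gives a 2 x 3 matrix: rows MB and DS, one column per data-frame row
cov_mat <- cov(t(x[, c("MB", "DS")]))
dimnames(cov_mat) <- list(x$PlayerName, x$PlayerName)

# The diagonal is the variance of a single row's (MB, DS) pair
lillard_var <- var(c(x$MB[1], x$DS[1]))

print(cov_mat)
```

The result reproduces the 3 x 3 block printed above.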
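I think the first thing you need to do is reshape the data so that each row is a game and each column holds one player's MB for that game. Assuming our data is in dat: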
dat <- dat[,-c(2,4)] #remove team name and DS
#names left in data.frame
names(dat)
[1] "PlayerName" "MB" "Game"
#reshape from long to wide
dat.wide <- reshape(dat, direction = 'wide',idvar = 'Game',
timevar = 'PlayerName')
dat.wide[1:4, 1:4]
       Game MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe
1  20161025              54.8             30.9            24.1
11 20161027              41.5             26.8            17.2
21 20161029              56.5             49.5            22.6
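If you prefer a plain matrix to a reshaped data frame, the same game-by-player layout can be sketched with base R's tapply(). This is an alternative to the reshape() call above, not part of it, and the five-row `dat` below is a cut-down copy of the question's data used only for illustration. Player/game combinations that do not occur come out as NA.

```r
# A few rows in the same long format as the question (values copied from it)
dat <- data.frame(
  PlayerName = c("Damian Lillard", "C.J. McCollum", "Damian Lillard",
                 "C.J. McCollum", "Shabazz Napier"),
  MB   = c(54.8, 30.9, 41.5, 26.8, 0.0),
  Game = c(20161025L, 20161025L, 20161027L, 20161027L, 20161027L)
)

# One row per game, one column per player; each cell holds a single MB value,
# so mean() just returns it. Missing player/game pairs become NA.
wide <- with(dat, tapply(MB, list(Game, PlayerName), mean))

print(wide)
```

Because the column names are the bare player names, the resulting covariance matrix needs no relabelling.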
#compute using cov function; pairwise.complete.obs uses, for each pair of
#players, only the games in which both have a value
cov_m <- cov(dat.wide[,-1], use = 'pairwise.complete.obs')
cov_m[1:4,1:4]
                  MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe MB.Noah Vonleh
MB.Damian Lillard          67.46333         71.10833          28.370          17.23
MB.C.J. McCollum           71.10833        146.34333          20.495         -23.61
MB.Allen Crabbe            28.37000         20.49500          13.170          12.75
MB.Noah Vonleh             17.23000        -23.61000          12.750          28.84
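To get labels that match the layout asked for in the question, the "MB." prefix that reshape() adds can be stripped from the dimnames. The sketch below assumes a small three-player `dat.wide` built by hand from the question's values (Meyers Leonard has no row for game 20161027, hence the NA), and it also computes the pairwise correlation matrix for comparison.

```r
# Hand-built wide table: one row per game, one column per player's MB
dat.wide <- data.frame(
  Game = c(20161025L, 20161027L, 20161029L),
  `MB.Damian Lillard` = c(54.8, 41.5, 56.5),
  `MB.C.J. McCollum`  = c(30.9, 26.8, 49.5),
  `MB.Meyers Leonard` = c(2.7, NA, 6.2),   # no row for 20161027 in the data
  check.names = FALSE
)

# Each pair of players uses only the games where both have a value
cov_m <- cov(dat.wide[, -1], use = "pairwise.complete.obs")
cor_m <- cor(dat.wide[, -1], use = "pairwise.complete.obs")

# Drop the "MB." prefix so rows and columns are plain player names
players <- sub("^MB\\.", "", colnames(cov_m))
dimnames(cov_m) <- list(players, players)
dimnames(cor_m) <- list(players, players)

print(cov_m)
```

With only three games, and two for any pair involving Leonard, these estimates are very noisy. A pair with exactly two shared games always has a correlation of 1 or -1, so a real analysis would need many more games.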