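Computing a few difficult metrics from an integer vector in R
For context, I am working with sports/basketball data. The vector below is for one NBA game and contains the number of points by which the home team is leading or trailing at any given moment in the game.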
dput(leads_vector)
c(0, 0, 0, 0, 0, 0, 0, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2,
-2, -2, -2, -2, -2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 4, 2,
5, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 11, 11, 9, 9, 9, 9, 9, 9, 9, 9, 11,
11, 9, 9, 9, 11, 11, 11, 11, 12, 13, 13, 13, 13, 13, 13, 15,
14, 14, 13, 13, 13, 13, 11, 14, 14, 14, 14, 14, 14, 14, 14, 14,
14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16,
16, 13, 13, 11, 11, 11, 11, 11, 9, 9, 9, 7, 9, 9, 9, 10, 10,
11, 11, 11, 11, 11, 11, 13, 13, 13, 13, 13, 11, 11, 11, 11, 11,
12, 13, 13, 13, 13, 13, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 12, 13, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 15, 15, 15, 13, 13, 13, 13, 15, 12, 12, 12, 9,
9, 9, 9, 9, 11, 11, 11, 11, 13, 13, 10, 10, 10, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 10, 8, 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 9, 9, 11, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 12, 10, 12, 12, 12,
12, 14, 14, 14, 12, 12, 12, 12, 12, 12, 12, 12, 14, 14, 14, 15,
16, 16, 16, 16, 14, 14, 11, 11, 11, 11, 11, 11, 9, 9, 9, 9, 9,
9, 9, 10, 11, 11, 9, 9, 9, 9, 7, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 3, 3, 2, 1, 1, 1,
3, 3, 3, 3, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6, 6,
6, 6, 6, 7, 8, 8, 8, 8, 8, 8, 8, 8, 10, 10, 10, 8, 8, 7, 7, 7,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 11, 11, 11, 11,
9, 9, 9, 9, 9, 9, 10, 11, 11, 11, 8, 11, 8, 10, 10, 11, 11, 11,
11, 11, 9, 11, 11, 11, 10, 10, 10, 12, 12, 12, 12, 13, 13, 16,
16, 16, 16, 17, 18, 19, 19, 19, 19, 19, 18, 18, 18, 20, 20, 20,
20, 20, 20, 20, 18, 18, 18, 16, 16, 16, 13, 13, 13, 11, 10, 10,
10, 10, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13)
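These vectors always start with 0, since games start tied 0-0. leads_vector[100] is equal to 14, which means the home team was leading by 14 at that point in the game. Note that numbers in the vector repeat, because the score can stay the same over several consecutive plays.
The 4 metrics I would like to compute are:
- Biggest lead
- Number of times tied
- Longest run (consecutive points scored by one team)
- Lead changes
The biggest lead is easy to compute: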
# Take abs() first so that a large deficit (away-team lead) is also considered
biggest_lead <- max(abs(leads_vector))
The number of times tied is a bit harder to compute:
times_tied <- sum(leads_vector[2:length(leads_vector)] == 0 & leads_vector[1:(length(leads_vector)-1)] != 0)
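times_tied counts every position in the vector where the value is 0 (the score is tied) and the previous value is not 0. This ensures that each run of zeros counts as the score being tied only once.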
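As a quick check of that logic, here is a minimal sketch on a made-up vector (toy_leads is hypothetical, not game data). It uses head() and tail() to line up each value with its predecessor, which should be equivalent to the explicit index ranges above. Note that the opening 0-0 is not counted, since it has no nonzero predecessor.

```r
# Hypothetical toy vector, not real game data
toy_leads <- c(0, 0, 2, 2, 0, 0, -3, 0, 1)

# Previous and current values, aligned pairwise
prev <- head(toy_leads, -1)
curr <- tail(toy_leads, -1)

# A tie is counted when the lead becomes 0 after being nonzero
times_tied <- sum(curr == 0 & prev != 0)
times_tied
#> [1] 2
```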
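I am not sure how to compute the longest run. The longest run in a game is the largest monotonically increasing or decreasing sequence in the vector. Going by the eye test alone, I noticed a long run of 8 at leads_vector[38:65].
The number of lead changes is also difficult to compute. It equals the number of times the lead in this vector flips from positive to negative, or from negative to positive. The following leads_vector: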
c(3, -3, 2, 5, 4, 3, 0, 2, -3, -1, -4, -5, -2, 0, 1)
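...would have 4 lead changes (from 3 to -3, from -3 to 2, from 2 to -3, and from -2 through 0 to 1).
Any help is appreciated!
EDIT - the longest run is the difficult stat to compute here, but I am working on it.
EDIT2 - If I remove the repeated values from leads_vector, I think the longest run becomes easier to compute. However, I cannot use the duplicated() function, because it would also remove duplicates that occur at different places in the vector. Instead, I only want to remove repeated values that are adjacent to each other (to get c(0, -2, 5, 3, 5, 8, 10, 11, 9, 11, 9, 11, ... ))
I figured out how to compute lead changes using the sign() and diff() functions. First I need to filter out the values where the lead equals 0, since those are not lead changes for my purposes, even though R's sign() function distinguishes (+), (-), and 0. I have this: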
# abs() keeps the +2 and -2 sign flips from cancelling each other in the sum
lead_changes <- sum(abs(diff(sign(leads_vector[leads_vector != 0])))) / 2
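As a sanity check on the small example vector given earlier, here is a sketch of the same sign()/diff() idea. It counts the nonzero differences rather than summing them, because each flip contributes either +2 or -2 and a plain sum of signed differences would cancel out. The names example_leads and nonzero are only for illustration.

```r
# Example vector from the question above
example_leads <- c(3, -3, 2, 5, 4, 3, 0, 2, -3, -1, -4, -5, -2, 0, 1)

# Drop ties so that going from -2 through 0 to 1 counts as a single change
nonzero <- example_leads[example_leads != 0]

# Each sign flip yields a nonzero difference; count those
lead_changes <- sum(diff(sign(nonzero)) != 0)
lead_changes
#> [1] 4
```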
For the longest run, I think removing the adjacent repeated values like this is a good start:
leads_vector[c(TRUE, leads_vector[-1] != leads_vector[-length(leads_vector)])]
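To illustrate the difference between dropping adjacent repeats and using duplicated(), here is a small sketch on a made-up vector (toy_leads, by_index and by_rle are hypothetical names). It also shows that rle(), from base R, should give the same collapsed values as the index-based approach.

```r
# Hypothetical toy vector with adjacent and non-adjacent repeats
toy_leads <- c(0, 0, -2, -2, 0, 2, 2, 0)

# Keep the first element, then every element that differs from its predecessor
by_index <- toy_leads[c(TRUE, toy_leads[-1] != toy_leads[-length(toy_leads)])]

# rle() collapses each run of equal adjacent values to a single value
by_rle <- rle(toy_leads)$values

by_index
#> [1]  0 -2  0  2  0
identical(by_index, by_rle)
#> [1] TRUE

# duplicated() would instead drop the later 0s entirely
toy_leads[!duplicated(toy_leads)]
#> [1]  0 -2  2
```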
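# biggest_lead
with(rle(leads_vector), max(abs(values)))
# number_ties (each run of zeros counts once, including the opening 0-0)
with(rle(leads_vector), sum(values == 0))
# longest_run (not computed here)
# lead_changes: number of same-side runs among nonzero leads, minus the first run
length(rle(leads_vector[leads_vector != 0] < 0)$values) - 1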
Computing the longest run:
compute_longest_run <- function(x) {
  # Collapse repetitions
  x_unique <- rle(x)$values
  # Compute score change
  score_change <- diff(x_unique)
  # Need to compute sum of all subvectors with the same sign
  run_side <- sign(score_change)
  run_id <- c(1, cumsum(diff(run_side) != 0) + 1)
  run_value <- tapply(score_change, run_id, sum)
  max(abs(run_value))
}
compute_longest_run(leads_vector)
#> [1] 10
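To see what each step of the function produces, here is a sketch that traces the intermediate values on a made-up vector (toy_leads is hypothetical, not game data). The comments show the values worked out by hand for this input.

```r
# Hypothetical toy vector: home goes up 5, then the away team scores 7 straight
toy_leads <- c(0, 0, 2, 2, 5, 3, 3, 0, -2, -2, 1)

x_unique <- rle(toy_leads)$values                # 0 2 5 3 0 -2 1
score_change <- diff(x_unique)                   # 2 3 -2 -3 -2 3
run_side <- sign(score_change)                   # 1 1 -1 -1 -1 1
run_id <- c(1, cumsum(diff(run_side) != 0) + 1)  # 1 1 2 2 2 3
run_value <- tapply(score_change, run_id, sum)   # 5 -7 3

max(abs(run_value))
#> [1] 7
```

A new run_id starts whenever the scoring side changes, so summing score_change within each id gives the signed size of each run, and the largest absolute value is the longest run regardless of which team made it.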