COUNTIF in R with multiple restrictions
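I have event file data from retrosheet.org. It is data about baseball games, formatted so that each observation describes one play in one game of a baseball season (including reference variables for the game, the players, and the play).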
> str(e.2015.1990)
'data.frame': 4813807 obs. of 42 variables:
$ GAME.ID : Factor w/ 60464 levels "ANA201504100",..: 1 1 1 1 1 1 1 1 1 1 ...
$ INNING : num 1 1 1 1 1 1 1 1 1 2 ...
$ BATTING.TEAM : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 2 1 ...
$ OUTS : int 0 1 2 2 2 2 0 1 2 0 ...
$ BATTER : Factor w/ 5107 levels "abrej003","ackld001",..: 73 167 33 120 163 100 34 256 200 209 ...
$ BATTER.HAND : Factor w/ 2 levels "L","R": 2 1 2 1 2 1 1 2 2 2 ...
$ RES.BATTER : Factor w/ 5107 levels "abrej003","ackld001",..: 73 167 33 120 163 100 34 256 200 209 ...
$ RES.BATTER.HAND : Factor w/ 2 levels "L","R": 2 1 2 1 2 1 1 2 2 2 ...
$ PITCHER : Factor w/ 3481 levels "abadf001","albem001",..: 187 187 187 187 187 187 204 204 204 187 ...
$ PITCHER.HAND : Factor w/ 2 levels "L","R": 1 1 1 1 1 1 1 1 1 1 ...
$ RES.PITCHER : Factor w/ 3481 levels "abadf001","albem001",..: 187 187 187 187 187 187 204 204 204 187 ...
$ RES.PITCHER.HAND : Factor w/ 2 levels "L","R": 1 1 1 1 1 1 1 1 1 1 ...
$ FIRST.RUNNER : Factor w/ 4369 levels "","abrej003",..: 1 1 1 1 104 140 1 1 1 1 ...
$ SECOND.RUNNER : Factor w/ 4048 levels "","abrej003",..: 1 1 1 26 1 90 1 1 1 1 ...
$ THIRD.RUNNER : Factor w/ 3729 levels "","ackld001",..: 1 1 1 1 1 1 1 1 1 1 ...
$ EVENT.TEXT : chr "63/G" "6/P" "D8/L+" "S9/G.2-H" ...
$ EVENT.TYPE : Factor w/ 21 levels "2","3","4","5",..: 1 1 19 18 18 1 1 1 1 1 ...
$ AB.FLAG : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ HIT.VALUE : int 1 1 3 2 2 1 1 1 1 1 ...
$ SH.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ SF.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ DOUBLE.PLAY.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ TRIPLE.PLAY.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ RBI.ON.PLAY : num 0 0 0 1 0 0 0 0 0 0 ...
$ BATTED.BALL.TYPE : Factor w/ 5 levels "","F","G","L",..: 3 5 4 3 4 5 3 3 5 4 ...
$ BATTER.DEST : int 0 0 2 1 1 0 0 0 0 0 ...
$ RUNNER.ON.1ST.DEST : int 0 0 0 0 2 1 0 0 0 0 ...
$ RUNNER.ON.2ND.DEST : int 0 0 0 4 0 2 0 0 0 0 ...
$ RUNNER.ON.3RD.DEST : int 0 0 0 0 0 0 0 0 0 0 ...
$ SB.FOR.RUNNER.ON.1ST.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ SB.FOR.RUNNER.ON.2ND.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ SB.FOR.RUNNER.ON.3RD.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ CS.FOR.RUNNER.ON.1ST.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ CS.FOR.RUNNER.ON.2ND.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ CS.FOR.RUNNER.ON.3RD.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ PO.FOR.RUNNER.ON.1ST.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ PO.FOR.RUNNER.ON.2ND.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ PO.FOR.RUNNER.ON.3RD.FLAG : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ RESPONSIBLE.PITCHER.FOR.RUNNER.ON.1ST: Factor w/ 3433 levels "","albua001",..: 1 1 1 1 161 161 1 1 1 1 ...
$ RESPONSIBLE.PITCHER.FOR.RUNNER.ON.2ND: Factor w/ 3408 levels "","abadf001",..: 1 1 1 133 1 133 1 1 1 1 ...
$ RESPONSIBLE.PITCHER.FOR.RUNNER.ON.3RD: Factor w/ 3337 levels "","abadf001",..: 1 1 1 1 1 1 1 1 1 1 ...
$ EVENT.NUM : Factor w/ 177 levels "1","10","100",..: 1 90 101 112 123 134 145 156 167 2 ...
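From this, I want to compute per-game totals for each player. I want to build a data frame in which each observation describes one player's performance in one game of the season, with every player in every game making up the full set of observations.
I created a new data frame with two columns, GAME.ID and PLAYER.ID, such that every STARTER in every game makes up the full set of observations.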
> str(k.2015.1990)
'data.frame': 1146866 obs. of 2 variables:
$ GAME.ID : Factor w/ 60464 levels "ANA201504100",..: 1 2 3 4 5 6 7 8 9 10 ...
$ PLAYER.ID: Factor w/ 4699 levels "altuj001","bettm001",..: 11 11 11 12 14 12 12 24 24 24 ...
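I think what I need to do next is create additional vectors (one for each statistic I want to compute), such that each observation of the vector creates a unique subset of my event data, defined by: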
e.2015.1990$GAME.ID = k.2015.1990$GAME.ID
e.2015.1990$PLAYER.ID = k.2015.1990$PLAYER.ID
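and then computes the statistic from that subset. I know how to create vectors and subsets in R, but not how to create a vector that builds a unique subset for each observation. I think I need to use function(x) to do this; however, I am new to R and have no experience with it.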
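For convenience, I will try to make a reproducible example. In this example, the goal is to compute each player's total hits per game in the Angels' first two regular season games of 2015.
I made a subset of the event file data containing the 156 observations that correspond to those two games. For simplicity, I included only the variables GAME.ID, BATTER, and HIT.VALUE.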
GAME.ID BATTER HIT.VALUE
1 ANA201504100 escoa003 1
2 ANA201504100 mousm001 1
3 ANA201504100 cainl001 3
4 ANA201504100 hosme001 2
5 ANA201504100 morak001 2
6 ANA201504100 gorda001 1
7 ANA201504100 calhk001 1
8 ANA201504100 troum001 1
9 ANA201504100 pujoa001 1
10 ANA201504100 riosa002 1
11 ANA201504100 peres002 1
12 ANA201504100 infao001 1
13 ANA201504100 freed001 1
14 ANA201504100 cronc002 1
15 ANA201504100 aybae001 1
16 ANA201504100 escoa003 1
17 ANA201504100 mousm001 1
18 ANA201504100 cainl001 1
19 ANA201504100 hosme001 1
20 ANA201504100 morak001 1
21 ANA201504100 iannc001 1
22 ANA201504100 cowgc001 2
23 ANA201504100 giavj001 1
24 ANA201504100 calhk001 3
25 ANA201504100 troum001 1
26 ANA201504100 pujoa001 1
27 ANA201504100 gorda001 1
28 ANA201504100 riosa002 1
29 ANA201504100 peres002 1
30 ANA201504100 freed001 2
31 ANA201504100 cronc002 1
32 ANA201504100 aybae001 1
33 ANA201504100 iannc001 1
34 ANA201504100 infao001 1
35 ANA201504100 escoa003 2
36 ANA201504100 mousm001 1
37 ANA201504100 cainl001 2
38 ANA201504100 hosme001 1
39 ANA201504100 cowgc001 1
40 ANA201504100 giavj001 1
41 ANA201504100 calhk001 1
42 ANA201504100 morak001 5
43 ANA201504100 gorda001 1
44 ANA201504100 riosa002 1
45 ANA201504100 peres002 1
46 ANA201504100 troum001 2
47 ANA201504100 pujoa001 1
48 ANA201504100 freed001 5
49 ANA201504100 cronc002 1
50 ANA201504100 infao001 1
51 ANA201504100 escoa003 1
52 ANA201504100 mousm001 2
53 ANA201504100 cainl001 1
54 ANA201504100 cainl001 1
55 ANA201504100 aybae001 1
56 ANA201504100 iannc001 1
57 ANA201504100 joycm001 3
58 ANA201504100 giavj001 1
59 ANA201504100 hosme001 1
60 ANA201504100 morak001 1
61 ANA201504100 gorda001 1
62 ANA201504100 riosa002 1
63 ANA201504100 riosa002 1
64 ANA201504100 calhk001 1
65 ANA201504100 troum001 2
66 ANA201504100 pujoa001 1
67 ANA201504100 freed001 1
68 ANA201504100 peres002 2
69 ANA201504100 infao001 2
70 ANA201504100 escoa003 1
71 ANA201504100 mousm001 1
72 ANA201504100 cainl001 1
73 ANA201504100 hosme001 1
74 ANA201504100 morak001 1
75 ANA201504100 cronc002 1
76 ANA201504100 aybae001 1
77 ANA201504100 iannc001 1
78 ANA201504100 joycm001 1
79 ANA201504110 escoa003 1
80 ANA201504110 mousm001 1
81 ANA201504110 cainl001 1
82 ANA201504110 hosme001 1
83 ANA201504110 calhk001 5
84 ANA201504110 troum001 2
85 ANA201504110 pujoa001 1
86 ANA201504110 joycm001 1
87 ANA201504110 freed001 1
88 ANA201504110 morak001 1
89 ANA201504110 gorda001 1
90 ANA201504110 riosa002 1
91 ANA201504110 aybae001 2
92 ANA201504110 navae001 1
93 ANA201504110 buted001 1
94 ANA201504110 giavj001 1
95 ANA201504110 peres002 1
96 ANA201504110 infao001 1
97 ANA201504110 escoa003 1
98 ANA201504110 giavj001 1
99 ANA201504110 calhk001 1
100 ANA201504110 troum001 1
101 ANA201504110 mousm001 5
102 ANA201504110 cainl001 2
103 ANA201504110 hosme001 1
104 ANA201504110 hosme001 1
105 ANA201504110 morak001 3
106 ANA201504110 gorda001 1
107 ANA201504110 riosa002 2
108 ANA201504110 peres002 5
109 ANA201504110 infao001 2
110 ANA201504110 escoa003 1
111 ANA201504110 pujoa001 1
112 ANA201504110 joycm001 1
113 ANA201504110 freed001 1
114 ANA201504110 mousm001 1
115 ANA201504110 cainl001 1
116 ANA201504110 hosme001 2
117 ANA201504110 morak001 2
118 ANA201504110 gorda001 1
119 ANA201504110 riosa002 1
120 ANA201504110 aybae001 1
121 ANA201504110 navae001 1
122 ANA201504110 buted001 2
123 ANA201504110 giavj001 1
124 ANA201504110 calhk001 3
125 ANA201504110 troum001 2
126 ANA201504110 pujoa001 1
127 ANA201504110 riosa002 1
128 ANA201504110 peres002 2
129 ANA201504110 infao001 1
130 ANA201504110 escoa003 2
131 ANA201504110 mousm001 1
132 ANA201504110 joycm001 1
133 ANA201504110 freed001 1
134 ANA201504110 aybae001 1
135 ANA201504110 cainl001 1
136 ANA201504110 hosme001 1
137 ANA201504110 morak001 2
138 ANA201504110 gorda001 1
139 ANA201504110 riosa002 1
140 ANA201504110 navae001 1
141 ANA201504110 iannc001 1
142 ANA201504110 giavj001 1
143 ANA201504110 peres002 1
144 ANA201504110 infao001 1
145 ANA201504110 escoa003 1
146 ANA201504110 calhk001 1
147 ANA201504110 troum001 1
148 ANA201504110 pujoa001 1
149 ANA201504110 mousm001 2
150 ANA201504110 cainl001 1
151 ANA201504110 hosme001 1
152 ANA201504110 morak001 1
153 ANA201504110 gorda001 1
154 ANA201504110 joycm001 1
155 ANA201504110 freed001 1
156 ANA201504110 aybae001 1
I also made a subset of the new data frame, corresponding to the 40 starters in those two games.
GAME.ID PLAYER.ID
1 ANA201504100 escoa003
60465 ANA201504100 mousm001
120929 ANA201504100 cainl001
181393 ANA201504100 hosme001
241857 ANA201504100 morak001
302321 ANA201504100 gorda001
362785 ANA201504100 riosa002
423249 ANA201504100 peres002
483713 ANA201504100 infao001
1117610 ANA201504100 vargj001
573434 ANA201504100 calhk001
633898 ANA201504100 troum001
694362 ANA201504100 pujoa001
754826 ANA201504100 freed001
815290 ANA201504100 cronc002
875754 ANA201504100 aybae001
936218 ANA201504100 iannc001
996682 ANA201504100 cowgc001
1057146 ANA201504100 giavj001
1117613 ANA201504100 santh001
2 ANA201504110 escoa003
60466 ANA201504110 mousm001
120930 ANA201504110 cainl001
181394 ANA201504110 hosme001
241858 ANA201504110 morak001
302322 ANA201504110 gorda001
362786 ANA201504110 riosa002
423250 ANA201504110 peres002
483714 ANA201504110 infao001
2100000 ANA201504110 guthj001
573435 ANA201504110 calhk001
633899 ANA201504110 troum001
694363 ANA201504110 pujoa001
754827 ANA201504110 joycm001
815291 ANA201504110 freed001
875755 ANA201504110 aybae001
936219 ANA201504110 navae001
996683 ANA201504110 buted001
1057147 ANA201504110 giavj001
2100001 ANA201504110 weavj003
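I think there should be a way to add a column to the latter data frame such that each observation takes the GAME.ID and PLAYER.ID entries in its row, searches the former data frame to isolate the observations where GAME.ID = GAME.ID and PLAYER.ID = BATTER, counts those with HIT.VALUE > 1 (1 = default, 2 = single, 3 = double, 4 = triple, 5 = home run), and then returns the count to the observation. In Excel, this could be done with the CountIf() function, which I could easily copy down the length of the vector. However, I don't know how to do this in R.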
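I think this might be what you are looking for. It groups the second-to-last dataset (the event subset) by GAME.ID and BATTER, then counts the hits with value > 1 in each group.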
library(data.table)
# df is the event subset; setDT() converts it to a data.table by reference
dt <- setDT(df)[, list(count_hits = sum(HIT.VALUE > 1)), by = c("GAME.ID", "BATTER")]
head(dt)
GAME.ID BATTER count_hits
1: ANA201504100 escoa003 1
2: ANA201504100 mousm001 1
3: ANA201504100 cainl001 2
4: ANA201504100 hosme001 1
5: ANA201504100 morak001 2
6: ANA201504100 gorda001 0
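The same grouping step can return several statistics at once, which may be closer to the goal of one column per statistic. The following is a minimal sketch on a small made-up sample: the rows in `df` are illustrative rather than the full event subset, and the column names `singles`, `doubles`, and `home_runs` are assumptions chosen here. It relies on the HIT.VALUE coding given in the question (2 = single, 3 = double, 5 = home run).

```r
library(data.table)

# Small illustrative sample in the same shape as the event subset (made-up rows)
df <- data.frame(
  GAME.ID   = c("ANA201504100", "ANA201504100", "ANA201504100",
                "ANA201504100", "ANA201504110", "ANA201504110"),
  BATTER    = c("cainl001", "cainl001", "morak001",
                "morak001", "cainl001", "cainl001"),
  HIT.VALUE = c(3L, 2L, 2L, 5L, 1L, 2L),
  stringsAsFactors = FALSE
)

# One row per game/batter pair, one column per statistic.
# as.data.table() copies df, so the original data frame is left unchanged.
stats <- as.data.table(df)[, list(
  count_hits = sum(HIT.VALUE > 1),   # any hit
  singles    = sum(HIT.VALUE == 2),  # hypothetical column name
  doubles    = sum(HIT.VALUE == 3),  # hypothetical column name
  home_runs  = sum(HIT.VALUE == 5)   # hypothetical column name
), by = c("GAME.ID", "BATTER")]

print(stats)
```

Each extra statistic is just another named expression inside list(), so the grouping is done only once.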
Another option in base R is:
# Group HIT.VALUE by game and batter, counting values > 1 in each group
res <- aggregate(x = list(count_hits = df$HIT.VALUE), by = list(GAME.ID = df$GAME.ID, BATTER = df$BATTER), FUN = function(x) sum(x > 1))
head(res)
GAME.ID BATTER count_hits
1 ANA201504100 aybae001 0
2 ANA201504110 aybae001 1
3 ANA201504110 buted001 1
4 ANA201504100 cainl001 2
5 ANA201504110 cainl001 1
6 ANA201504100 calhk001 1
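To attach these counts to the starters table, as the question asks, one possibility is a left join with merge(). This is a sketch under assumptions: `k` stands in for the GAME.ID/PLAYER.ID table and `df` for the event subset, both filled with a few made-up rows, and starters who never bat (for example, pitchers) are assumed to deserve a count of 0 rather than NA.

```r
# Illustrative stand-ins for the event subset (df) and the starters table (k)
df <- data.frame(
  GAME.ID   = c("ANA201504100", "ANA201504100", "ANA201504100", "ANA201504110"),
  BATTER    = c("cainl001", "cainl001", "gorda001", "cainl001"),
  HIT.VALUE = c(3L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)
k <- data.frame(
  GAME.ID   = c("ANA201504100", "ANA201504100", "ANA201504100", "ANA201504110"),
  PLAYER.ID = c("cainl001", "gorda001", "vargj001", "cainl001"),
  stringsAsFactors = FALSE
)

# Count hits (HIT.VALUE > 1) per game and batter; the batter column is
# named PLAYER.ID here so that it lines up with the key in k
res <- aggregate(
  x   = list(count_hits = df$HIT.VALUE),
  by  = list(GAME.ID = df$GAME.ID, PLAYER.ID = df$BATTER),
  FUN = function(x) sum(x > 1)
)

# Left join: keep every row of k, even players with no batting events
out <- merge(k, res, by = c("GAME.ID", "PLAYER.ID"), all.x = TRUE)

# Players absent from df come back as NA; treat them as zero hits (assumption)
out$count_hits[is.na(out$count_hits)] <- 0

print(out)
```

On the full data, the same calls would presumably use k.2015.1990 and e.2015.1990 in place of `k` and `df`.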