Beautiful Soup: extracting key-value pairs from data-op-info
The code below runs without errors, but it is incomplete. From this point, I am trying to put only the fullgame values into a DataFrame.
import json
from bs4 import BeautifulSoup
import urllib.request

source = urllib.request.urlopen('https://www.oddsshark.com/nfl/odds').read()
soup = BeautifulSoup(source, 'html.parser')
results = soup.find_all(class_="op-item op-spread op-opening")
for result in results:
    # Each data-op-info attribute holds a JSON object; print its key/value pairs
    print(json.loads(result['data-op-info']).items())
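I used print at the end because I was only trying to pull out the row values and look at them.
Note that there is a similar question on this site, but its solution only works for a single div. It fails when the variable holds multiple divs:
How to parse information between {} on web page using Beautifulsoup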
You are almost there. Note where I use a list comprehension to collect the parsed results and then pass that list to json_normalize():
import json
from bs4 import BeautifulSoup
import urllib.request
import pandas as pd  # needed for pd.json_normalize (pandas 1.0 or later)

source = urllib.request.urlopen('https://www.oddsshark.com/nfl/odds').read()
soup = BeautifulSoup(source, 'html.parser')
results = soup.find_all(class_="op-item op-spread op-opening")
# Parse the JSON in every matching element into a list of dicts
rlist = [json.loads(result['data-op-info']) for result in results]
pd.json_normalize(rlist)
fullgame firsthalf secondhalf firstquarter secondquarter thirdquarter fourthquarter
0 -4.5 -2.5 -1.5 -0.5 -0.5 -0.5 -0.5
1 +4.5 +2.5 +1.5 +0.5 +0.5 +0.5 +0.5
2 +7 +4 +3.5 +3 +3 +2.5 +2
3 -7 -4 -3.5 -3 -3 -2.5 -2
4 -3 -3 -2.5 -0.5 -2 -0.5 -0.5
5 +3 +3 +2.5 +0.5 +2 +0.5 +0.5
6 +3 +2.5 +0.5 +0.5 +0.5 +0.5 +0.5
7 -3 -2.5 -0.5 -0.5 -0.5 -0.5 -0.5
8 -3 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5
9 +3 +0.5 +0.5 +0.5 +0.5 +0.5 +0.5
10 -3 -2.5 -1 -0.5 -1 -0.5 -0.5
11 +3 +2.5 +1 +0.5 +1 +0.5 +0.5
12 -1 +0.5 -0.5 +0.5 -0.5 -0.5 -0.5
13 +1 -0.5 +0.5 -0.5 +0.5 +0.5 +0.5
14 +2.5 +3.5 +3 +0.5 +2.5 +0.5 +1
15 -2.5 -3.5 -3 -0.5 -2.5 -0.5 -1
16 +4 +3 +2 +0.5 +1 +0.5 +0.5
17 -4 -3 -2 -0.5 -1 -0.5 -0.5
18 -2.5 -0.5 -0.5 +0.5 -0.5 -0.5 -0.5
19 +2.5 +0.5 +0.5 -0.5 +0.5 +0.5 +0.5
20 -2.5 -1.5 -0.5 -0.5 -0.5 -0.5 -0.5
21 +2.5 +1.5 +0.5 +0.5 +0.5 +0.5 +0.5
22 +2.5 +1.5 +0.5 +0.5 +0.5 +0.5 +0.5
23 -2.5 -1.5 -0.5 -0.5 -0.5 -0.5 -0.5
24 +1.5 +1.5 Ev +0.5 -0.5 -0.5 -0.5
25 -1.5 -1.5 Ev -0.5 +0.5 +0.5 +0.5
26 +5.5 +3 +2.5 +0.5 +0.5 +0.5 +0.5
27 -5.5 -3 -2.5 -0.5 -0.5 -0.5 -0.5
28 -3.5 -0.5 Ev -0.5 +0.5 +0.5 +0.5
29 +3.5 +0.5 Ev +0.5 -0.5 -0.5 -0.5
30 -5
31 +5
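For anyone who wants to try the approach without hitting the live site, here is a minimal offline sketch. The HTML string is a made-up stand-in for the real page, and the use of `select()` with a CSS selector is my own variation, not part of the code above: `find_all(class_="op-item op-spread op-opening")` matches the class attribute as an exact string, whereas the selector matches elements that carry all three classes in any order.

```python
import json

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page (not the real site markup).
html = """
<div class="op-item op-spread op-opening"
     data-op-info='{"fullgame": "-4.5", "firsthalf": "-2.5"}'></div>
<div class="op-item op-spread op-opening"
     data-op-info='{"fullgame": "+4.5", "firsthalf": "+2.5"}'></div>
<div class="op-item op-spread" data-op-info='{"fullgame": "+1"}'></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Select divs that have all three classes and a data-op-info attribute.
# The third div lacks "op-opening", so it is skipped.
results = soup.select("div.op-item.op-spread.op-opening[data-op-info]")

# One dict per matching div, then one DataFrame row per dict.
records = [json.loads(div["data-op-info"]) for div in results]
df = pd.json_normalize(records)
print(df)
```

Each dict becomes one row and each JSON key becomes one column, which is why the loop in the question only needs to be turned into a list before handing it to pandas.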
Or, if you really only want one key from the dictionary:
rlist = [json.loads(result['data-op-info'])['fullgame'] for result in (results)]
pd.DataFrame({'fullgame': rlist})
fullgame
0 -4.5
1 +4.5
2 +7
3 -7
4 -3
5 +3
6 +3
7 -3
8 -3
9 +3
10 -3
11 +3
12 -1
13 +1
14 +2.5
15 -2.5
16 +4
17 -4
18 -2.5
19 +2.5
20 -2.5
21 +2.5
22 +2.5
23 -2.5
24 +1.5
25 -1.5
26 +5.5
27 -5.5
28 -3.5
29 +3.5
30 -5
31 +5
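One thing to keep in mind: the values in these frames are strings such as "+4.5" and "Ev", not numbers. If you need to do arithmetic on them, something like the sketch below might help. It uses a few sample values copied from the output above and assumes that "Ev" stands for an even line (a spread of 0); that reading is my assumption, so check it against the site before relying on it.

```python
import pandas as pd

# Sample values taken from rows 0, 2 and 24 of the output above.
df = pd.DataFrame({
    "fullgame": ["-4.5", "+7", "+1.5"],
    "secondhalf": ["-1.5", "+3.5", "Ev"],
})

# Assumption: "Ev" means an even line, i.e. 0.
# errors="coerce" turns anything that still cannot be parsed into NaN.
numeric = df.replace("Ev", "0").apply(pd.to_numeric, errors="coerce")
print(numeric)
```

With `errors="coerce"`, blank entries like the ones in rows 30 and 31 should come out as NaN instead of raising an error.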