How to ignore UTF-8 encoding when using BeautifulSoup for web scraping

I'm using BeautifulSoup to scrape playerprofiler.com. However, the data contains UTF-8 encoded characters that I can't handle. Whenever I print the data, I get this error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2605' in position 184621: character maps to <undefined>
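The error is raised when print() writes text to an output stream whose encoding cannot represent a character. It is not raised when BeautifulSoup parses the page. Below is a minimal sketch. It assumes the console uses the Windows code page cp1252, which the 'charmap' codec in the message suggests. can_encode is a hypothetical helper:

```python
def can_encode(text, encoding):
    """Return True if every character of `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


star = "\u2605"  # BLACK STAR, the character named in the error message

# cp1252 (a legacy Windows code page) has no byte for this character, so
# encoding fails with the same "'charmap' codec can't encode" error.
print(can_encode(star, "cp1252"))  # False
# UTF-8 can encode every Unicode character.
print(can_encode(star, "utf-8"))   # True
```

If that is the cause, you can make print() write such characters without encoding the soup yourself. One option is to call sys.stdout.reconfigure(encoding="utf-8") (Python 3.7+) before printing. Another is to set the environment variable PYTHONIOENCODING=utf-8.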

I can work around this with:

print(stats_page.encode("utf-8"))

But after that, I can no longer work with the data when I try to scrape it with a command such as:

column_headers_row = stats_page.findAll('tr')
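stats_page.encode("utf-8") returns a bytes object, which is a serialized copy of the page rather than a BeautifulSoup tree. The search methods are therefore gone. The small sketch below uses made-up HTML (an assumption), the built-in html.parser, and find_all, the current name for findAll:

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML standing in for the downloaded page
html = "<table><tr><th>Year</th></tr><tr><td>2020 \u2605</td></tr></table>"
stats_page = BeautifulSoup(html, "html.parser")

encoded = stats_page.encode("utf-8")  # serializes the whole tree to bytes
print(type(encoded))                   # <class 'bytes'>
print(hasattr(encoded, "find_all"))    # False: bytes have no search methods

# Keep searching the BeautifulSoup object itself; encoding is only for output.
rows = stats_page.find_all("tr")
print(len(rows))                       # 2
```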

How can I fetch the data from the website, search the table rows, and work with the data?

Here is the main block of code:

import pandas as pd 
import numpy as np 
from bs4 import BeautifulSoup
import requests

r = requests.get("https://www.playerprofiler.com/nfl/george-kittle").text

stats_page = BeautifulSoup(r, 'lxml')

column_headers_row = stats_page.findAll('tr')

print(column_headers_row)
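One way to keep the data usable is to search the BeautifulSoup object as usual and only make the text safe at the moment it is printed. The sketch below does this with two hypothetical helpers, table_rows and console_safe. It uses html.parser rather than lxml to avoid the extra dependency. Characters the console cannot show are replaced with "?", which is a lossy choice made only for display:

```python
import sys

from bs4 import BeautifulSoup


def table_rows(html):
    """Return every <tr> in `html` as a list of its cell texts (th and td)."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in soup.find_all("tr")
    ]


def console_safe(text, encoding=None):
    """Replace characters the target encoding cannot represent with '?'."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


# In the real script the HTML would come from
# requests.get("https://www.playerprofiler.com/nfl/george-kittle").text;
# a hypothetical snippet is used here so the example runs offline.
sample = (
    "<table><tr><th>Year</th><th>FPts/G</th></tr>"
    "<tr><td>2020</td><td>15.6 \u2605</td></tr></table>"
)

for row in table_rows(sample):
    print(console_safe(" | ".join(row)))
```

The rows stay ordinary Python strings, so they can be processed or saved with the star intact. Only the printed copy loses characters.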

Thanks for your help!

Let pandas parse the tables. pd.read_html returns a list of DataFrames, so just pick the one you want by index and go from there:

import pandas as pd

url = 'https://www.playerprofiler.com/nfl/george-kittle'
df = pd.read_html(url)
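pd.read_html returns a plain Python list with one DataFrame per <table> it finds, so you index into it. Below is a self-contained sketch with made-up HTML (an assumption). It needs lxml, or BeautifulSoup plus html5lib, installed as a parser. The HTML is wrapped in StringIO because pandas 2.1 and later deprecate passing literal HTML strings. match is a read_html parameter that keeps only the tables whose text matches a regular expression:

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML with two tables, standing in for the player page
html = """
<table>
  <thead><tr><th>Year</th><th>FPts/G</th></tr></thead>
  <tbody><tr><td>2020</td><td>15.6 (#3)</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Week</th><th>Fantasy Points</th></tr></thead>
  <tbody><tr><td>1</td><td>9.3 (#17)</td></tr></tbody>
</table>
"""

# read_html returns one DataFrame per <table>, in document order.
tables = pd.read_html(StringIO(html))
print(len(tables))  # 2
season = tables[0]  # pick a table by index

# Or keep only the tables whose text matches a regular expression.
weekly = pd.read_html(StringIO(html), match="Week")[0]
print(list(weekly.columns))  # ['Week', 'Fantasy Points']
```

Selecting by match is less fragile than a hard-coded index if the site adds or reorders tables.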

If for some reason the code above doesn't work, try:

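import pandas as pd
import requests
from io import StringIO

url = 'https://www.playerprofiler.com/nfl/george-kittle'
html = requests.get(url).text
# pandas 2.1+ deprecates passing a literal HTML string, so wrap it in StringIO
df = pd.read_html(StringIO(html))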

Output:

print(df)
[   Year Year  ...  Fantasy Points Per Game FPts/G
0       2020  ...                       15.6 (#3)
1       2019  ...                       15.9 (#1)
2       2018  ...                         16 (#3)
3       2017  ...                       7.1 (#21)

[4 rows x 9 columns],   Snap Share Snap Share  ... Target Share Tgt Rate
0                 87.4%  ...        24.1% (9.8 rz)
1                    #4  ...                    #4

[2 rows x 7 columns],   Air Yards Air Yards  ... Target Rate Tgt Rate
0      460 (57.5 p/g)  ...                29.2%
1                 #22  ...                  #27

[2 rows x 7 columns],   Receptions Receptions  ... Fantasy Points Per Game Fantasy PTS/G
0            48 (6 p/g)  ...                                  15.6
1                   #15  ...                                    #3

[2 rows x 7 columns],   Yards Per Reception YPR  ... True Catch Rate True Catch Rate
0                    13.2  ...                           85.7%
1                      #6  ...                             #21

[2 rows x 7 columns],   Target Premium Tgt Prem  ... Contested Catch Rate Contested Catch %
0                   13.7%  ...                          80% (10 tgts)
1                      #8  ...                                     #1

[2 rows x 7 columns],   Production Premium Prod Premium  ... Fantasy Points Per Target Fantasy Pts/Tgt
0                            16.1  ...                                      1.99
1                              #3  ...                                        #9

[2 rows x 7 columns],   Snap Share Snap Share  ... Target Share Tgt Rate
0                   89%  ...       28.2% (26.2 rz)
1                    #5  ...                    #1

[2 rows x 7 columns],   Air Yards Air Yards  ... Target Rate Tgt Rate
0      623 (44.5 p/g)  ...                39.1%
1                 #12  ...                  #11

[2 rows x 7 columns],   Receptions Receptions  ... Fantasy Points Per Game Fantasy PTS/G
0          85 (6.1 p/g)  ...                                  15.9
1                    #4  ...                                    #1

[2 rows x 7 columns],   Yards Per Reception YPR  ... True Catch Rate True Catch Rate
0                    12.4  ...                           87.6%
1                      #9  ...                              #9

[2 rows x 7 columns],   Target Premium Tgt Prem  ... Contested Catch Rate Contested Catch %
0                    1.5%  ...                        53.8% (13 tgts)
1                     #18  ...                                     #6

[2 rows x 7 columns],   Production Premium Prod Premium  ... Fantasy Points Per Target Fantasy Pts/Tgt
0                            10.2  ...                                      2.08
1                              #6  ...                                        #8

[2 rows x 7 columns],   Snap Share Snap Share  ... Target Share Tgt Rate
0                 94.2%  ...         26.4% (26 rz)
1                    #3  ...                    #2

[2 rows x 7 columns],   Air Yards Air Yards  ... Target Rate Tgt Rate
0     1049 (65.6 p/g)  ...                34.2%
1                  #4  ...                  #21

[2 rows x 7 columns],   Receptions Receptions  ... Fantasy Points Per Game Fantasy PTS/G
0          88 (5.5 p/g)  ...                                    16
1                    #3  ...                                    #3

[2 rows x 7 columns],   Yards Per Reception YPR  ... True Catch Rate True Catch Rate
0                    15.6  ...                           82.2%
1                      #3  ...                             #25

[2 rows x 7 columns],   Target Premium Tgt Prem  ... Contested Catch Rate Contested Catch %
0                   21.8%  ...                        29.4% (17 tgts)
1                      #6  ...                                    #27

[2 rows x 7 columns],   Production Premium Prod Premium  ... Fantasy Points Per Target Fantasy Pts/Tgt
0                             6.3  ...                                       1.9
1                              #7  ...                                       #13

[2 rows x 7 columns],   Snap Share Snap Share  ... Target Share Tgt Rate
0                 60.6%  ...           11% (18 rz)
1                   #36  ...                   #27

[2 rows x 7 columns],   Air Yards Air Yards  ... Target Rate Tgt Rate
0      486 (32.4 p/g)  ...                  20%
1                 #23  ...                  #88

[2 rows x 7 columns],   Receptions Receptions  ... Fantasy Points Per Game Fantasy PTS/G
0          43 (2.9 p/g)  ...                                   7.1
1                   #18  ...                                   #21

[2 rows x 7 columns],   Yards Per Reception YPR  ... True Catch Rate True Catch Rate
0                      12  ...                           82.7%
1                     #13  ...                             #18

[2 rows x 7 columns],   Target Premium Tgt Prem  ... Contested Catch Rate Contested Catch %
0                    1.8%  ...                        45.5% (11 tgts)
1                     #16  ...                                    #20

[2 rows x 7 columns],   Production Premium Prod Premium  ... Fantasy Points Per Target Fantasy Pts/Tgt
0                            -3.6  ...                                      1.69
1                             #15  ...                                       #16

[2 rows x 7 columns],    Week Wk  ... Fantasy Points Fantasy Points
0        1  ...                     9.3 (#17)
1        4  ...                     40.1 (#1)
2        5  ...                     8.4 (#16)
3        6  ...                     23.9 (#2)
4        7  ...                    10.5 (#13)
5        8  ...                     5.9 (#21)
6       16  ...                    13.2 (#13)
7       17  ...                     13.8 (#6)

[8 rows x 9 columns],     Week Wk  ... Fantasy Points Fantasy Points
0         1  ...                    13.4 (##9)
1         2  ...                    8.4 (##12)
2         3  ...                   11.7 (##11)
3         5  ...                    20.8 (##1)
4         6  ...                    18.3 (##3)
5         7  ...                    6.8 (##18)
6         8  ...                    14.6 (##6)
7         9  ...                   19.9  (##3)
8        12  ...                    24.9 (##2)
9        13  ...                    3.4 (##33)
10       14  ...                    18.7 (##4)
11       15  ...                    26.4 (##1)
12       16  ...                    18.9 (##8)
13       17  ...                    16.3 (##5)

[14 rows x 9 columns],     Week Wk  ... Fantasy Points Fantasy Points
0         1  ...                    14.0 (##6)
1         2  ...                    4.2 (##34)
2         3  ...                    12.9 (##7)
3         4  ...                    24.5 (##2)
4         5  ...                    13.3 (##9)
5         6  ...                    7.0 (##21)
6         7  ...                    20.8 (##2)
7         8  ...                   10.7 (##14)
8         9  ...                    20.8 (##4)
9        10  ...                    17.3 (##4)
10       12  ...                   11.8 (##12)
11       13  ...                    13.0 (##7)
12       14  ...                    34.0 (##1)
13       15  ...                    8.1 (##12)
14       16  ...                    14.4 (##9)
15       17  ...                    29.9 (##2)

[16 rows x 9 columns],     Week Wk  ... Fantasy Points Fantasy Points
0         1  ...                    7.7 (##16)
1         2  ...                    3.3 (##36)
2         3  ...                    1.8 (##41)
3         4  ...                    5.5 (##29)
4         5  ...                    21.3 (##2)
5         6  ...                    8.6 (##18)
6         7  ...                    2.6 (##39)
7         8  ...                    4.2 (##24)
8         9  ...                    5.7 (##24)
9        12  ...                    2.4 (##38)
10       13  ...                    4.0 (##31)
11       14  ...                   3.0  (##30)
12       15  ...                    9.2 (##17)
13       16  ...                    13.2 (##7)
14       17  ...                    14.0 (##2)

[15 rows x 9 columns],   School School  ... Special Teams Yards Spc Tm Share
0   Iowa (2013)  ...                                0
1   Iowa (2014)  ...                                0
2   Iowa (2015)  ...                                0
3  Iowa  (2016)  ...                                0

[4 rows x 9 columns]]

Try adding this line of code (you have to import locale):

import locale
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
r = requests.get("https://www.playerprofiler.com/nfl/george-kittle").text
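print() encodes with sys.stdout.encoding, so it can help to check whether the setlocale call changes that value on your machine. Below is a small diagnostic sketch. current_encodings is a hypothetical helper. The try/except is there because 'en_US.UTF-8' is a platform-specific locale name that setlocale may reject:

```python
import locale
import sys


def current_encodings():
    """Return the locale's preferred encoding and the encoding print() uses."""
    return {
        "locale": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


before = current_encodings()
try:
    locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
except locale.Error:
    # Locale names are platform-specific; this one may not exist (e.g. on Windows).
    pass
after = current_encodings()

print("before:", before)
print("after: ", after)
```

If after["stdout"] is still a code page such as cp1252, print() can still raise the UnicodeEncodeError from the question.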