Beautiful Soup find_all returns None
I wrote the following code to scrape the title of each paper in the issue.
import requests
from bs4 import BeautifulSoup  # this import was missing from the original snippet

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36',
    'Connection': 'close'
}
url = 'https://www.sciencedirect.com/journal/journal-of-econometrics/vol/225/issue/1'
response = requests.get(url=url, headers=headers)
content = response.text
soup = BeautifulSoup(content, 'lxml')

# Method 1: locate the outer list container first, then every item
books_contents = soup.find('ol', class_='js-article-list article-list-items')
hotel_texts = soup.find_all('li', class_='js-article-list-item article-item u-padding-xs-top u-margin-l-bottom')

# Method 2: find every item directly
hotel_texts = soup.find_all('li', class_='js-article-list-item article-item u-padding-xs-top u-margin-l-bottom')
I tried two approaches: Method 1 first finds the outer container and then looks for each element, while Method 2 finds each element directly. I can see every one of these elements in the page source in my browser, but both calls return [].
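Before blaming the selectors, it helps to check what the server actually returned; if the request was blocked, the article-list markup never arrives at all. A quick diagnostic sketch, reusing the `response` and `content` variables from the question's code (the exact status code may vary):

# Inspect the raw response before parsing it
print(response.status_code)          # a 403 or a challenge page suggests a bot block
print('js-article-list' in content)  # False means the list markup is simply not in the HTML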
The website is protected by Cloudflare, so I used cloudscraper instead of requests. Here is a working example:
from bs4 import BeautifulSoup
import cloudscraper

# cloudscraper solves the Cloudflare challenge that plain requests fails
scraper = cloudscraper.create_scraper(delay=10, browser={'custom': 'ScraperBot/1.0'})
url = 'https://www.sciencedirect.com/journal/journal-of-econometrics/vol/225/issue/1'
response = scraper.get(url)
content = response.text
soup = BeautifulSoup(content, 'lxml')

# Each article entry sits in a <dl class="js-article article-content"> block;
# the title text lives in a child element with class "anchor-text"
hotel_texts = soup.find_all('dl', class_='js-article article-content')
for txt in hotel_texts:
    h3 = txt.select_one('.anchor-text').get_text()
    print(h3)
Output:
Editorial Board
Editorial for Special Issue: Vector Autoregressions
Detecting groups in large vector autoregressions
Identification of structural vector autoregressions through higher unconditional moments
Using time-varying volatility for identification in Vector Autoregressions: An application to endogenous uncertainty
Inference in Structural Vector Autoregressions identified with an external instrument
Inference in Bayesian Proxy-SVARs
Impulse response analysis for structural dynamic models with nonlinear regressors
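One caveat: if any entry lacks an `.anchor-text` child, `select_one` returns None and `.get_text()` raises an AttributeError. A slightly more defensive variant (a sketch building on the answer above; the None guard and the `raise_for_status` call are my additions, not part of the original solution):

from bs4 import BeautifulSoup
import cloudscraper

scraper = cloudscraper.create_scraper(delay=10, browser={'custom': 'ScraperBot/1.0'})
url = 'https://www.sciencedirect.com/journal/journal-of-econometrics/vol/225/issue/1'
response = scraper.get(url)
response.raise_for_status()  # fail loudly if Cloudflare still blocks the request

soup = BeautifulSoup(response.text, 'lxml')
titles = []
for item in soup.find_all('dl', class_='js-article article-content'):
    anchor = item.select_one('.anchor-text')
    if anchor is not None:  # skip entries without a title anchor
        titles.append(anchor.get_text(strip=True))
print(titles)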