beautiful soup unable to scrape website contents

Hi, I want to do some simple web scraping on this site: https://www.sayurbox.com/p/Swallow%20Tepung%20Agar%20Agar%20Tinggi%20Serat%207%20gram

My code looks like this:

import requests
import pandas as pd
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

def userAgent(URL):
    ua = UserAgent()
    headers = {
        "User-Agent": ua.random,
        "Accept-Encoding": "*",
        "Connection": "keep-alive",
    }
    resp = requests.get(URL, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        print(f'{URL}')
        return soup
    # non-200 response: log the URL and return None so the caller can skip it
    print(f'error {resp.status_code}: {URL}')
    urlError = pd.DataFrame({'url': [URL],
                             'date': [dateNow]  # dateNow is defined elsewhere in the script
                             })
    urlError.to_csv('errorUrl/errorUrl.csv', mode='a', index=False, header=False)
    return None

soup = userAgent(url)
productTitle = soup.find_all('div', {"class":"InfoProductDetail__shortDesc"})

But it doesn't work. Is there a problem with my code? I tried adding time.sleep to wait for the page to load, but it still doesn't work. Any help would be appreciated.

Your code is fine, but the page is dynamic: the data is rendered by JavaScript, which requests/BeautifulSoup cannot execute. You need a browser-automation tool such as Selenium. You can run the code below.
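The CSS selectors themselves are not the problem; once you have the *rendered* HTML, they match. A quick sanity check on a static snippet (class names taken from the page, the HTML fragment itself is a hypothetical stand-in for what the browser renders):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of the rendered page, using the real class names
rendered_html = """
<div class="InfoProductDetail__shortDesc">Swallow Tepung Agar Agar Tinggi Serat 7 gram</div>
<span class="InfoProductDetail__price">7.900</span>
"""

soup = BeautifulSoup(rendered_html, "html.parser")
title = soup.select_one(".InfoProductDetail__shortDesc").get_text(strip=True)
price = soup.select_one("span.InfoProductDetail__price").get_text(strip=True)
print(title)  # Swallow Tepung Agar Agar Tinggi Serat 7 gram
print(price)  # 7.900
```

This is why Selenium helps: it runs the JavaScript and hands you the rendered HTML via `driver.page_source`, after which BeautifulSoup works as usual.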

from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


url = 'https://www.sayurbox.com/p/Swallow%20Tepung%20Agar%20Agar%20Tinggi%20Serat%207%20gram'

# Selenium 4 expects the driver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(5)  # crude wait for the JavaScript to render the page

soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()



title = soup.select_one('.InfoProductDetail__shortDesc').text
price = soup.select_one('span.InfoProductDetail__price').text

print(title)
print(price)

Output:

Swallow Tepung Agar Agar Tinggi Serat 7 gram
7.900
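One thing to note if you go on to use the scraped price: `7.900` uses a dot as the thousands separator (Indonesian locale), so strip it before converting to a number:

```python
price_text = "7.900"  # as scraped from span.InfoProductDetail__price
# "." is a thousands separator here, not a decimal point
price_value = int(price_text.replace(".", ""))
print(price_value)  # 7900
```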