具有 python、javascript 输出的网页抓取
Web scraping with python, javascript output
我想从这个网站上删除工作信息,已经卡了几天了。当我打印 soup.text 输出时,我得到一个简短的 javascript 文本,这不是我想要的,因为我想要 html 元素。我已经看到类似的解决方案来实现 'Header less browsing',但是当我实现它时,我只收到了几个错误。我是网络抓取的新手,看过各种教程、视频,但根本没有得到我想要的输出,也不知道我做错了什么。
import requests
from bs4 import BeautifulSoup
def aSwiftScraper():
jobLinks = []
pages = []
URL = "https://www.amiqus.com/jobs?options=,20993,20877,20876&page=1"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
print(soup.text)
aSwiftScraper()
向服务器发出请求时尝试更改 User-Agent
HTTP header:
import requests
from bs4 import BeautifulSoup
headers = {
"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
}
url = "https://www.amiqus.com/jobs?options=,20993,20877,20876&page=1"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
for title in soup.select(".attrax-vacancy-tile__title"):
print(title.get_text(strip=True))
打印:
Engine Programmer C++ AAA opportunity - Remote working
Senior Programmer
Gameplay Programmer
...
我想从这个网站上删除工作信息,已经卡了几天了。当我打印 soup.text 输出时,我得到一个简短的 javascript 文本,这不是我想要的,因为我想要 html 元素。我已经看到类似的解决方案来实现 'Header less browsing',但是当我实现它时,我只收到了几个错误。我是网络抓取的新手,看过各种教程、视频,但根本没有得到我想要的输出,也不知道我做错了什么。
import requests
from bs4 import BeautifulSoup
def aSwiftScraper():
jobLinks = []
pages = []
URL = "https://www.amiqus.com/jobs?options=,20993,20877,20876&page=1"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
print(soup.text)
aSwiftScraper()
向服务器发出请求时尝试更改 User-Agent
HTTP header:
import requests
from bs4 import BeautifulSoup
headers = {
"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
}
url = "https://www.amiqus.com/jobs?options=,20993,20877,20876&page=1"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
for title in soup.select(".attrax-vacancy-tile__title"):
print(title.get_text(strip=True))
打印:
Engine Programmer C++ AAA opportunity - Remote working
Senior Programmer
Gameplay Programmer
...