Conditions in a loop to ensure Python only scrapes a single div
While trying to scrape this site: https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/ I found the food names by doing the following:
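import requests
from bs4 import BeautifulSoup
# headers was undefined in the original snippet; a minimal User-Agent is assumed here.
headers = {'User-Agent': 'Mozilla'}
url = "https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/"
# Pass headers by keyword; the second positional argument of requests.get is params.
req = requests.get(url, headers=headers)
soup = BeautifulSoup(req.content, 'html.parser')
# This matches every food item on the page, across all meals and stations.
foodLocation = soup.find_all('div', class_='item-name')
for singleFood in foodLocation:
    food = singleFood.text
    print(food)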
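The problem is that I only want to print the foods in the "World Palate Maize" section shown under Lunch on that page. In the HTML there are multiple divs, each containing one type of food (World Palate Maize, Hot Cereal, MBakery, etc.), and I can't figure out how to tell the loop to print only the contents of one specific section (a specific div?). This probably needs an if statement or some other condition inside the for loop, but I'm not sure how to format it or what to use as the condition so that the loop prints the contents of only one section.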
It seems that "Lunch" is always the second div, so you could perhaps do this:
import requests
from bs4 import BeautifulSoup
headers = {
    'User-Agent': 'Mozilla'
}
url = "https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/"
# Pass headers by keyword; the second positional argument of requests.get is params.
req = requests.get(url, headers=headers)
soup = BeautifulSoup(req.content, 'html.parser')
# Each meal has its own div.courses; this unpacking assumes exactly three of them.
breakfast, lunch, dinner = soup.select('div#mdining-items div.courses')
foods = lunch.select('div.item-name')
for food in foods:
    print(food.text)
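Note that unpacking into exactly three names raises a ValueError whenever the page does not have exactly three div.courses blocks, and that the loop still prints every station within Lunch. To narrow it to one station with a condition inside the loop, as the question asks, something like the following may work. This is only a sketch: it runs against a small hand-written HTML sample whose structure is an assumption inferred from the selectors in this thread (not a copy of the real page), and SAMPLE_HTML and foods_in_station are hypothetical names introduced for illustration.

```python
from bs4 import BeautifulSoup

# Hand-written sample; the real page's markup is assumed to be similar, not verified.
SAMPLE_HTML = """
<div id="mdining-items">
  <h3>Breakfast</h3>
  <div class="courses">
    <h4>Hot Cereal</h4>
    <ul><li><div class="item-name">Oatmeal</div></li></ul>
  </div>
  <h3>Lunch</h3>
  <div class="courses">
    <h4>World Palate Maize</h4>
    <ul>
      <li><div class="item-name">Mojo Grilled Chicken</div></li>
      <li><div class="item-name">Italian White Bean Salad</div></li>
    </ul>
    <h4>MBakery</h4>
    <ul><li><div class="item-name">Sugar Cookie</div></li></ul>
  </div>
</div>
"""


def foods_in_station(html, meal_index, station):
    """Return item names under the given station heading within one meal block."""
    soup = BeautifulSoup(html, "html.parser")
    meals = soup.select("div#mdining-items div.courses")
    # Guard against pages with fewer meal blocks than expected.
    if meal_index >= len(meals):
        return []
    names = []
    for item in meals[meal_index].select("div.item-name"):
        # The nearest h4 that precedes the item is treated as its station heading.
        heading = item.find_previous("h4")
        if heading is not None and heading.get_text(strip=True) == station:
            names.append(item.get_text(strip=True))
    return names


lunch_foods = foods_in_station(SAMPLE_HTML, 1, "World Palate Maize")
print(lunch_foods)
```

The if condition compares the text of the closest preceding h4 with the station name, so it only helps if every station on the real page is introduced by such a heading.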
Another strategy is to select more specifically by matching on the heading text with CSS selectors:
soup.select('h3:-soup-contains("Lunch")+div h4:-soup-contains("World Palate Maize") + ul .item-name')
Example
import requests
from bs4 import BeautifulSoup
url = "https://dining.umich.edu/menus-locations/dining-halls/mosher-jordan/"
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')
foodLocation = soup.select('h3:-soup-contains("Lunch")+div h4:-soup-contains("World Palate Maize") + ul .item-name')
for singleFood in foodLocation:
    food = singleFood.text
    print(food)
Output
Mojo Grilled Chicken
Italian White Bean Salad
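Since the menu changes daily, the live output will differ. To check how the selector behaves without hitting the site, it can be tried on a small inline document. The sketch below uses hand-written markup that is assumed to mirror the page structure implied by the selector (an h3 per meal followed by a div, and an h4 per station followed by a ul); SAMPLE_HTML, SELECTOR, and the dinner item are made up for illustration. The :-soup-contains pseudo-class needs a reasonably recent soupsieve (2.1 or later).

```python
from bs4 import BeautifulSoup

# Hand-written sample; the same station name appears under two meals on purpose.
SAMPLE_HTML = """
<div id="mdining-items">
  <h3>Lunch</h3>
  <div class="courses">
    <h4>World Palate Maize</h4>
    <ul>
      <li><div class="item-name">Mojo Grilled Chicken</div></li>
      <li><div class="item-name">Italian White Bean Salad</div></li>
    </ul>
    <h4>MBakery</h4>
    <ul><li><div class="item-name">Sugar Cookie</div></li></ul>
  </div>
  <h3>Dinner</h3>
  <div class="courses">
    <h4>World Palate Maize</h4>
    <ul><li><div class="item-name">Roasted Vegetables</div></li></ul>
  </div>
</div>
"""

# h3 containing "Lunch" -> the div right after it -> h4 containing the station
# name -> the ul right after that h4 -> any descendant with class item-name.
SELECTOR = (
    'h3:-soup-contains("Lunch") + div '
    'h4:-soup-contains("World Palate Maize") + ul .item-name'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
lunch_station_foods = [tag.get_text(strip=True) for tag in soup.select(SELECTOR)]
print(lunch_station_foods)
```

On this sample only the two lunch items are selected; the dinner entry under the same station name is skipped because its h3 does not contain "Lunch".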