Extract an object's description with BeautifulSoup in Python
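I want to extract the description shown next to the figure (the text running from "Figurine model" to "Stay Tuned :)") and store it in a variable named information, using BeautifulSoup. How can I do this?
Here is my code so far, but I don't know how to continue: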
import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.myminifactory.com/object/3d-print-the-little-prince-4707')
soup = BeautifulSoup(response.text, "lxml")
information = None  # placeholder: this is the part I don't know how to write
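Below the code I have shown the part of the page I want to extract the object's description from. Thanks in advance!
This works for me. I'm not fond of the way the script relies on a break statement, but it does the job.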
from urllib.request import urlopen
from bs4 import BeautifulSoup as BS

url = r'https://www.myminifactory.com/object/3d-print-the-little-prince-4707'
html = urlopen(url).read()
Soup = BS(html, "lxml")
# Text of the <div> that holds the description
Desc = Soup.find('div', {'class': 'short-text text-auto-link'}).text
description = ''
for line in Desc.split('\n'):
    # Stop at the long underscore separator line
    if line.strip() == '_________________________________________________________________________':
        break
    # Skip blank lines, append everything else
    if line.strip():
        description += line.strip()
print(description)
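If the break statement feels awkward, the same cut-off can probably be expressed with itertools.takewhile. This is only a minimal sketch: it assumes the description text has already been pulled out as a string, and that a line made up solely of underscores (eight or more) marks the end. The names is_separator and description_before_separator, and the sample text, are made up for illustration and are not part of BeautifulSoup or the real page.

```python
from itertools import takewhile


def is_separator(line):
    # Assumption: a separator is a line made only of underscores, 8 or more.
    stripped = line.strip()
    return len(stripped) >= 8 and set(stripped) == {"_"}


def description_before_separator(text):
    # Keep lines up to (not including) the first separator line.
    kept = takewhile(lambda line: not is_separator(line), text.splitlines())
    # Drop blank lines and join the rest with single spaces.
    return " ".join(line.strip() for line in kept if line.strip())


# Made-up sample text standing in for the scraped description.
sample = (
    "Figurine model of the Little Prince.\n"
    "\n"
    "  Stay Tuned :)  \n"
    + "_" * 40 + "\n"
    "Unrelated footer text\n"
)
print(description_before_separator(sample))
```

Unlike the loop above, this joins the kept lines with a space, so words at line boundaries do not run together.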
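Find the parent tag, then look for the <p> tags inside it, filtering out whitespace-only paragraphs and the ____ separator lines.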
# `soup` is the BeautifulSoup object created in the question
parent = soup.find("div", class_="row container-info-obj margin-t-10")
result = [" ".join(p.text.split()) for p in parent.find_all("p") if p.text.strip() and "_"*8 not in p.text]
#youtube_v = parent.find("iframe")["src"]
print(result)
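To end up with a single string in information, as the question asks, the filtered paragraphs can be joined. The following is a hedged sketch rather than a tested scraper for the live site: SAMPLE_HTML is a made-up stand-in for the page, only the class names are taken from the answer above, and the built-in "html.parser" is used so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real page; only the class names come from the thread.
SAMPLE_HTML = """
<div class="short-text text-auto-link">
  <p>Figurine model of the Little Prince.</p>
  <p>   </p>
  <p>________________________</p>
  <p>Stay Tuned :)</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.select_one("div.short-text.text-auto-link")

paragraphs = []
if container is not None:  # the layout may change, so guard against a missing div
    for p in container.find_all("p"):
        text = " ".join(p.get_text().split())  # collapse runs of whitespace
        if text and "_" * 8 not in text:       # skip blanks and separator lines
            paragraphs.append(text)

information = "\n".join(paragraphs)
print(information)
```

Against the real page, SAMPLE_HTML would be replaced by response.text from the question's requests.get call, assuming the page still uses these class names.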