Loop Function in Python for webscraping
Hello, this is my first project in Python. My goal is to scrape the full description of books from Goodreads. The end goal of the script is to enter the book IDs you want and get back a file with book_id in one column and that book's description in another. Right now I can enter the index of the item I want in the list and get its description:
my_urls = 'https://www.goodreads.com/book/show/' + book_id[0]
How can I loop this process and get the description for each book? Here is my code, thanks in advance.
import bs4 as bs
import urllib.request
import csv
import requests
import re
from urllib.request import urlopen
from urllib.error import HTTPError
book_id = ['17227298','18386','1852','17245','60533063'] # Here I enter my book ids
my_urls = 'https://www.goodreads.com/book/show/' + book_id[0] #I concatenate book_id with the url
source = urlopen(my_urls).read()
soup = bs.BeautifulSoup(source, 'lxml')
short_description = soup.find('div', class_='readable stacked').span # finds the description div
full_description = short_description.find_next_siblings('span') # Goes to the sibling span that has the full description
def get_description(soup):
    full_description = short_description.find_next_siblings('span')
    return full_description
Define a method that does the work for one item:
def get_description(book_id):
    my_urls = 'https://www.goodreads.com/book/show/' + book_id
    source = urlopen(my_urls).read()
    soup = bs.BeautifulSoup(source, 'lxml')
    short_description = soup.find('div', class_='readable stacked').span
    full_description = short_description.find_next_siblings('span')
    return full_description
Then call it on each item of the list:
book_ids = ['17227298', '18386', '1852', '17245', '60533063']
for book_id in book_ids:
    print(get_description(book_id))
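Since the stated end goal is a file with book_id and description in columns, the loop above can be extended to write a CSV. This is a minimal sketch: `build_url`, `write_descriptions`, and the `descriptions.csv` filename are hypothetical names, and the CSS classes (`readable stacked`, the sibling `span`) come from the question and may break if Goodreads changes its markup. Note `find_next_sibling` (singular) is used here to get one tag instead of a list, and `get_text()` extracts plain text for the CSV cell.

```python
import csv
from urllib.request import urlopen

BASE_URL = 'https://www.goodreads.com/book/show/'  # base URL from the question

def build_url(book_id):
    # Concatenate the Goodreads base URL with a single book id.
    return BASE_URL + book_id

def get_description(book_id):
    # Fetch the page and pull the full-description text.
    # The div/span selectors below are taken from the question and are
    # an assumption about Goodreads' current markup.
    import bs4 as bs  # deferred import so the pure helpers work without bs4
    source = urlopen(build_url(book_id)).read()
    soup = bs.BeautifulSoup(source, 'lxml')
    short_description = soup.find('div', class_='readable stacked').span
    full_description = short_description.find_next_sibling('span')
    return full_description.get_text(strip=True) if full_description else ''

def write_descriptions(book_ids, path):
    # One row per book: book_id in the first column, description in the second.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['book_id', 'description'])
        for book_id in book_ids:
            writer.writerow([book_id, get_description(book_id)])

if __name__ == '__main__':
    write_descriptions(['17227298', '18386', '1852', '17245', '60533063'],
                       'descriptions.csv')
```

Keeping the URL building and CSV writing in separate functions makes the network-dependent part easy to swap out or retry on `HTTPError`.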