Loop Function in Python for webscraping
Hello, this is my first project in Python. My goal is to scrape the full description of books from Goodreads. The end goal of the script is to enter the book IDs you want and get back a file with book_id in one column and that book's description in another. Right now I can enter the index of the item I want in the list and get its description:
my_urls = 'https://www.goodreads.com/book/show/' + book_id[0]
How can I loop this process and get the description for each book? Here is my code, thanks in advance.
import bs4 as bs
import urllib.request
import csv
import requests
import re
from urllib.request import urlopen
from urllib.error import HTTPError
book_id = ['17227298','18386','1852','17245','60533063'] # Here I enter my book ids
my_urls = 'https://www.goodreads.com/book/show/' + book_id[0] #I concatenate book_id with the url
source = urlopen(my_urls).read()
soup = bs.BeautifulSoup(source, 'lxml')
short_description = soup.find('div', class_='readable stacked').span # finds the description div
full_description = short_description.find_next_siblings('span') # Goes to the sibling span that has the full description
def get_description(soup):
    full_description = short_description.find_next_siblings('span')
    return full_description
Define a method that does the work for one item:
def get_description(book_id):
    my_urls = 'https://www.goodreads.com/book/show/' + book_id
    source = urlopen(my_urls).read()
    soup = bs.BeautifulSoup(source, 'lxml')
    short_description = soup.find('div', class_='readable stacked').span
    full_description = short_description.find_next_siblings('span')
    return full_description
Then call it on each item of the list:
book_ids = ['17227298', '18386', '1852', '17245', '60533063']
for book_id in book_ids:
    print(get_description(book_id))
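Since the stated end goal is a file with book_id and description in columns, the loop above can be extended to write a CSV. This is a minimal sketch: `build_url`, `write_descriptions`, and the `descriptions.csv` filename are hypothetical names, and the CSS classes (`readable stacked`, the sibling `span`) come from the question and may break if Goodreads changes its markup. Note `find_next_sibling` (singular) is used here to get one tag instead of a list, and `get_text()` extracts plain text for the CSV cell.

```python
import csv
from urllib.request import urlopen

BASE_URL = 'https://www.goodreads.com/book/show/'  # base URL from the question

def build_url(book_id):
    # Concatenate the Goodreads base URL with a single book id.
    return BASE_URL + book_id

def get_description(book_id):
    # Fetch the page and pull the full-description text.
    # The div/span selectors below are taken from the question and are
    # an assumption about Goodreads' current markup.
    import bs4 as bs  # deferred import so the pure helpers work without bs4
    source = urlopen(build_url(book_id)).read()
    soup = bs.BeautifulSoup(source, 'lxml')
    short_description = soup.find('div', class_='readable stacked').span
    full_description = short_description.find_next_sibling('span')
    return full_description.get_text(strip=True) if full_description else ''

def write_descriptions(book_ids, path):
    # One row per book: book_id in the first column, description in the second.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['book_id', 'description'])
        for book_id in book_ids:
            writer.writerow([book_id, get_description(book_id)])

if __name__ == '__main__':
    write_descriptions(['17227298', '18386', '1852', '17245', '60533063'],
                       'descriptions.csv')
```

Keeping the URL building and CSV writing in separate functions makes the network-dependent part easy to swap out or retry on `HTTPError`.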