使用 python 下载 CSS 和 JS

Question

我想使用 python 下载网页内容（CSS 和 JS，可能还有 HTML）。

如何下载它们而不是在文本文件上打印它们的名称？

到目前为止，这是我的代码

import requests
from bs4 import BeautifulSoup as bs
from urllib.parse import urljoin

# URL of the web page you want to extract
url = "http://books.toscrape.com"

# initialize a session
session = requests.Session()
# set the User-agent as a regular browser
session.headers["User-Agent"] = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"

# get the HTML content
html = session.get(url).content

# parse HTML using beautiful soup
soup = bs(html, "html.parser")

# get the JavaScript files
script_files = []

for script in soup.find_all("script"):
    if script.attrs.get("src"):
        # if the tag has the attribute 'src'
        script_url = urljoin(url, script.attrs.get("src"))
        script_files.append(script_url)

# get the CSS files
css_files = []

for css in soup.find_all("link"):
    if css.attrs.get("href"):
        # if the link tag has the 'href' attribute
        css_url = urljoin(url, css.attrs.get("href"))
        css_files.append(css_url)

print("Total script files in the page:", len(script_files))
print("Total CSS files in the page:", len(css_files))

# write file links into files
with open("javascript_files.txt", "w") as f:
    for js_file in script_files:
        print(js_file, file=f)

with open("css_files.txt", "w") as f:
    for css_file in css_files:
        print(css_file, file=f)

如何下载它们而不是在文本文件上打印它们的名称？

Answer 1

使用 requests 下载并使用 os.path.basename()

提取文件名

import os

for js_file in script_files:
    fileName = os.path.basename(js_file)
    text = requests.get(js_file).text
    with open(fileName, 'w', encoding="utf-8") as f:
        f.write(text)

for css_file in css_files:
    fileName = os.path.basename(css_file)
    text = requests.get(js_file).text
    with open(fileName, 'w', encoding="utf-8") as f:
        f.write(text)

使用 python 下载 CSS 和 JS

Download CSS and JS using python

python

beautifulsoup

web-scraping

python-requests