Python Selenium accessing Chrome DevTools with execute_cdp_cmd | Determine which styleSheetId belongs to which stylesheet

I am using Selenium with Python to automatically remove unused CSS from my website.

I found what looks like a good solution here:

https://chromedevtools.github.io/devtools-protocol/tot/CSS/

I have tried several ways of generating the coverage-range JSON, and the following code looks promising:

browser.execute_cdp_cmd("CSS.enable", {})
browser.execute_cdp_cmd("CSS.startRuleUsageTracking", {})

sleep(1)
browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
sleep(1)

#snapshot = browser.execute_cdp_cmd("Profiler.takePreciseCoverage", {})
#snapshot = browser.execute_cdp_cmd("Profiler.stop", {})
#browser.execute_cdp_cmd("CSS.stopRuleUsageTracking", {})
snapshot = browser.execute_cdp_cmd("CSS.takeCoverageDelta", {}) # seems to work best as it is dynamic
print(snapshot)

with open('coverage_delta_css.json', 'w', encoding='utf-8') as _file:
    json.dump(snapshot, _file)

print(snapshot['coverage'][0]['styleSheetId']) # a test id
print(browser.execute_cdp_cmd("CSS.getStyleSheetText ", {"styleSheetId": snapshot['coverage'][0]['styleSheetId']})) # here it tells me the Id is unknown

Here is a sample portion of the resulting JSON:
{
  "coverage": [
    {
      "endOffset": 335,
      "startOffset": 133,
      "styleSheetId": "16752.5",
      "used": true
    },
    {
      "endOffset": 1025,
      "startOffset": 471,
      "styleSheetId": "16752.7",
      "used": true
    },
    ...
  ]
}

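The problem is that styleSheetId is just a number-like ID, and I can't find a way to determine which stylesheet it refers to. I have two stylesheets (main.css, other.css), and I only want to remove the unused CSS in other.css.

Also, in the example above I try to get the raw text of the stylesheet using the ID from the JSON, but the ID seems to change with every call and is reported as unknown: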

selenium.common.exceptions.WebDriverException: Message: unknown error: unhandled inspector error: {"code":-32601,"message":"'CSS.getStyleSheetText ' wasn't found"}

I feel I am close to a solution. I hope someone can help with the final steps.

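You have an extra space in "CSS.getStyleSheetText ". That is why the error says the command wasn't found; the ID itself is not the problem. With the space removed, the result is a dictionary with a single key, "text":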
from selenium import webdriver


options = webdriver.ChromeOptions()
options.add_argument("headless")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options)
try:
    driver.get('https://www.google.com')

    driver.execute_cdp_cmd("CSS.enable", {})
    driver.execute_cdp_cmd("CSS.startRuleUsageTracking", {})
    snapshot = driver.execute_cdp_cmd("CSS.takeCoverageDelta", {})
    coverage = snapshot['coverage']
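    n_entries = len(coverage)  # number of coverage entries (rule ranges), not distinct stylesheets
    print(n_entries)
    # Print the text of the stylesheet referenced by the last entry:
    sheet_id = coverage[-1]['styleSheetId']
    print(driver.execute_cdp_cmd("CSS.getStyleSheetText", {"styleSheetId": sheet_id})['text'])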
finally:
    driver.quit()

Prints:

108
.UUbT9{position:absolute;width:100%;text-align:left;margin-top:-1px;z-index:3;cursor:default;-webkit-user-select:none}.aajZCb{background:#fff;box-shadow:0 4px 6px rgba(32,33,36,.28);display:flex;flex-direction:column;list-style-type:none;margin:0;padding:0;border:0;border-radius:0 0 24px 24px;padding-bottom:4px;overflow:hidden}.minidiv .aajZCb{border-bottom-left-radius:16px;border-bottom-right-radius:16px}.erkvQe{flex:auto;padding-bottom:8px}.RjPuVb{height:1px;margin:0 26px 0 0}.S3nFnd{display:flex}.S3nFnd .RjPuVb,.S3nFnd .aajZCb{flex:0 0 auto}.lh87ke:link,.lh87ke:visited{color:#36c;cursor:pointer;font:11px arial,sans-serif;padding:0 5px;margin-top:-10px;text-decoration:none;flex:auto;align-self:flex-end;margin:0 16px 5px 0}.lh87ke:hover{text-decoration:underline}.xtSCL{border-top:1px solid #e8eaed;margin:0 20px 0 14px;padding-bottom:4px}.sb7{background:url() no-repeat ;min-height:0px;min-width:0px;height:0px;width:0px}.sb27{background:url(/images/searchbox/desktop_searchbox_sprites318_hr.webp) no-repeat 0 -21px;background-size:20px;min-height:20px;min-width:20px;height:20px;width:20px}.sb43{background:url(/images/searchbox/desktop_searchbox_sprites318_hr.webp) no-repeat 0 0;background-size:20px;min-height:20px;min-width:20px;height:20px;width:20px}.sb53.sb53{padding:0 4px;margin:0}.sb33{background:url(/images/searchbox/desktop_searchbox_sprites318_hr.webp) no-repeat 0 -42px;background-size:20px;height:20px;width:20px}

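But this only gives you the text. It is not clear how to trace it back to a CSS file. I also ran this against a website and got back a snapshot whose list contained 9 styleSheetId values that were all the same (the HTML specified a single CSS stylesheet).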
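
Since several coverage entries can share one styleSheetId, it may help to group them per stylesheet before looking at the text. The following is only a minimal sketch: `group_coverage`, `used_rules`, and the sample data are hypothetical names made up for illustration, and it assumes that startOffset and endOffset index into the string returned by CSS.getStyleSheetText.

```python
from collections import defaultdict


def group_coverage(coverage):
    """Group coverage entries by styleSheetId, sorted by start offset."""
    by_sheet = defaultdict(list)
    for entry in coverage:
        by_sheet[entry["styleSheetId"]].append(entry)
    for entries in by_sheet.values():
        entries.sort(key=lambda e: e["startOffset"])
    return dict(by_sheet)


def used_rules(sheet_text, entries):
    """Slice out the text of every entry marked as used.

    Assumption: offsets are positions in sheet_text.
    """
    return [
        sheet_text[e["startOffset"]:e["endOffset"]]
        for e in entries
        if e["used"]
    ]


# Hypothetical sample data shaped like snapshot['coverage'] above.
sample_text = "a{color:red}b{color:blue}"
sample_coverage = [
    {"styleSheetId": "1.1", "startOffset": 12, "endOffset": 25, "used": True},
    {"styleSheetId": "1.1", "startOffset": 0, "endOffset": 12, "used": False},
]

grouped = group_coverage(sample_coverage)
print(len(grouped))                              # distinct stylesheets
print(used_rules(sample_text, grouped["1.1"]))   # text of the used rules
```

With a real snapshot, `sample_coverage` would be `snapshot['coverage']` and `sample_text` the 'text' value returned for the corresponding styleSheetId.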

Why not simply parse the HTML source to find the external stylesheet links, like this:

from selenium import webdriver
from bs4 import BeautifulSoup


options = webdriver.ChromeOptions()
options.add_argument("headless")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options)
try:
    driver.get('https://www.yahoo.com')
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # Collect the href of every <link rel="stylesheet">; links without an href are skipped
    css_files = [link["href"] for link in soup.find_all("link", href=True) if "stylesheet" in link.get("rel", [])]
    print(css_files)
finally:
    driver.quit()

Prints:

['https://s.yimg.com/nn/lib/metro/g/myy/grid_0.0.39.css', 'https://s.yimg.com/nn/lib/metro/g/myy/video_styles_0.0.72.css', 'https://s.yimg.com/nn/lib/metro/g/myy/font_yahoosans_0.0.45.css', 'https://s.yimg.com/nn/lib/metro/g/myy/wafertooltip_0.0.15.css', 'https://s.yimg.com/nn/lib/metro/g/sda/sda_flex_0.0.43.css', 'https://s.yimg.com/nn/lib/metro/g/sda/sda_adlite_0.0.7.css', 'https://s.yimg.com/os/yc/css/bundle.c60a6d54.css', 'https://s.yimg.com/aaq/fp/css/tdv2-applet-native-ads.PencilAd.atomic.ltr.4486c5cd56279289e1537fa63007fc45.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-featurebar.FeaturebarNew.atomic.ltr.43aa16e888a4e6e22b1273bcd144ec13.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-ntk.NTKGrid.atomic.ltr.7ae95e008cea5ca8c068d5e54332ac45.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-ntk.custom_grid.desktop.0e2848ba5290686273ddd6bdd2b6de63.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-stream.StreamGrid.atomic.ltr.2bdffca67e538fcc3d9e3d2b82e9fafa.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-stream.custom.desktop.35b4e59342f8c72801c502afb5933cff.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-stream.custom_grid.desktop.4ac7e62f7d11f0c628c4aa3fae7a8123.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-user-intent.rollupDesktop.atomic.ltr.e7f97823ea12a8bcef9fee986f8e851c.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-hpsetpromo.HpSetPromo.atomic.ltr.ceb4bec833ee8522db3f8a70f17355fd.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-trending.Trending.atomic.ltr.720d5fde89dba7a904a549124d90eaf9.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-weather.WeatherPreview.atomic.ltr.39ec8a7197b2e854eee5eb76559ec7a7.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-weather.common.desktop.62d099be776ca538092fa6ba87d1637b.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-scores.Scores.atomic.ltr.8c0e78d3aa079ff5130e0c619459ceb7.min.css', 
'https://s.yimg.com/aaq/fp/css/tdv2-wafer-horoscope.HoroscopeGrid.atomic.ltr.3c743dd98289534ad1c07c777eb26bfb.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-subscription.SubscriptionGemini.atomic.ltr.7195e577ca1efda06c4b6857ded4b121.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-footer.FooterDesktop.atomic.ltr.47a5bd70d90a008f7b6a867d2fee9ab2.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-hpsetpromo.HpSetBannerPromo.atomic.ltr.030d2a4c4521d5f72e1051e79290b8ea.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-header.HeaderYBar.atomic.ltr.a67b5276a2eb6b9bff5bb0c370dd5c32.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-header.ybar.desktop.a5ef55315256ad2c3ff918a06f48f42e.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-stream.StreamRelated.atomic.ltr.9cc9afaf9464d66e96bdf361af28f069.min.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-user-dialog.UserDialogLite.atomic.ltr.26606a64b43c7b47d521ea69b3ba11d5.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-subscription.SubscriptionReminder.atomic.ltr.46374553adf3056a1dac33e7fd69d273.min.css', 'https://s.yimg.com/aaq/fp/css/react-wafer-subscription.custom.desktop.58c3fd7871df14d8f7f937fe038bcf17.css', 'https://s.yimg.com/aaq/fp/css/tdv2-wafer-user-intent.ContentPreference.atomic.ltr.eff5a3fd68eba42b5cbab57992febcaa.min.css', 'https://s.yimg.com/aaq/scp/css/viewer.bbd65011fa714bc6a4c74ebbfb906d06.css', 'https://s.yimg.com/aaq/c/e43d43c.caas-hpgrid.min.css', 'https://assets.video.yahoo.net/builds/a064591d7b/vdms-video-player.css']
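
With a list of stylesheet URLs like this, one possible way to tie a styleSheetId back to a file is to compare the text that CSS.getStyleSheetText returns with the content of each file. This is a different technique from anything the protocol offers directly, and the sketch below rests on assumptions: `find_source`, `files`, and the example.com URLs are hypothetical, and it assumes the browser returns the file content unchanged apart from whitespace. Inline style blocks have no file and would not match anything.

```python
def _normalize(css_text):
    """Collapse all whitespace runs so line-ending differences do not matter."""
    return " ".join(css_text.split())


def find_source(sheet_text, files):
    """Return the URLs whose content equals sheet_text after normalization.

    files maps a stylesheet URL to the text downloaded from that URL.
    """
    target = _normalize(sheet_text)
    return [url for url, text in files.items() if _normalize(text) == target]


# Hypothetical data: in practice, files could be filled by downloading
# each URL found in the HTML, and sheet_text would come from the
# 'text' value returned by CSS.getStyleSheetText.
files = {
    "https://example.com/main.css": "body { margin: 0 }\n",
    "https://example.com/other.css": ".a { color: red }\n.b { color: blue }\n",
}
sheet_text = ".a { color: red }\r\n.b { color: blue }"

matches = find_source(sheet_text, files)
print(matches)
```

An empty result would suggest the sheet is inline or was modified after loading, so this check is best treated as a heuristic.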

Or, if the page does not use AJAX to dynamically modify the DOM and add further stylesheets after the initial page load, you can just use requests:

import requests
from bs4 import BeautifulSoup


r = requests.get('https://www.yahoo.com')
soup = BeautifulSoup(r.text, 'html.parser')
# Collect the href of every <link rel="stylesheet">; links without an href are skipped
css_files = [link["href"] for link in soup.find_all("link", href=True) if "stylesheet" in link.get("rel", [])]
print(css_files)

If you are only interested in relative URLs, you can then filter the returned list of URLs.
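
As a rough sketch of that filtering step, the standard library's urllib.parse can tell relative hrefs from absolute ones and resolve them against the page URL. The function name `relative_stylesheets` and the example.com URLs are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin, urlparse


def relative_stylesheets(hrefs, page_url):
    """Keep only relative hrefs and resolve each against page_url.

    An href counts as relative here if it has neither a scheme nor a host,
    so protocol-relative links such as //cdn.example.com/x.css are excluded.
    """
    resolved = {}
    for href in hrefs:
        parts = urlparse(href)
        if not parts.scheme and not parts.netloc:
            resolved[href] = urljoin(page_url, href)
    return resolved


# Hypothetical list shaped like css_files above.
hrefs = [
    "/css/main.css",
    "css/other.css",
    "//cdn.example.com/lib.css",
    "https://s.yimg.com/os/yc/css/bundle.c60a6d54.css",
]

local = relative_stylesheets(hrefs, "https://example.com/blog/")
for href, url in local.items():
    print(href, "->", url)
```

The resolved URLs can then be downloaded or compared against the files in your own project.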