Python

Question

I have the following Python code, where `new` is a string of concatenated XML data (parsed into `items`) generated from the requests/responses of two websites:

import xml.etree.ElementTree as ET

items = ET.fromstring(new)
for item in items:
    url = item.find("url")
    endpoint = url.text
    ##
    resp = item.find("response")
    response = resp.text
    responses = response.split("\n")
    index = responses.index('')
    indexed = responses[:index]
    print(endpoint, *indexed, sep="\n")

Which prints:

https://www.youtube.com/sw.js_data
HTTP/2 200 OK
Content-Type: application/json; charset=utf-8
X-Content-Type-Options: nosniff
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Mon, 01 Jan 1990 00:00:00 GMT
Date: Mon, 14 Mar 2022 17:59:34 GMT
Content-Disposition: attachment; filename="response.bin"; filename*=UTF-8''response.bin
Strict-Transport-Security: max-age=31536000
X-Frame-Options: SAMEORIGIN
Cross-Origin-Opener-Policy-Report-Only: same-origin; report-to="ATmXEA_XZXH6CdbrmjUzyTbVgxu22C8KYH7NsxKbRt94"
Permissions-Policy: ch-ua-arch=*, ch-ua-bitness=*, ch-ua-full-version=*, ch-ua-full-version-list=*, ch-ua-model=*, ch-ua-platform=*, ch-ua-platform-version=*
Accept-Ch: Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Model, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version
Server: ESF
X-Xss-Protection: 0
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
https://www.google.com/client_204?&atyp=i&biw=1440&bih=849&dpr=1.5&ei=Z4IvYpTtF5LU9AP1nIOICQ
HTTP/2 204 No Content
Content-Type: text/html; charset=UTF-8
Strict-Transport-Security: max-age=31536000
Content-Security-Policy: object-src 'none';base-uri 'self';script-src 'nonce-9KQUw4dRjvKnx/zTrOblTQ==' 'strict-dynamic' 'report-sample' 'unsafe-eval' 'unsafe-inline' https: http:;report-uri https://csp.withgoogle.com/csp/gws/cdt1
Bfcache-Opt-In: unload
Date: Mon, 14 Mar 2022 17:59:10 GMT
Server: gws
Content-Length: 0
X-Xss-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2022-03-14-17; expires=Wed, 13-Apr-2022 17:59:10 GMT; path=/; domain=.google.com; Secure; SameSite=none
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

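Basically, I want to be able to evaluate the data generated by the above code separately for each site, so that I can check that the expected header values are present in each response from the websites. So in this example, the code would first check the set of headers generated for the first website (youtube) and say that all headers look good. Then it would check the set of headers generated for the second website (google) and say that the Strict-Transport-Security header is missing (for example). The goal is for this code to work no matter how many responses are loaded into the initial string: it should run through every website response, validate it, and tell me whether any headers are missing.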

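Is there an easy way to do this? I imagine that at some point each website's output (its list of headers) would be saved into a variable that can be referenced/called. Maybe this gets messy and isn't easy to do; not sure! If there's a more efficient way to do what I'm trying to do, I'm also happy to take any suggestions for making this code cleaner.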

Thanks!

Full XML string below:

<?xml version='1.0' encoding='utf8'?> <items burpVersion="2022.2.3" exportTime="Mon Mar 14 14:28:18 EDT 2022"> <item> <time>Mon Mar 14 13:59:37 EDT 2022</time> <url>https://www.youtube.com/sw.js_data</url> <host ip="142.250.190.142">www.youtube.com</host> <port>443</port> <protocol>https</protocol> <method>GET</method> <path>/sw.js_data</path> <extension>null</extension> <request base64="false">GET /sw.js_data HTTP/2 Host: www.youtube.com Accept: */* Sec-Fetch-Site: same-origin Sec-Fetch-Mode: cors Sec-Fetch-Dest: empty Referer: https://www.youtube.com/sw.js Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.9 </request> <status>200</status> <responselength>3524</responselength> <mimetype>JSON</mimetype> <response base64="false">HTTP/2 200 OK Content-Type: application/json; charset=utf-8 X-Content-Type-Options: nosniff Cache-Control: no-cache, no-store, max-age=0, must-revalidate Pragma: no-cache Expires: Mon, 01 Jan 1990 00:00:00 GMT Date: Mon, 14 Mar 2022 17:59:34 GMT Content-Disposition: attachment; filename="response.bin"; filename*=UTF-8''response.bin Strict-Transport-Security: max-age=31536000 X-Frame-Options: SAMEORIGIN Cross-Origin-Opener-Policy-Report-Only: same-origin; report-to="ATmXEA_XZXH6CdbrmjUzyTbVgxu22C8KYH7NsxKbRt94" Permissions-Policy: ch-ua-arch=*, ch-ua-bitness=*, ch-ua-full-version=*, ch-ua-full-version-list=*, ch-ua-model=*, ch-ua-platform=*, ch-ua-platform-version=* Accept-Ch: Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Model, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version Server: ESF X-Xss-Protection: 0 Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43" )]}' 
[["yt.sw.adr",null,[[["en","US","US","75.188.116.252",null,null,1,null,[],null,null,"","",null,null,"","QUFFLUhqbnREclEzblJmc25GVF9XSXQ1dFZQSm9sRGlmQXxBQ3Jtc0tuU3huS1RoOHQyaFlqN0dLdm4wcGMweXp0OURWQU5RbEJKRko1TlhGYjBoZ3N1Nnpla3QxUFRkN19uaWxoQVZTV0FRUGh0cUw2ckRWbmh5bGhxYkRjNFc2cUREbjB4MnFxMEpval9HUXNZeWU5d1Ztaw\u003d\u003d","CgtaVS1FWnl4ZTJEZyiGhb6RBg%3D%3D"],"Vf114d778||"]]</response> <comment /> </item> <item> <time>Mon Mar 14 13:59:14 EDT 2022</time> <url>https://www.google.com/client_204?&atyp=i&biw=1440&bih=849&dpr=1.5&ei=Z4IvYpTtF5LU9AP1nIOICQ</url> <host ip="172.217.4.36">www.google.com</host> <port>443</port> <protocol>https</protocol> <method>GET</method> <path>/client_204?&atyp=i&biw=1440&bih=849&dpr=1.5&ei=Z4IvYpTtF5LU9AP1nIOICQ</path> <extension>null</extension> <request base64="false">GET /client_204?&atyp=i&biw=1440&bih=849&dpr=1.5&ei=Z4IvYpTtF5LU9AP1nIOICQ HTTP/2 Host: www.google.com Sec-Ch-Ua: "(Not(A:Brand";v="8", "Chromium";v="99" Sec-Ch-Ua-Mobile: ?0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36 Sec-Ch-Ua-Arch: "x86" Sec-Ch-Ua-Full-Version: "99.0.4844.51" Sec-Ch-Ua-Platform-Version: "10.0.0" Sec-Ch-Ua-Bitness: "64" Sec-Ch-Ua-Model: Sec-Ch-Ua-Platform: "Windows" Accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8 X-Client-Data: CJDnygE= Sec-Fetch-Site: same-origin Sec-Fetch-Mode: no-cors Sec-Fetch-Dest: image Referer: https://www.google.com/ Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.9 </request> <status>204</status> <responselength>781</responselength> <mimetype /> <response base64="false">HTTP/2 204 No Content Content-Type: text/html; charset=UTF-8 Strict-Transport-Security: max-age=31536000 Content-Security-Policy: object-src 'none';base-uri 'self';script-src 'nonce-9KQUw4dRjvKnx/zTrOblTQ==' 'strict-dynamic' 'report-sample' 'unsafe-eval' 'unsafe-inline' https: http:;report-uri https://csp.withgoogle.com/csp/gws/cdt1 
Bfcache-Opt-In: unload Date: Mon, 14 Mar 2022 17:59:10 GMT Server: gws Content-Length: 0 X-Xss-Protection: 0 X-Frame-Options: SAMEORIGIN Set-Cookie: 1P_JAR=2022-03-14-17; expires=Wed, 13-Apr-2022 17:59:10 GMT; path=/; domain=.google.com; Secure; SameSite=none Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43" </response> <comment /> </item> </items>

Update: I've been tinkering with the code for the past few days but still haven't had any luck. Any ideas welcome!

Answer 1

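Simply save the output into a single dictionary variable with one entry per item, keyed by URL. Since your text splitting takes several steps, consider moving it into a defined function.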

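# DEFINED METHOD TO SPLIT RESPONSE BY LINE BREAKS AND KEEP ONLY THE HEADERS
def split_text(resp):
    responses = resp.split("\n")
    # headers end at the first blank line; keep every line if there is none
    index = responses.index('') if '' in responses else len(responses)
    indexed = responses[:index]

    return indexed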
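`split_text` returns the header block as raw lines. If you want to look headers up by name, a small extra step can turn those lines into a dictionary. This is a minimal sketch; `headers_to_dict` is a hypothetical helper name, and names are lower-cased because HTTP header names are case-insensitive.

```python
def headers_to_dict(header_lines):
    """Map lower-cased header names to values, skipping the status line."""
    headers = {}
    for line in header_lines:
        # split on the first ":" only, so values like h3=":443" stay intact
        name, sep, value = line.partition(":")
        if sep:  # the status line, e.g. "HTTP/2 200 OK", has no colon
            headers[name.strip().lower()] = value.strip()
    return headers


lines = [
    "HTTP/2 200 OK",
    "Content-Type: application/json; charset=utf-8",
    "Strict-Transport-Security: max-age=31536000",
]
headers = headers_to_dict(lines)
print(headers["strict-transport-security"])  # max-age=31536000
print("x-frame-options" in headers)          # False
```

If a header repeats (such as `Set-Cookie`), this sketch keeps only the last value.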

import xml.etree.ElementTree as ET

# PARSE XML STRING
doc = ET.fromstring(new)

# RETRIEVE ITEM NODES WITH DICTIONARY COMPREHENSION
website_items = {
    item.find("url").text: split_text(item.find("response").text)
    for item in doc.findall(".//item")
}
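One detail matters when looking entries up later: ElementTree decodes character entities in element text, so a URL written as `...&amp;...` in the XML comes back with a plain `&`, and that plain form is the dictionary key. (A bare `&` would not be well-formed XML, and `ET.fromstring` would raise `ParseError`.) The sketch below uses a small hypothetical sample (the `example.com` URL is made up) and `Element.findtext()`, a shorthand for `find(...).text`.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-item sample shaped like the Burp export (URL is made up)
sample = """<items>
  <item>
    <url>https://example.com/page?a=1&amp;b=2</url>
    <response>HTTP/2 200 OK
X-Frame-Options: DENY

body text</response>
  </item>
</items>"""

doc = ET.fromstring(sample)

# findtext() returns the element's text, like find(...).text
website_items = {
    item.findtext("url"): item.findtext("response")
    for item in doc.findall(".//item")
}

# The &amp; entity was decoded, so the key uses a plain "&"
print(list(website_items))  # ['https://example.com/page?a=1&b=2']
```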

# REVIEW SAVED DATA WITH URLS AS KEYS
website_items["https://www.youtube.com/sw.js_data"]
website_items["https://www.google.com/client_204?&atyp=i&biw=1440&bih=849&dpr=1.5&ei=Z4IvYpTtF5LU9AP1nIOICQ"]
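With every site's headers stored under its URL, the check from the question becomes a loop over the dictionary. The sketch below assumes `website_items` maps each URL to its list of header lines (as `split_text` produces); the stand-in data is a subset of the headers printed in the question, and `REQUIRED_HEADERS` and `check_all` are hypothetical names for whatever policy you want to enforce.

```python
# Hypothetical required headers; adjust to your own policy
REQUIRED_HEADERS = ["Strict-Transport-Security", "X-Frame-Options", "X-Content-Type-Options"]

# Stand-in for the dictionary built above: URL -> list of header lines
website_items = {
    "https://www.youtube.com/sw.js_data": [
        "HTTP/2 200 OK",
        "X-Content-Type-Options: nosniff",
        "Strict-Transport-Security: max-age=31536000",
        "X-Frame-Options: SAMEORIGIN",
    ],
    "https://www.google.com/client_204?&atyp=i&biw=1440&bih=849&dpr=1.5&ei=Z4IvYpTtF5LU9AP1nIOICQ": [
        "HTTP/2 204 No Content",
        "Strict-Transport-Security: max-age=31536000",
        "X-Frame-Options: SAMEORIGIN",
    ],
}


def check_all(items, required=REQUIRED_HEADERS):
    """Return {url: [missing header names]} for every response."""
    report = {}
    for url, lines in items.items():
        # lower-cased header names; lines without ":" (the status line) are skipped
        present = {line.split(":", 1)[0].strip().lower() for line in lines if ":" in line}
        report[url] = [h for h in required if h.lower() not in present]
    return report


for url, missing in check_all(website_items).items():
    if missing:
        print(f"{url}: missing {', '.join(missing)}")
    else:
        print(f"{url}: all headers look good")
```

Because the report is just another dictionary, it works the same way no matter how many items the XML string contains.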

Python - How to save output from loop to multiple callable variables

xml

parsing

for-loop