How can I extract values from Tableau on this webpage?

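I'm trying to extract the "mobility index" values for each state and county from this webpage: https://www.cuebiq.com/visitation-insights-mobility-index/

The preferred output would be panel data by location (state/county), covering all available locations and dates.

There is another thread () with a similar question. I tried following the solution there, but it doesn't seem to work in my case.

Thank you very much.

(One approach I tried was downloading the PDF file generated from Tableau, which contains the values for all counties on a specific date. However, I would still need a way to request the data for each date. In any case, if you have a better idea than this route, please let me know.)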

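This Tableau data URL doesn't return any data. In fact, it only renders an image of the values (probably a canvas), and I guess it detects clicks based on coordinates. This is probably done to cache the values and render quickly.

But when you click on a state, it does return data, although it doesn't always seem to return the result for the state itself (it does work for individual counties).

The solution I found is to use the tooltip to get the state data. Clicking on a state generates a request like this: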

POST https://public.tableau.com/{path}/{session_id}/commands/tabsrv/render-tooltip-server

with the following form parameters:

worksheet: US Map - State - CMI
dashboard: CMI
tupleIds: [18]
vizRegionRect: {"r":"viz","x":496,"y":148,"w":0,"h":0,"fieldVector":null}
allowHoverActions: false
allowPromptText: true
allowWork: false
useInlineImages: true

where tupleIds: [18] refers to the 1-based index of the state in the list of states, sorted in reverse alphabetical order, like this:

stateNames = ["Wyoming","Wisconsin","West Virginia","Washington","Virginia","Vermont","Utah","Texas","Tennessee","South Dakota","South Carolina","Rhode Island","Pennsylvania","Oregon","Oklahoma","Ohio","North Dakota","North Carolina","New York","New Mexico","New Jersey","New Hampshire","Nevada","Nebraska","Montana","Missouri","Mississippi","Minnesota","Michigan","Massachusetts","Maryland","Maine","Louisiana","Kentucky","Kansas","Iowa","Indiana","Illinois","Idaho","Georgia","Florida","District of Columbia","Delaware","Connecticut","Colorado","California","Arkansas","Arizona","Alabama"]
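As a minimal sketch of how the index and the form parameters fit together, the helpers below look up a state's 1-based position and build the form body shown above. `tuple_id_for` and `build_tooltip_payload` are hypothetical names made up for illustration, not part of any Tableau API, and only the first 18 entries of the list are repeated to keep it short.

```python
import json

# First 18 entries of the reverse-alphabetical state list above (truncated for brevity).
state_names = ["Wyoming", "Wisconsin", "West Virginia", "Washington", "Virginia",
               "Vermont", "Utah", "Texas", "Tennessee", "South Dakota",
               "South Carolina", "Rhode Island", "Pennsylvania", "Oregon",
               "Oklahoma", "Ohio", "North Dakota", "North Carolina"]


def tuple_id_for(state, names):
    """Return the 1-based position of `state` in `names` (hypothetical helper)."""
    return names.index(state) + 1


def build_tooltip_payload(tuple_id):
    """Build the form parameters for render-tooltip-server (hypothetical helper)."""
    return {
        "worksheet": "US Map - State - CMI",
        "dashboard": "CMI",
        "tupleIds": json.dumps([tuple_id]),
        # json.dumps turns Python's None into the JSON null the request carries
        "vizRegionRect": json.dumps(
            {"r": "viz", "x": 496, "y": 148, "w": 0, "h": 0, "fieldVector": None}
        ),
        "allowHoverActions": "false",
        "allowPromptText": "true",
        "allowWork": "false",
        "useInlineImages": "true",
    }


payload = build_tooltip_payload(tuple_id_for("North Carolina", state_names))
print(payload["tupleIds"])  # [18]
```

With this ordering, index 18 corresponds to North Carolina, which matches the `stateIndex+1` convention used in the scripts in this answer.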

The response is JSON containing the HTML of the tooltip, which holds the CMI and YoY values you want to extract:

{
    "vqlCmdResponse": {
        "cmdResultList": [{
            "commandName": "tabsrv:render-tooltip-server",
            "commandReturn": {
                "tooltipText": "{\"htmlTooltip\": \"<HTML HERE WITH THE VALUES>\", \"overlayAnchors\": []}"
            }
        }]
    }
}
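Note that `tooltipText` is itself a JSON document encoded as a string, so it has to be decoded a second time before the HTML is available. A minimal sketch with a made-up response body (the HTML and the value in it are placeholders, not real data):

```python
import json

# Hypothetical response body shaped like the one above; the HTML is a placeholder.
raw = json.dumps({
    "vqlCmdResponse": {
        "cmdResultList": [{
            "commandName": "tabsrv:render-tooltip-server",
            "commandReturn": {
                "tooltipText": json.dumps({
                    "htmlTooltip": "<table><tr><td>Mobility Index:</td><td>0.0</td></tr></table>",
                })
            },
        }]
    }
})

response = json.loads(raw)  # first decode: the HTTP response body
command_return = response["vqlCmdResponse"]["cmdResultList"][0]["commandReturn"]
tooltip = json.loads(command_return["tooltipText"])  # second decode: the embedded string
html = tooltip["htmlTooltip"]
print(html)
```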

The only caveat is that you have to make one request per state:

import requests
from bs4 import BeautifulSoup
import json
import time

data_host = "https://public.tableau.com"

r = requests.get(
    f"{data_host}/views/CMI-2_0/CMI",
    params= {
        ":showVizHome":"no",
    }
)
soup = BeautifulSoup(r.text, "html.parser")

tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)

dataUrl = f'{data_host}{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'

# Bootstrap the session; the response body itself is not used below
r = requests.post(dataUrl, data= {
    "sheet_id": tableauData["sheetId"],
})
data = []

stateNames = ["Wyoming","Wisconsin","West Virginia","Washington","Virginia","Vermont","Utah","Texas","Tennessee","South Dakota","South Carolina","Rhode Island","Pennsylvania","Oregon","Oklahoma","Ohio","North Dakota","North Carolina","New York","New Mexico","New Jersey","New Hampshire","Nevada","Nebraska","Montana","Missouri","Mississippi","Minnesota","Michigan","Massachusetts","Maryland","Maine","Louisiana","Kentucky","Kansas","Iowa","Indiana","Illinois","Idaho","Georgia","Florida","District of Columbia","Delaware","Connecticut","Colorado","California","Arkansas","Arizona","Alabama"]

for stateIndex, state in enumerate(stateNames):
    time.sleep(0.5)  # throttle the requests
    r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabsrv/render-tooltip-server',
        data = {
        "worksheet": "US Map - State - CMI",
        "dashboard": "CMI",
        "tupleIds": f"[{stateIndex+1}]",
        "vizRegionRect": json.dumps({"r":"viz","x":496,"y":148,"w":0,"h":0,"fieldVector":None}),
        "allowHoverActions": "false",
        "allowPromptText": "true",
        "allowWork": "false",
        "useInlineImages": "true"
    })
    tooltip = json.loads(r.json()["vqlCmdResponse"]["cmdResultList"][0]["commandReturn"]["tooltipText"])["htmlTooltip"]
    soup = BeautifulSoup(tooltip, "html.parser")
    rows = [ 
        t.find("tr").find_all("td")
        for t in soup.find_all("table")
    ]
    entry = { "state": state }
    for row in rows:
        if row[0].text == "Mobility Index:":
            entry["CMI"] = "".join([t.text.strip() for t in row[1:]])
        if row[0].text == "YoY (%):":
            entry["YoY"] = "".join([t.text.strip() for t in row[1:]])
    print(entry)
    data.append(entry)

print(data)


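To get the county information, it's the same approach using the select endpoint, which gives you the data in the same format as the post you linked in your question.

The following will extract the data for all counties and states: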

import requests
from bs4 import BeautifulSoup
import json
import time

data_host = "https://public.tableau.com"
worksheet = "US Map - State - CMI"
dashboard = "CMI"

r = requests.get(
    f"{data_host}/views/CMI-2_0/CMI",
    params= {
        ":showVizHome":"no",
    }
)
soup = BeautifulSoup(r.text, "html.parser")

tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)

dataUrl = f'{data_host}{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'

# Bootstrap the session; the response body itself is not used below
r = requests.post(dataUrl, data= {
    "sheet_id": tableauData["sheetId"],
})
data = []

stateNames = ["Wyoming","Wisconsin","West Virginia","Washington","Virginia","Vermont","Utah","Texas","Tennessee","South Dakota","South Carolina","Rhode Island","Pennsylvania","Oregon","Oklahoma","Ohio","North Dakota","North Carolina","New York","New Mexico","New Jersey","New Hampshire","Nevada","Nebraska","Montana","Missouri","Mississippi","Minnesota","Michigan","Massachusetts","Maryland","Maine","Louisiana","Kentucky","Kansas","Iowa","Indiana","Illinois","Idaho","Georgia","Florida","District of Columbia","Delaware","Connecticut","Colorado","California","Arkansas","Arizona","Alabama"]

for stateIndex, state in enumerate(stateNames):
    time.sleep(0.5)  # throttle the requests
    r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabsrv/render-tooltip-server',
        data = {
        "worksheet": worksheet,
        "dashboard": dashboard,
        "tupleIds": f"[{stateIndex+1}]",
        "vizRegionRect": json.dumps({"r":"viz","x":496,"y":148,"w":0,"h":0,"fieldVector":None}),
        "allowHoverActions": "false",
        "allowPromptText": "true",
        "allowWork": "false",
        "useInlineImages": "true"
    })
    tooltip = json.loads(r.json()["vqlCmdResponse"]["cmdResultList"][0]["commandReturn"]["tooltipText"])["htmlTooltip"]
    soup = BeautifulSoup(tooltip, "html.parser")
    rows = [ 
        t.find("tr").find_all("td")
        for t in soup.find_all("table")
    ]
    entry = { "state": state }
    for row in rows:
        if row[0].text == "Mobility Index:":
            entry["CMI"] = "".join([t.text.strip() for t in row[1:]])
        if row[0].text == "YoY (%):":
            entry["YoY"] = "".join([t.text.strip() for t in row[1:]])

    # Select the same tuple to fetch the county-level data for this state
    r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabdoc/select',
        data = {
        "worksheet": worksheet,
        "dashboard": dashboard,
        "selection": json.dumps({
            "objectIds":[stateIndex+1],
            "selectionType":"tuples"
        }),
        "selectOptions": "select-options-simple"
    })
    entry["county_data"] = r.json()["vqlCmdResponse"]["layoutStatus"]["applicationPresModel"]["dataDictionary"]["dataSegments"]
    print(entry)
    data.append(entry)


print(data)
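
Since the goal is a panel-style table, one possible follow-up is to flatten the collected entries into CSV. The sketch below uses only the standard library and writes the state-level fields. The sample entries and their values are placeholders, and `write_state_rows` is a hypothetical helper. The `county_data` segments are left out because their layout depends on the response.

```python
import csv
import io


def write_state_rows(entries, fileobj):
    """Write the state-level fields of each entry as CSV rows (hypothetical helper)."""
    writer = csv.DictWriter(fileobj, fieldnames=["state", "CMI", "YoY"],
                            extrasaction="ignore")  # skip keys such as county_data
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry)


# Placeholder entries with the same keys as those built in the loop above.
sample = [
    {"state": "Wyoming", "CMI": "0.0", "YoY": "0%", "county_data": []},
    {"state": "Wisconsin", "CMI": "0.0"},  # a missing YoY becomes an empty cell
]

buffer = io.StringIO()
write_state_rows(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("cmi_states.csv", "w", newline="")` in place of the `StringIO` buffer (the file name is only an example).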