How to Extract/Scrape option values from dynamic dropdowns in python?

Question

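I am trying to extract data from a web page where the options in the dropdowns are loaded dynamically based on the previous selection. I am using Selenium WebDriver to extract the data from the dropdowns. Please see the screenshots below.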

Dropdown 1 - State

Dropdown 2 - City

Dropdown 3 - Station

The City dropdown options load after a State is selected, and the Station dropdown options load after a City is selected.
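Because each dropdown is filled only after the previous one changes, a fixed time.sleep can be replaced by an explicit wait. Selenium's WebDriverWait.until accepts any callable that takes the driver and returns a truthy value once the condition holds, so a small custom condition can wait until a <select> holds more than its placeholder option. This is a minimal sketch: the class name options_loaded is hypothetical, and it assumes the first option of each dropdown is a placeholder (consistent with the question skipping cityOptions[0]). The string "tag name" is the value of Selenium's By.TAG_NAME, used so the class itself needs no Selenium import.

```python
class options_loaded:
    """Wait condition: truthy once a <select> holds more than `minimum` <option> elements.

    Hypothetical helper for WebDriverWait.until(); assumes option 0 is a placeholder.
    """

    def __init__(self, locator, minimum=1):
        self.locator = locator  # e.g. (By.XPATH, "//select[contains(@id,'stations')]")
        self.minimum = minimum

    def __call__(self, driver):
        select_el = driver.find_element(*self.locator)
        # "tag name" is the string value of selenium's By.TAG_NAME
        options = select_el.find_elements("tag name", "option")
        # Returning the options (truthy) ends the wait; False makes WebDriverWait poll again
        return options if len(options) > self.minimum else False


# Usage with a real driver (requires selenium):
# from selenium.webdriver.common.by import By
# from selenium.webdriver.support.ui import WebDriverWait
# station_options = WebDriverWait(driver, 10).until(
#     options_loaded((By.XPATH, "//select[contains(@id,'stations')]"))
# )
```

Polling for the option count, rather than sleeping a fixed two seconds, returns as soon as the station list is populated and fails with a TimeoutException if it never is.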

So far I have been able to extract the station names using this code.

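import time

from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Assumes `driver` is a WebDriver already on the dashboard page, with a state selected.
# Assumed locator: the question does not show how cityDropdown was located.
cityDropdown = driver.find_element(By.XPATH, "//select[contains(@id,'cities')]")
cityOptions = cityDropdown.find_elements(By.TAG_NAME, 'option')

citiesList = []
stationNameList = []
siteIdList = []

# Skip the first (placeholder) option and collect the city names
for city in cityOptions[1:]:
    citiesList.append(city.text)

for ele in citiesList:
    cityDropdown.send_keys(ele, Keys.RETURN)  # select the city by typing its name
    time.sleep(2)  # give the station options time to load
    # Re-locate the station dropdown after each reload to avoid stale element references
    stationDropDown = driver.find_element(By.XPATH, "//select[contains(@id,'stations')]")
    stationOptions = stationDropDown.find_elements(By.TAG_NAME, 'option')
    stationDropDown.click()
    print(stationDropDown.text)  # prints the visible station names only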
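The loop above prints stationDropDown.text, which contains only the visible station names. If, as the "Option values from station dropdown" screenshot suggests, the siteId is stored in each <option>'s value attribute, it can be read with get_attribute("value") on each option element. A minimal sketch under that assumption; option_pairs is a hypothetical helper that works on anything exposing .text and .get_attribute(), such as Selenium WebElements or the list returned by Select(...).options.

```python
def option_pairs(option_elements, skip_placeholder=True):
    """Return (visible text, value attribute) pairs for a list of <option> elements.

    Hypothetical helper: assumes the siteId is stored in each option's value attribute.
    """
    options = list(option_elements)
    if skip_placeholder:
        options = options[1:]  # first option is assumed to be a "Select ..." placeholder
    return [(opt.text.strip(), opt.get_attribute("value")) for opt in options]


# Usage inside the city loop (requires selenium):
# from selenium.webdriver.support.ui import Select
# station_select = Select(driver.find_element(By.XPATH, "//select[contains(@id,'stations')]"))
# for name, site_id in option_pairs(station_select.options):
#     stationNameList.append(name)
#     siteIdList.append(site_id)
```

Selenium's Select wrapper also offers select_by_visible_text(), which could replace the send_keys(ele, Keys.RETURN) call when choosing each city.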

State Options

City Options

Option values from station dropdown

Can anyone help me extract the siteId for each state and city?

Answer 1

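Try the approach below using Python requests: it is simple, straightforward, reliable and fast, and it needs less code when it comes to requests. I got the API URL from the website itself after inspecting the Network tab in Google Chrome's developer tools.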

What exactly the script below does:

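First, it takes the API URL and the payload (it is very important to make a POST request), makes a POST request and gets the data in return.
Once the data is retrieved, the script parses the JSON with json.loads from the json library.
Finally, it iterates over the whole list of stations one by one and prints details such as the state name, city name, station name and station site ID.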
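The "unique payload" copied from the network request is not opaque: it is a Base64-encoded JSON object that decodes to {"time":1603104573463,"timeZoneOffset":-330}, i.e. a millisecond timestamp and what looks like a JavaScript getTimezoneOffset() value (-330 minutes, UTC+5:30). The sketch below decodes it and builds a payload of the same shape for a given or current time. Whether the server accepts a freshly generated timestamp is an assumption the answer does not confirm; decode_payload and build_payload are hypothetical names.

```python
import base64
import json
import time


def decode_payload(payload):
    """Decode the Base64 string sent as the POST body back into a dict."""
    return json.loads(base64.b64decode(payload))


def build_payload(time_ms=None, tz_offset_min=-330):
    """Build a payload of the same shape: Base64 of compact JSON {"time", "timeZoneOffset"}.

    Assumption: the API accepts any recent millisecond timestamp here.
    """
    if time_ms is None:
        time_ms = int(time.time() * 1000)  # current time in milliseconds since the epoch
    body = json.dumps({"time": time_ms, "timeZoneOffset": tz_offset_min}, separators=(",", ":"))
    return base64.b64encode(body.encode("utf-8")).decode("ascii")


original = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='
print(decode_payload(original))  # {'time': 1603104573463, 'timeZoneOffset': -330}
```

The compact separators matter: build_payload(1603104573463, -330) reproduces the captured string byte for byte, whereas json.dumps with its default ", " and ": " separators would encode to a different Base64 string.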

Network call tab

Output of the code below.

import json

import requests


def scrape_aqi_site_id():
    URL = 'https://app.cpcbccr.com/aqi_dashboard/aqi_station_all_india'  # API URL
    payload = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='  # Unique payload fetched from the network request
    # POST request with the URL and payload (TLS certificate verification disabled)
    response = requests.post(URL, data=payload, verify=False)
    result = json.loads(response.text)  # parse the JSON object using the json library
    extracted_states = result['stations']
    for state in extracted_states:  # loop over the extracted states and their station data
        print('=' * 120)
        print('Scraping station data for state : ' + state['stateID'])
        for station in state['stationsInCity']:  # loop over each station in the state
            print('-' * 100)
            print('Scraping data for city and its station : City (' + station['cityID'] + ') & station (' + station['name'] + ')')
            print('City : ' + station['cityID'])
            print('Station Name : ' + station['name'])
            print('Station Site Id : ' + station['id'])
            print('-' * 100)
        print('Scraping of data for state : (' + state['stateID'] + ') is completed, now going for another one...')
        print('=' * 120)


scrape_aqi_site_id()
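Instead of printing, the same nested structure can be flattened into rows, which maps directly onto the question's goal of a siteId per state and city. A minimal sketch; it relies only on the keys the answer's code already reads (stations, stateID, stationsInCity, cityID, name, id), the function names are hypothetical, and the sample data is made up for illustration rather than real API output.

```python
from collections import defaultdict


def flatten_stations(result):
    """Turn the parsed API response into (state, city, station name, site id) tuples."""
    rows = []
    for state in result["stations"]:
        for station in state["stationsInCity"]:
            rows.append((state["stateID"], station["cityID"], station["name"], station["id"]))
    return rows


def site_ids_by_city(rows):
    """Group site ids as {state: {city: [site ids]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for state, city, _name, site_id in rows:
        grouped[state][city].append(site_id)
    # Convert the nested defaultdicts into plain dicts
    return {state: dict(cities) for state, cities in grouped.items()}


# Made-up sample in the shape the answer's code reads (not real API data)
sample = {
    "stations": [
        {"stateID": "StateA", "stationsInCity": [
            {"cityID": "City1", "name": "Station X", "id": "site_1"},
            {"cityID": "City1", "name": "Station Y", "id": "site_2"},
        ]},
        {"stateID": "StateB", "stationsInCity": [
            {"cityID": "City2", "name": "Station Z", "id": "site_3"},
        ]},
    ]
}
rows = flatten_stations(sample)
print(site_ids_by_city(rows))
```

With real data, flatten_stations(json.loads(response.text)) would take the place of the sample, and the rows could be saved with csv.writer(f).writerows(rows).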


Tags: python, selenium, webdriver, web-scraping