Using Selenium to get information within the tag 'h1' and id
I am trying to get the following information: 'Jarrow Formulas, Methyl Folate, 400 mcg, 60 Veggie Caps'
Please take a look at the image, thank you very much:
I used this code, but it did not work:
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.iherb.com/c/Vitamin-B?sr=2")
wait = WebDriverWait(driver, 10)
item_name = list()
#close the pop up
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,"svg[data-ga-event-action='list-close']"))).click()
#store all the links in a list
item_links = [item.get_attribute("href") for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,".absolute-link-wrapper > a.product-link")))]
for item_link in item_links:
    driver.get(item_link)
    item_name.append(driver.find_element_by_css_selector('[id="name"]').text) #this code doesnt work
To print the text value, you can use either of the following approaches:
Using xpath and the text attribute:
print(driver.find_element_by_xpath("//section[@class='column image-fixed']//following::section[2]//div[@id='product-summary-header']//h1[@id='name']").text)
Using xpath and get_attribute():
print(driver.find_element_by_xpath("//section[@class='column image-fixed']//following::section[2]//div[@id='product-summary-header']//h1[@id='name']").get_attribute("innerHTML"))
Console output:
Jarrow Formulas, Methyl Folate, 400 mcg, 60 Veggie Caps
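A side note on versions: as far as I know, recent Selenium releases (4.3 and later) no longer ship the find_element_by_* helpers, so the two one-liners above may raise AttributeError there. A minimal sketch of the same two lookups with the generic find_element(by, value) call follows. The helper names product_name and product_name_html and the shortened XPath in NAME_XPATH are my own assumptions for illustration, not part of the original answer.

```python
# Shortened locator; an assumption, not the answer's full XPath.
NAME_XPATH = "//h1[@id='name']"


def product_name(driver):
    """Return the visible text of the product title element."""
    # By.XPATH is simply the string "xpath", so the literal is used here
    # to keep the sketch importable without Selenium installed.
    return driver.find_element("xpath", NAME_XPATH).text


def product_name_html(driver):
    """Return the raw markup inside the product title element."""
    return driver.find_element("xpath", NAME_XPATH).get_attribute("innerHTML")
```

With a live driver that has already loaded a product page, print(product_name(driver)) should behave like the first one-liner above.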
Ideally, you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following approaches:
Using xpath and the text attribute:
driver.get('https://ca.iherb.com/pr/Jarrow-Formulas-Methyl-Folate-400-mcg-60-Veggie-Caps/42778')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//section[@class='column image-fixed']//following::section[2]//div[@id='product-summary-header']//h1[@id='name']"))).text)
Using XPATH and get_attribute():
driver.get('https://ca.iherb.com/pr/Jarrow-Formulas-Methyl-Folate-400-mcg-60-Veggie-Caps/42778')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//section[@class='column image-fixed']//following::section[2]//div[@id='product-summary-header']//h1[@id='name']"))).get_attribute("innerHTML"))
Console output:
Jarrow Formulas, Methyl Folate, 400 mcg, 60 Veggie Caps
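To tie this back to the loop in the question, here is a rough sketch of how the explicit wait could be applied once per product page. The function names collect_item_names and read_name_with_wait are hypothetical, and the locator is shortened to the element id as an assumption. The reader function is passed in as a parameter so the loop itself can be tried without a browser.

```python
def collect_item_names(driver, item_links, read_name):
    """Visit each link and collect whatever read_name(driver) returns."""
    names = []
    for link in item_links:
        driver.get(link)                 # load the product page
        names.append(read_name(driver))  # read the title on that page
    return names


def read_name_with_wait(driver, timeout=20):
    """Wait until the element with id 'name' is visible, then return its text."""
    # Imported lazily so collect_item_names stays usable without Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.ID, "name"))  # assumed locator
    )
    return element.text
```

With the driver and item_links from the question, the call would look like item_name = collect_item_names(driver, item_links, read_name_with_wait).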
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in
References
Links to useful documentation:
- get_attribute() method: gets the given attribute or property of the element.
- text attribute: returns the text of the element.
- Difference between text and innerHTML using Selenium
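To make the text versus innerHTML distinction a bit more concrete: innerHTML gives back the raw markup inside the element, while text gives back only what is rendered. The sketch below imitates that with the standard library only. It is a rough approximation, not Selenium's actual algorithm (it ignores CSS visibility, for instance), and both approx_text and the sample string are invented for illustration.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores tags (a rough stand-in for .text)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def approx_text(inner_html):
    """Approximate what .text might show for a given innerHTML string."""
    collector = _TextCollector()
    collector.feed(inner_html)
    collector.close()
    # Join the text nodes, then collapse runs of whitespace.
    return " ".join("".join(collector.parts).split())


# Hypothetical innerHTML value, invented for illustration only.
sample_inner_html = (
    "\n  Jarrow Formulas, <span>Methyl Folate</span>,\n  400 mcg, 60 Veggie Caps\n"
)
print(repr(sample_inner_html))         # raw markup, tags and whitespace included
print(approx_text(sample_inner_html))  # tags dropped, whitespace collapsed
```

When the h1 contains nothing but plain text, as the console output above suggests, both approaches should print the same string apart from surrounding whitespace.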