How can I extract a specific text from HTML code with Selenium and Python?
<time class="_1o9PC Nzb55" datetime="2020-06-07T17:45:25.000Z" title="7. Juni 2020">Vor 1 Stunde</time>
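I am currently web scraping with Selenium. The code you see is the HTML element that shows when a picture was posted on Instagram.
I want the code to print only this:
datetime="2020-06-07T17:45:25.000Z"
If I find the element by class and run print(element.text), it outputs "Vor 1 Stunde" (sorry, it is German).
I don't know whether there is a way to do this, but if there is, please let me know.
Here is the full code: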
from selenium import webdriver
import time, pyautogui, random
browser = webdriver.Firefox()
browser.get('https://www.instagram.com/')
time.sleep(1)
name = browser.find_element_by_xpath("/html/body/div[1]/section/main/article/div[2]/div[1]/div/form/div[2]/div/label/input")
name.click()
name.send_keys("username")
passwort = browser.find_element_by_xpath("/html/body/div[1]/section/main/article/div[2]/div[1]/div/form/div[3]/div/label/input")
passwort.send_keys("password")
browser.find_element_by_xpath("/html/body/div[1]/section/main/article/div[2]/div[1]/div/form/div[4]/button/div").click()
time.sleep(3)
browser.find_element_by_xpath("/html/body/div[1]/section/main/div/div/div/div/button").click()
time.sleep(2)
browser.find_element_by_xpath("/html/body/div[4]/div/div/div[3]/button[2]").click()
time.sleep(2)
suche = browser.find_element_by_class_name("LWmhU").click()
time.sleep(1)
pyautogui.typewrite("mmd")
pyautogui.typewrite(["enter"])
time.sleep(2.5)
acc = browser.find_element_by_xpath("/html/body/div[1]/section/nav/div[2]/div/div/div[2]/div[2]/div[2]/div/a[1]/div/div[2]/span").click()
print(acc)
time.sleep(1)
# click on the instagram picture
pyautogui.click(427, 754)
time.sleep(2)
uploaddate = browser.find_element_by_class_name("_1o9PC")
print(uploaddate.getAttribute("datetime"))
The desired element is a ReactJS-enabled element, so to locate it you need to induce WebDriverWait for visibility_of_element_located(). You can use either of the following locator strategies:
Using XPATH:
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, "//time[text()='Vor 1 Stunde']"))).get_attribute("datetime"))
Using CSS_SELECTOR:
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "time[title$='Juni 2020'][datetime]"))).get_attribute("datetime"))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
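Putting the imports and the wait together, a minimal sketch could look like the following. The helper names read_post_datetime and format_attribute are hypothetical, and the time[datetime] selector is an assumption about the page: it is used here because the visible text "Vor 1 Stunde" and the title change as the post ages, so locators built on them stop matching. Also note that the Python binding spells the method get_attribute(); getAttribute() is the Java spelling, so the last line of the question's script raises an AttributeError.

```python
def format_attribute(name: str, value: str) -> str:
    """Render an attribute the way it appears in the HTML source."""
    return f'{name}="{value}"'


def read_post_datetime(driver, timeout: float = 20) -> str:
    """Wait for a visible <time> element and return its datetime attribute."""
    # Imported inside the function so format_attribute() can be used
    # (and checked) on a machine without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: the post's timestamp is the first <time> element
    # on the page that carries a datetime attribute.
    locator = (By.CSS_SELECTOR, "time[datetime]")
    element = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located(locator)
    )
    # get_attribute() returns the attribute value, not the visible text.
    return element.get_attribute("datetime")


# Usage, with the `browser` object from the question's script:
# print(format_attribute("datetime", read_post_datetime(browser)))
```

element.text only returns the visible text node ("Vor 1 Stunde"), while get_attribute("datetime") returns the value stored on the tag, which is why the two calls print different things.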