Python program that crawls for external IP address

Question

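I created a basic program that tries to scrape a website for my external IP address using BeautifulSoup 4. However, the program keeps raising an AttributeError because it cannot get the div class or any other string. It behaves as if the specific div class does not exist, so it cannot scrape it. I know for a fact that it does exist, even though the program says it doesn't. Does anyone know what is going wrong?

Here is my code: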

import requests, sys, io
from html.parser import HTMLParser
from bs4 import BeautifulSoup

url = "https://www.iplocation.net/find-ip-address"
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, "cp437", "backslashreplace")
sourcecode = requests.get(url)
plaintext = sourcecode.text
soup = BeautifulSoup(plaintext, "html.parser")

tag = soup.find("span", {"style": "font-weight: bold; color:green;"})
print(tag)
ip = tag.string
print(ip)

Answer 1

It has nothing to do with JavaScript. If you look at the source that is actually returned, you see:

<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe src="/_Incapsula_Resource?CWUDNSAI=24&xinfo=9-52943897-0 0NNN RT(1471643127529 69) q(0 -1 -1 -1) r(0 -1) B12(8,881022,0) U10000&incident_id=198001480102412051-472966643371608393&edet=12&cinfo=08000000" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 198001480102412051-472966643371608393</iframe></body></html>

They detect that you are a bot and do not give you the source you expect.

You can use wtfismyip.com to get your IP and more information in JSON format:
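To make this failure visible instead of ending in an AttributeError, the returned HTML can be checked before parsing. The following is only a minimal sketch: the function name `extract_ip` and the constant `BLOCK_MARKER` are hypothetical, the marker text is taken from the blocked response shown above, and the span selector is the one from the question.

```python
from bs4 import BeautifulSoup

# Marker text copied from the blocked response shown above (an assumption
# that the block page always contains it).
BLOCK_MARKER = "Incapsula incident ID"


def extract_ip(html):
    """Return the IP text from the page, or raise a descriptive error."""
    if BLOCK_MARKER in html:
        raise RuntimeError("Blocked by bot detection; the real page was not returned")
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("span", {"style": "font-weight: bold; color:green;"})
    if tag is None:
        # soup.find returns None when nothing matches; accessing .string on
        # None is what raises AttributeError in the question's code.
        raise RuntimeError("Expected <span> not found in the returned HTML")
    return tag.get_text(strip=True)
```

Calling `extract_ip(requests.get(url).text)` with the question's URL would then report the block explicitly rather than failing on `tag.string`.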

url = "http://wtfismyip.com/json"
js = requests.get(url).json()
print(js)

Or just your IP, using httpbin:

url = "http://httpbin.org/ip"
js = requests.get(url).json()
print(js)
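The httpbin response is a JSON object whose "origin" field holds the address. If only the address string is wanted, something like the sketch below could be used. The helper names `parse_origin` and `fetch_external_ip` are hypothetical, and the comma handling assumes that "origin" may list more than one address when the request passes through proxies.

```python
import ipaddress

import requests


def parse_origin(payload):
    """Return the first address in the "origin" field as a string."""
    # Assumption: "origin" may hold several comma-separated addresses;
    # keep only the first one.
    first = payload["origin"].split(",")[0].strip()
    # ipaddress.ip_address raises ValueError if the text is not a valid IP.
    return str(ipaddress.ip_address(first))


def fetch_external_ip(url="http://httpbin.org/ip", timeout=10):
    """Fetch the JSON from httpbin and return just the IP address."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return parse_origin(response.json())
```

Calling `print(fetch_external_ip())` would print only the address rather than the whole JSON object.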

Answer 2

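As mentioned above, the server has a bot detection mechanism in place. If you try requests.get, it returns "Request unsuccessful. Incapsula incident ID: 415000500153648966-193432437842182947", and because the real page source is never loaded, you cannot find the information you need. If you want to do this with BeautifulSoup, you can get it with the help of Selenium together with BeautifulSoup. Here is sample code.

If Selenium is not installed, first run "pip install selenium" and download chromedriver from "https://sites.google.com/a/chromium.org/chromedriver/downloads".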

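from bs4 import BeautifulSoup
from selenium import webdriver

# Raw string so the backslash before "chromedriver" is not read as an escape.
# Passing the driver path positionally works on older Selenium releases;
# newer releases expect it via a Service object instead.
driver = webdriver.Chrome(r"**Path to chrome driver**\chromedriver.exe")
driver.get('https://www.iplocation.net/find-ip-address')
# Parse the HTML as loaded by the real browser.
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content, "html.parser")
tag = soup.find("span", {"style": "font-weight: bold; color:green;"}).text
print(tag)
driver.quit()  # close the browser when done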

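It will print: xxx.xx.xxx.xxx

Note: Sometimes, the first time you run the script on a new machine, the site may ask for a captcha. Enter it manually, and the script will then run.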
