When using URL.openConnection(), what is the best way to handle URL variations like "www" and "https"?
How can I find the best URL format to use when opening a connection?
Many sites return different results depending on whether the URL uses "www" and/or "https". For example, here is a test I wrote to see some of the different results:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class Test {
    // Browser-like User-Agent sent with every request
    private static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36";

    public static void main(String[] args) {
        String baseURL = "google.com";
        // Try every combination of scheme and "www" prefix
        String[] prefixes = { "http://", "http://www.", "https://", "https://www." };
        for (String prefix : prefixes) {
            String address = prefix + baseURL;
            try {
                URLConnection connection = new URL(address).openConnection();
                connection.setRequestProperty("User-Agent", USER_AGENT);
                // Count the lines in the response body; the reader is closed automatically
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(connection.getInputStream()))) {
                    int lineCount = 0;
                    while (in.readLine() != null) {
                        lineCount++;
                    }
                    System.out.println(address + " = " + lineCount + " lines");
                }
            } catch (Exception ex) {
                System.out.println(address + " throws an error");
            }
        }
    }
}
Here are the results of running it against four different sites:
http://stackoverflow.com = 4205 lines
http://www.stackoverflow.com = 4205 lines
https://stackoverflow.com = 4205 lines
https://www.stackoverflow.com = 2 lines
http://qvc.com = 2438 lines
http://www.qvc.com = 2438 lines
https://qvc.com throws an error
https://www.qvc.com = 0 lines
http://facebook.com = 0 lines
http://www.facebook.com = 0 lines
https://facebook.com = 25 lines
https://www.facebook.com = 25 lines
http://google.com = 6 lines
http://www.google.com = 6 lines
https://google.com = 343 lines
https://www.google.com = 343 lines
Given a base URL such as "google.com", what is the correct way to check which format I should use for the site?
Check the HTTP response code. If you get a redirect, you are probably using the wrong format. For example, http://www.stackoverflow.com returns a 301 redirect to http://stackoverflow.com.
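A minimal sketch of that check, assuming you want the first response for every variant rather than the page it eventually leads to. HttpURLConnection follows redirects on its own by default as long as the scheme does not change (it does not switch between http and https), which may be why some variants in the test above report identical line counts; calling setInstanceFollowRedirects(false) turns that off so each 3xx status and its Location header stay visible. The class VariantProbe and its candidateUrls helper are hypothetical names used only for illustration.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class VariantProbe {
    // Hypothetical helper: the four scheme / "www" combinations for a bare host,
    // in the same order as the test in the question
    static List<String> candidateUrls(String baseHost) {
        List<String> urls = new ArrayList<>();
        for (String scheme : new String[] { "http://", "https://" }) {
            urls.add(scheme + baseHost);
            urls.add(scheme + "www." + baseHost);
        }
        return urls;
    }

    public static void main(String[] args) {
        String baseHost = args.length > 0 ? args[0] : "google.com";
        for (String candidate : candidateUrls(baseHost)) {
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(candidate).openConnection();
                // Report the first response instead of silently following a redirect
                conn.setInstanceFollowRedirects(false);
                conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36");
                int code = conn.getResponseCode();
                String location = conn.getHeaderField("Location");
                System.out.println(candidate + " -> " + code
                        + (location != null ? " (Location: " + location + ")" : ""));
                conn.disconnect();
            } catch (Exception ex) {
                System.out.println(candidate + " -> error: " + ex);
            }
        }
    }
}
```

Pass a host as the first argument (it defaults to google.com). Roughly, a variant that answers 200 is served as-is, while a 3xx points at the form the server prefers; the actual codes depend on each site's configuration at the time.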
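After reading Marc B's answer, some other Stack Overflow threads (which I linked in the comments on the original question), and this guide, I came up with the following: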
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class Test {
    public static void main(String[] args) {
        String baseURL = "google.com";
        try {
            URL url = new URL("http://" + baseURL);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36");
            // Sends the request and reads the status code
            int response = connection.getResponseCode();
            System.out.println("Response code: " + response);
            if (response == 301 || response == 302 || response == 303) {
                // A redirect: the Location header names the URL the server wants you to use
                System.out.println("Redirect location: " + connection.getHeaderField("Location"));
            } else {
                // Not a redirect: count the lines of the response body
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(connection.getInputStream()))) {
                    int lineCount = 0;
                    while (in.readLine() != null) {
                        lineCount++;
                    }
                    System.out.println("http://" + baseURL + " = " + lineCount + " lines\n");
                }
            }
        } catch (Exception ex) {
            System.out.println("http://" + baseURL + " throws an error\n");
        }
    }
}
This outputs:
Response code: 302
Redirect location: https://www.google.com/?gws_rd=ssl
You can also use HttpURLConnection.HTTP_MOVED_PERM (301), HttpURLConnection.HTTP_MOVED_TEMP (302), and HttpURLConnection.HTTP_SEE_OTHER (303) instead of the numeric response codes. That is probably the better practice.
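Building on that, here is a minimal sketch that follows the whole redirect chain by hand, including hops between http and https that HttpURLConnection will not follow automatically, and reports the URL where the chain stops. It uses the named constants, plus 307 and 308, which have no named constant in HttpURLConnection. The class RedirectResolver, the MAX_HOPS limit of 10, and the helpers isRedirect and nextUrl are hypothetical; relative Location values are resolved against the current URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class RedirectResolver {
    // Hypothetical limit so a redirect loop cannot run forever
    static final int MAX_HOPS = 10;
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36";

    // Redirect statuses; 307 and 308 have no named constant in HttpURLConnection
    static boolean isRedirect(int code) {
        return code == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || code == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || code == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || code == 307 || code == 308;
    }

    // A Location header may be relative, so resolve it against the current URL
    static String nextUrl(String current, String location) {
        try {
            return new URL(new URL(current), location).toString();
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException("Bad redirect target: " + location, ex);
        }
    }

    // Follows redirects one hop at a time and returns the URL where the chain stops
    static String resolve(String start) throws IOException {
        String current = start;
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(current).openConnection();
            conn.setInstanceFollowRedirects(false); // handle every hop ourselves
            conn.setRequestProperty("User-Agent", USER_AGENT);
            int code = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (!isRedirect(code) || location == null) {
                return current; // not a followable redirect: stop here
            }
            current = nextUrl(current, location);
        }
        throw new IOException("Too many redirects starting from " + start);
    }

    public static void main(String[] args) throws IOException {
        String baseURL = args.length > 0 ? args[0] : "google.com";
        System.out.println("Final URL: " + resolve("http://" + baseURL));
    }
}
```

Passing google.com starts from http://google.com and prints wherever the chain ends. The result reflects how the server behaves at that moment, so treat it as the currently preferred form rather than a permanent answer, and check the final status code too if you need to distinguish a 200 from an error page.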