When using URL.openConnection(), what is the best way to handle URL variations like "www" and "https"?

When opening a connection, how can I find the best URL form to use?

Many sites return different results depending on whether the URL uses "www" and/or "https".

For example, here is a test I wrote to see some of the different results:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class Test
{
   private static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36";

   public static void main(String[] args)
   {
      String baseURL = "google.com";

      // The four scheme/"www" variations to compare.
      String[] prefixes = {"http://", "http://www.", "https://", "https://www."};

      for (String prefix : prefixes)
      {
         String address = prefix + baseURL;

         try
         {
            URL url = new URL(address);
            URLConnection connection = url.openConnection();
            connection.setRequestProperty("User-Agent", USER_AGENT);

            // Count the lines in the response body.
            int lineCount = 0;

            try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream())))
            {
               while (in.readLine() != null)
               {
                  lineCount++;
               }
            }

            System.out.println(address + " = " + lineCount + " lines");
         }
         catch (Exception ex)
         {
            System.out.println(address + " throws an error");
         }
      }
   }
}

Here are the results of running it on four different websites:

http://stackoverflow.com = 4205 lines
http://www.stackoverflow.com = 4205 lines
https://stackoverflow.com = 4205 lines
https://www.stackoverflow.com = 2 lines

http://qvc.com = 2438 lines
http://www.qvc.com = 2438 lines
https://qvc.com throws an error
https://www.qvc.com = 0 lines

http://facebook.com = 0 lines
http://www.facebook.com = 0 lines
https://facebook.com = 25 lines
https://www.facebook.com = 25 lines

http://google.com = 6 lines
http://www.google.com = 6 lines
https://google.com = 343 lines
https://www.google.com = 343 lines

Given a base URL such as "google.com", what is the correct way to check which form I should use for the site?

Check the HTTP response code. If you receive a redirect, you are probably using the wrong form. For example, http://www.stackoverflow.com returns a 301 redirect to http://stackoverflow.com.
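A minimal sketch of that check, assuming you want the raw status of each variation rather than whatever a followed redirect ends up returning. Automatic redirect handling is switched off with setInstanceFollowRedirects(false), so a 301 or 302 is reported instead of being followed silently. The class name VariantProbe, its method names, and the 5-second timeouts are hypothetical choices made for illustration.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class VariantProbe {

    // Builds the four scheme/"www" variations for a bare host such as "google.com".
    static List<String> candidates(String baseURL) {
        List<String> urls = new ArrayList<>();
        for (String scheme : new String[] {"http://", "https://"}) {
            urls.add(scheme + baseURL);
            urls.add(scheme + "www." + baseURL);
        }
        return urls;
    }

    // Requests one URL without following redirects and describes the outcome.
    static String probe(String address) {
        try {
            HttpURLConnection connection = (HttpURLConnection) new URL(address).openConnection();
            connection.setInstanceFollowRedirects(false); // report 3xx instead of following it
            connection.setConnectTimeout(5000);           // assumed timeout, in milliseconds
            connection.setReadTimeout(5000);

            int code = connection.getResponseCode();
            String location = connection.getHeaderField("Location");
            connection.disconnect();

            return address + " -> " + code
                    + (location != null ? " (Location: " + location + ")" : "");
        } catch (Exception ex) {
            return address + " -> error: " + ex;
        }
    }
}
```

Printing VariantProbe.probe(address) for each entry of VariantProbe.candidates("google.com") gives one line per variation. A variation that answers 200 with no Location header is a form the site serves directly.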

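After reading Marc B's answer, some other Stack Overflow threads (which I linked in the comments on the original question), and this guide, I came to the following conclusion: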

String baseURL = "google.com";

try
{
   java.net.URL url = new java.net.URL("http://" + baseURL);
   java.net.HttpURLConnection connection = (java.net.HttpURLConnection) url.openConnection();
   connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36");

   // Inspect the status code before trying to read the body.
   int response = connection.getResponseCode();
   System.out.println("Response code: " + response);

   if (response == 301 || response == 302 || response == 303)
   {
      // A redirect: the Location header names the URL the site wants you to use.
      System.out.println("Redirect location: " + connection.getHeaderField("Location"));
   }
   else
   {
      BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));

      String line;
      int lineCount = 0;

      while ((line = in.readLine()) != null)
      {
         lineCount++;
      }

      System.out.println("http://" + baseURL + " = " + lineCount + " lines\n");
   }
}
catch (Exception ex)
{
   System.out.println("http://" + baseURL + " throws an error\n");
}

This outputs:

Response code: 302
Redirect location: https://www.google.com/?gws_rd=ssl
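The redirect location reported above can itself be requested, and the process repeated until a non-redirect response comes back. The following is a rough sketch of that loop, not a definitive implementation. RedirectFollower, resolve, finalUrl, and the maxHops limit are hypothetical names introduced here. As far as I know, the JDK's default handler follows same-protocol redirects on its own but not http to https ones, which is why the sketch disables automatic following and handles every hop by hand.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical helper, not part of any library.
final class RedirectFollower {

    // Resolves a Location header against the URL that produced it,
    // so path-only values such as "/home" work as well as absolute URLs.
    static String resolve(String currentUrl, String location) {
        return URI.create(currentUrl).resolve(location).toString();
    }

    // Follows 301/302/303 responses by hand and returns the first URL
    // that does not redirect. Gives up after maxHops requests.
    static String finalUrl(String startUrl, int maxHops) throws IOException {
        String current = startUrl;
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection connection = (HttpURLConnection) new URL(current).openConnection();
            connection.setInstanceFollowRedirects(false);
            int code = connection.getResponseCode();
            String location = connection.getHeaderField("Location");
            connection.disconnect();

            boolean redirect = code == HttpURLConnection.HTTP_MOVED_PERM
                    || code == HttpURLConnection.HTTP_MOVED_TEMP
                    || code == HttpURLConnection.HTTP_SEE_OTHER;
            if (!redirect || location == null) {
                return current; // not a redirect: this is the form the site serves
            }
            current = resolve(current, location);
        }
        throw new IOException("Too many redirects starting from " + startUrl);
    }
}
```

A call such as RedirectFollower.finalUrl("http://google.com", 5) would then return whichever URL the site finally settles on. The hop limit guards against redirect loops.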

You can also use HttpURLConnection.HTTP_MOVED_TEMP, HttpURLConnection.HTTP_MOVED_PERM, and HttpURLConnection.HTTP_SEE_OTHER instead of the numeric response codes. In fact, that is probably the better practice.
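As a small illustration of the named constants, the redirect test can be pulled into one helper. RedirectCodes and isRedirect are hypothetical names. The 307 and 308 cases are an addition beyond the three codes discussed here, included on the assumption that you would want to treat them as redirects too. HttpURLConnection defines no named constants for them, so they appear as plain numbers.

```java
import java.net.HttpURLConnection;

// Hypothetical helper, not part of any library.
final class RedirectCodes {

    // Returns true when the status code tells the client to look elsewhere.
    static boolean isRedirect(int code) {
        switch (code) {
            case HttpURLConnection.HTTP_MOVED_PERM: // 301
            case HttpURLConnection.HTTP_MOVED_TEMP: // 302
            case HttpURLConnection.HTTP_SEE_OTHER:  // 303
            case 307: // Temporary Redirect (assumed addition, no JDK constant)
            case 308: // Permanent Redirect (assumed addition, no JDK constant)
                return true;
            default:
                return false;
        }
    }
}
```

With this in place, the condition `response == 301 || response == 302 || response == 303` could be written as `RedirectCodes.isRedirect(response)`.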