Download a file with HTTPClient in Java
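I'm trying to write a Java program that logs in to a website, submits a search, gets the results, and then downloads the Excel file generated from those results. So far I can log in, send the search, and get the results. However, I'm having a lot of trouble downloading the Excel file.
Looking at the site's source, I see Ajax and JavaScript around the Excel export, so I assume Ajax helps generate the file.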
<input id="toexcel" type="image" src="/websmart/v9.4/XLGP/images/Excel-icon.png" alt="To Excel" title="To Excel: Max 20000 Records" onclick="" />
The JavaScript part:
$( document ).ready(function() {
    $('#toexcel').click(function(e) {
        e.preventDefault();
        setTask('toexcel');
        var ajaxForm = $("#filter-form");
        $(".spinner").show();
        var dataToSend = ajaxForm.serialize();
        // Pointing the frame at this URL requests the export with the form fields as a query string
        $("#excelFrame").attr('src','V7BAE01R.pgm' + '?' + dataToSend);
        setTimeout(function() {
            $(".spinner").hide();
        }, 5000 );
    });
});
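Reading the handler above, the export does not appear to be an XHR call: the script serializes the form and loads `V7BAE01R.pgm?<fields>` into a frame, which is an ordinary GET. A minimal sketch of building the same kind of URL in Java follows. The host `website.com`, the class name `ExportUrl`, and the idea that `setTask('toexcel')` stores `toexcel` in the `task` field are assumptions, not something the page confirms.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class ExportUrl {
    /** Encodes name/value pairs as application/x-www-form-urlencoded, like a serialized HTML form. */
    static String toQuery(Map<String, String> fields) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            query.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    /** Appends the encoded fields to the program URL, as the click handler does for the frame. */
    static String build(String programUrl, Map<String, String> fields) {
        return programUrl + "?" + toQuery(fields);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("ActSts", "Edit");
        fields.put("task", "toexcel");   // assumption: the value setTask('toexcel') would store
        fields.put("Field", "Plant");
        // Hypothetical base URL; the real host and path are not given in the question.
        System.out.println(build("http://website.com/V7BAE01R.pgm", fields));
    }
}
```

The resulting string could then be passed to `new HttpGet(...)` on the same client that performed the login.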
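Using TamperData, I can see that clicking the Excel export sends a POST request (which I managed to send in the last part of my code), but I'm not sure how to retrieve the file from it. I do see a GET for Application/vnd.ms-excel in TamperData.
I'm not sure what to add to the code to get the Excel file. Below, I tried using a BufferedReader, but it doesn't get me the file. I simplified some of the code because of the name-value pairs.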
import java.util.List;
import java.util.ArrayList;
import org.apache.http.*;
import java.io.*;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.*;
import org.apache.http.impl.client.*;
import org.apache.http.message.*;
import org.apache.http.util.EntityUtils;
import org.apache.http.client.entity.*;

public class httpClientTest {
    public static void main(String[] args) throws ClientProtocolException, IOException {
        // Set up HttpClient
        CloseableHttpClient httpclient = HttpClients.createDefault();
        HttpGet httpGet = new HttpGet("http://website");
        CloseableHttpResponse response = httpclient.execute(httpGet);
        // Release the connection of the initial GET before reusing the variable
        EntityUtils.consume(response.getEntity());
        response.close();

        // Create a POST request to log in to the AS400 website
        HttpPost httpPost = new HttpPost("http://loginwebsite");
        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        nvps.add(new BasicNameValuePair("user", "username"));
        nvps.add(new BasicNameValuePair("password", "password"));
        nvps.add(new BasicNameValuePair("button", "Login"));
        nvps.add(new BasicNameValuePair("task", "extlogin"));
        httpPost.setEntity(new UrlEncodedFormEntity(nvps));
        response = httpclient.execute(httpPost);
        // Check the POST response to make sure we logged in, which succeeds
        try {
            System.out.println(response.getStatusLine());
            HttpEntity entity = response.getEntity();
            EntityUtils.consume(entity);
        } finally {
            response.close();
        }

        // Send a POST request that filters the records
        httpPost = new HttpPost("http://searchresults");
        nvps.clear();
        nvps.add(new BasicNameValuePair("ActSts", "Edit"));
        nvps.add(new BasicNameValuePair("task", "filter"));
        nvps.add(new BasicNameValuePair("Field", "Plant"));
        httpPost.setEntity(new UrlEncodedFormEntity(nvps)); // attach the filter fields to the request
        response = httpclient.execute(httpPost);
        // Print the HTML/JS of the page. This looks like it DOES display the search results,
        // so it IS sending the POST request and receiving a response.
        BufferedReader rd = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
        String line;
        while ((line = rd.readLine()) != null) {
            System.out.println(line);
        }
        response.close();

        // Try to read the export into a file
        String link = "http://website.com/uri?ActSts=Edit&task=filter&Field=Plant";
        HttpGet get = new HttpGet(link);
        response = httpclient.execute(get);
        InputStream is = response.getEntity().getContent();
        // Backslashes must be escaped inside a Java string literal
        String filePath = "C:\\Users\\WindowsUserName\\Downloads\\WODETAIL_List.xls";
        FileOutputStream fos = new FileOutputStream(new File(filePath));
        int inByte;
        while ((inByte = is.read()) != -1)
            fos.write(inByte);
        is.close();
        fos.close();
        response.close();
        httpclient.close();
    }
}
I'm pretty sure the data I'm posting is correct, but I'm not sure how to get the Excel file. Can anyone help?
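As a side note on the last step of the code above, the byte-by-byte loop can be replaced by a single buffered copy. The sketch below uses only the JDK; `StreamToFile` and `save` are hypothetical names, and the `ByteArrayInputStream` stands in for `response.getEntity().getContent()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamToFile {
    /** Copies every byte of the stream to target, replacing an existing file, and returns the byte count. */
    static long save(InputStream body, Path target) throws IOException {
        try (InputStream in = body) {   // closes the stream even if the copy fails
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response body; the bytes are made up.
        InputStream fakeBody = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        Path target = Files.createTempFile("WODETAIL_List", ".xls");
        long written = save(fakeBody, target);
        System.out.println(written + " bytes written to " + target);
    }
}
```

This only changes how the bytes are written; it does not affect which content the server sends back.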
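Edit: I can now download a file, but it isn't the Excel file. It's a web page, which I think is a small improvement (before, nothing was downloaded and the program just hung). I think the problem is that I need to send an authorization key or a cookie with this GET request to download the file.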
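One way to notice this situation in code is to look at the first bytes of the download before trusting the `.xls` extension. The sketch below is an illustration, not part of the original program: `PayloadSniffer` and `classify` are hypothetical names, and the check only covers the OLE2 signature of binary `.xls` files, the ZIP signature of `.xlsx` files, and an HTML page that starts with a doctype or `<html` tag.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class PayloadSniffer {
    // First bytes of an OLE2 compound file, the container used by binary .xls workbooks.
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0};
    // First bytes of a ZIP archive, the container used by .xlsx workbooks.
    private static final byte[] ZIP = {'P', 'K', 3, 4};

    /** Returns "xls", "xlsx", "html", or "unknown" based on the leading bytes of a download. */
    static String classify(byte[] head) {
        if (startsWith(head, OLE2)) return "xls";
        if (startsWith(head, ZIP)) return "xlsx";
        String text = new String(head, StandardCharsets.ISO_8859_1).trim().toLowerCase(Locale.ROOT);
        if (text.startsWith("<!doctype") || text.startsWith("<html")) return "html";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] page = "<!DOCTYPE html><html><body>Login</body></html>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(classify(page));   // prints "html"
    }
}
```

A result of `html` for a file that should be a spreadsheet suggests the server answered with a login or error page instead of the export.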
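Edit 2: I found that if, after logging in, I just paste http://website.com/uri?ActSts=Edit&task=filter&Field=Plant into a new browser tab and wait a little, I get the link to the Excel file. I originally thought HTTPClient would keep the same cookies as long as I used the same httpclient instance, but apparently it doesn't (?). I think I have to find a way to get the cookie and send it.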
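As far as I know, a client created with `HttpClients.createDefault()` does keep an in-memory cookie store across requests, but cookies are only sent back to the host (and path) that set them. The sketch below illustrates that scoping with the JDK's own `java.net.CookieManager` rather than Apache HttpClient, so it runs without extra libraries. The cookie name `SESSIONID`, its value, the `/login` path, and the host `other.example` are made up for the illustration.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

final class SessionCookieDemo {
    /** Returns the Cookie header value the manager would attach to a request for url ("" if none). */
    static String cookieHeaderFor(CookieManager cookies, String url) throws IOException {
        Map<String, List<String>> headers = cookies.get(URI.create(url), Map.of());
        return String.join("; ", headers.getOrDefault("Cookie", List.of()));
    }

    public static void main(String[] args) throws IOException {
        CookieManager cookies = new CookieManager();
        // Simulate the Set-Cookie header of a login response (hypothetical name, value, and path).
        cookies.put(URI.create("http://website.com/login"),
                Map.of("Set-Cookie", List.of("SESSIONID=abc123; Path=/")));

        // Same host as the login: the session cookie is replayed.
        System.out.println(cookieHeaderFor(cookies,
                "http://website.com/uri?ActSts=Edit&task=filter&Field=Plant"));
        // Different host: no cookie is attached.
        System.out.println(cookieHeaderFor(cookies, "http://other.example/uri"));
    }
}
```

If the login and the export are served from the same host, a missing cookie is therefore less likely to be the cause than a response that was never released, which is what the follow-up below points at.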
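OMG, I finally found something that works. OK. So apparently HTTPClient can only handle 2 responses before it starts erroring out, according to this:
So I changed my code to fetch only the login response, then fetch the Excel file as a second response, and then exit. I also added some timeout configuration and changed the order so that the file is written out first and the entity is consumed afterwards. I used a separate second response and a second entity. That seems to have helped a bit too? I guess.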
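The "2 responses" limit is most likely the connection pool rather than a limit on responses: as far as I know, the default pooling connection manager in HttpClient 4.x allows two concurrent connections per route, and a response that is neither fully consumed nor closed keeps its connection leased. The toy model below is not Apache's implementation; `TinyPool` and `Lease` are hypothetical names, and a `Semaphore` stands in for the pool to show why a third request waits and then times out.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class TinyPool {
    /** A leased slot; closing it returns the slot to the pool. */
    interface Lease extends AutoCloseable {
        @Override
        void close();
    }

    private final Semaphore slots;

    TinyPool(int maxConnections) {
        this.slots = new Semaphore(maxConnections);
    }

    /** Tries to lease a slot; returns null if none is released within the timeout. */
    Lease lease(long timeoutMillis) throws InterruptedException {
        if (!slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return null;
        }
        return slots::release;
    }

    public static void main(String[] args) throws InterruptedException {
        TinyPool pool = new TinyPool(2);
        Lease first = pool.lease(100);    // e.g. a response that is never closed
        Lease second = pool.lease(100);   // e.g. another response that is never closed
        System.out.println(pool.lease(100) == null);   // prints "true": the third request times out
        first.close();                                  // releasing one response frees a slot
        System.out.println(pool.lease(100) != null);   // prints "true"
    }
}
```

In the real client the equivalent of `first.close()` is closing each `CloseableHttpResponse`, for example with try-with-resources, before issuing the next request.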
import java.util.List;
import java.util.ArrayList;
import org.apache.http.*;
import java.io.*;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.CookieStore;
import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.*;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.conn.ConnectionPoolTimeoutException;
import org.apache.http.impl.client.*;
import org.apache.http.message.*;
import org.apache.http.util.EntityUtils;
import org.apache.http.client.entity.*;

public class hcFeb {
    public static void main(String[] args) throws ClientProtocolException, IOException {
        // Set up cookie settings and timeout settings
        CookieStore cookieStore = new BasicCookieStore();
        HttpClientContext context = HttpClientContext.create();
        context.setCookieStore(cookieStore);
        int CONNECTION_TIMEOUT = 80000; // milliseconds
        RequestConfig requestConfig = RequestConfig.custom().setCookieSpec(CookieSpecs.DEFAULT)
                .setConnectionRequestTimeout(CONNECTION_TIMEOUT)
                .setConnectTimeout(CONNECTION_TIMEOUT)
                .setSocketTimeout(CONNECTION_TIMEOUT)
                .build();

        // Set up HttpClient
        CloseableHttpClient httpclient = HttpClients.custom()
                .setDefaultRequestConfig(requestConfig)
                .setDefaultCookieStore(cookieStore)
                .disableContentCompression()
                .build();
        HttpGet httpGet = new HttpGet("http://website");
        CloseableHttpResponse response = httpclient.execute(httpGet);
        // Consume and close the first response so its connection goes back to the pool
        EntityUtils.consume(response.getEntity());
        response.close();

        // Log in to the website with a POST request
        HttpPost httpPost = new HttpPost("http://loginwebsite");
        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        nvps.add(new BasicNameValuePair("user", "username"));
        nvps.add(new BasicNameValuePair("password", "password"));
        nvps.add(new BasicNameValuePair("button", "Login"));
        nvps.add(new BasicNameValuePair("task", "extlogin"));
        httpPost.setEntity(new UrlEncodedFormEntity(nvps));
        response = httpclient.execute(httpPost);
        try {
            System.out.println(response.getStatusLine());
            HttpEntity entity = response.getEntity();
            EntityUtils.consume(entity);
        } finally {
            response.close();
        }

        // Send the request for the Excel file and download it
        String link = "http://website.com/uri?ActSts=Edit&task=filter&Field=Plant";
        HttpGet get = new HttpGet(link);
        // Use a separate response object for the download
        try (CloseableHttpResponse response2 = httpclient.execute(get, context)) {
            System.out.println(response2.getStatusLine());
            HttpEntity entity1 = response2.getEntity();
            if (entity1 != null) {
                System.out.println("Entity isn't null");
                // Backslashes must be escaped inside a Java string literal
                String filePath = "C:\\Users\\windowsUserName\\Downloads\\WODETAIL_List.xls";
                // Write the file first, then consume the entity
                try (InputStream is = entity1.getContent();
                     FileOutputStream fos = new FileOutputStream(new File(filePath))) {
                    byte[] buffer = new byte[5600];
                    int inByte;
                    while ((inByte = is.read(buffer)) != -1)
                        fos.write(buffer, 0, inByte);
                }
                System.out.println("Excel File received");
                EntityUtils.consume(entity1);
            }
        } catch (ConnectionPoolTimeoutException e) {
            System.out.println(e.getMessage());
        } catch (IOException e) {
            System.out.println(e.getMessage());
        } finally {
            httpclient.close();
        }
    }
}