WebClient hangs until timeout
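I am trying to download a web page using WebClient, but it hangs until the WebClient timeout is reached and then fails with an exception.
The following code does not work: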
WebClient client = new WebClient();
string url = "https://www.nasdaq.com/de/symbol/aapl/dividend-history";
string page = client.DownloadString(url);
With a different URL the transfer works fine. For example,
WebClient client = new WebClient();
string url = "https://www.ariva.de/apple-aktie";
string page = client.DownloadString(url);
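completes very quickly, and the page variable contains the whole HTML.
Using HttpClient or WebRequest/WebResponse gives the same result on the first URL: it blocks until the timeout exception.
Both URLs load fine in a browser, in roughly 2-5 seconds.
Any idea what the problem is, and what solution is available?
I noticed that when using the WebBrowser control on a Windows Forms dialog, the first URL loads with 20+ JavaScript errors that each have to be confirmed with a click. The same can be observed with the developer tools open in a browser while accessing the first URL.
However, WebClient does not act on the response it receives. It does not run JavaScript and does not load referenced images, CSS or other scripts, so this should not be the problem.
Thanks!
Ralf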
Obviously there is a problem with the download link (incorrect URL, unauthorized access, ...), but you can use the async method to get around the blocking part:
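WebClient client = new WebClient();
client.DownloadStringCompleted += (s, e) =>
{
    // Deal with the downloaded page here; check e.Error before reading e.Result
};
// DownloadStringAsync takes a Uri, not a string
client.DownloadStringAsync(new Uri(url));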
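As a variation on the event-based call above, the same download can be awaited with the Task-based method that WebClient offers since .NET 4.5. This is only a sketch: the class and method names (PageDownloader, DownloadPageAsync) are invented for illustration, and it does not by itself make the first URL respond, it only keeps the caller from blocking.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original question or answers.
public static class PageDownloader
{
    // Downloads the resource at 'url' without blocking the calling thread.
    // Returns null if the request fails with a WebException.
    public static async Task<string> DownloadPageAsync(string url)
    {
        using (var client = new WebClient())
        {
            try
            {
                return await client.DownloadStringTaskAsync(new Uri(url));
            }
            catch (WebException ex)
            {
                // HTTP errors and connection failures surface here
                Console.WriteLine("Download failed: " + ex.Status);
                return null;
            }
        }
    }
}
```

From an async event handler this would be called as `string page = await PageDownloader.DownloadPageAsync(url);`.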
The first site, "https://www.nasdaq.com/de/symbol/aapl/dividend-history", requires:
- ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
- ServicePointManager.ServerCertificateValidationCallback
- A set User-Agent header
- A CookieContainer is apparently not required. It should be set anyway.
The User-agent is important here. If a recent User-agent is specified in WebRequest.UserAgent, the website may activate the HTTP 2.0 protocol and HSTS (HTTP Strict Transport Security). These are supported/understood only by recent browsers (as a reference, Firefox 56 or newer).
Using a less recent browser as the User-agent is necessary, otherwise the website will expect (and wait for) a dynamic response. With an older User-agent, the website activates the HTTP 1.1 protocol and never HSTS.
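The second site, "https://www.ariva.de/apple-aktie", requires:
- ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
- No server certificate validation is required
- No specific User-agent is required
I suggest setting up a WebRequest (or a corresponding HttpClient setup) this way:
(WebClient could work, but it would probably require a derived custom class)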
private async void button1_Click(object sender, EventArgs e)
{
    button1.Enabled = false;
    Uri uri = new Uri("https://www.nasdaq.com/de/symbol/aapl/dividend-history");
    string destinationFile = "[Some Local File]";
    await HTTPDownload(uri, destinationFile);
    button1.Enabled = true;
}

CookieContainer httpCookieJar = new CookieContainer();

// The 32bit IE11 header is the User-agent used here
public async Task HTTPDownload(Uri resourceURI, string filePath)
{
    // Windows 7 may require to explicitly set the Protocol
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    // Only blindly accept the Server certificates if you know and trust the source
    ServicePointManager.ServerCertificateValidationCallback += (s, cert, ch, sec) => { return true; };
    ServicePointManager.DefaultConnectionLimit = 50;

    var httpRequest = WebRequest.CreateHttp(resourceURI);
    try
    {
        httpRequest.CookieContainer = httpCookieJar;
        httpRequest.Timeout = (int)TimeSpan.FromSeconds(15).TotalMilliseconds;
        httpRequest.AllowAutoRedirect = true;
        httpRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        httpRequest.ServicePoint.Expect100Continue = false;
        httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW32; Trident/7.0; rv:11.0) like Gecko";
        httpRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        httpRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate;q=0.8");
        httpRequest.Headers.Add(HttpRequestHeader.CacheControl, "no-cache");

        using (var httpResponse = (HttpWebResponse)await httpRequest.GetResponseAsync())
        using (var responseStream = httpResponse.GetResponseStream())
        {
            if (httpResponse.StatusCode == HttpStatusCode.OK)
            {
                try
                {
                    int buffersize = 132072;
                    using (var fileStream = File.Create(filePath, buffersize, FileOptions.Asynchronous))
                    {
                        // Copy the response to the file in chunks
                        int read;
                        byte[] buffer = new byte[buffersize];
                        while ((read = await responseStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                        {
                            await fileStream.WriteAsync(buffer, 0, read);
                        }
                    }
                }
                catch (DirectoryNotFoundException) { /* Log or throw */ }
                catch (PathTooLongException) { /* Log or throw */ }
                catch (IOException) { /* Log or throw */ }
            }
        }
    }
    catch (WebException) { /* Log and message */ }
    catch (Exception) { /* Log and message */ }
}
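For the "corresponding HttpClient setup" mentioned above, a minimal sketch might look like the following. The names HttpClientSetup and CreateClient are invented for illustration. It assumes .NET Framework, where ServicePointManager settings also apply to HttpClient, and it has not been verified against nasdaq.com. The User-Agent is the IE11 string used in the code above.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper, not part of the original answer.
public static class HttpClientSetup
{
    public static HttpClient CreateClient()
    {
        // Assumption: .NET Framework, where this setting also affects HttpClient
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        var handler = new HttpClientHandler
        {
            CookieContainer = new CookieContainer(),
            UseCookies = true,
            AllowAutoRedirect = true,
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        var client = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(15) };

        // Older (IE11) User-Agent, as recommended above for the first site
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent",
            "Mozilla/5.0 (Windows NT 6.1; WOW32; Trident/7.0; rv:11.0) like Gecko");
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "Accept",
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        return client;
    }
}
```

A page would then be fetched with `string page = await HttpClientSetup.CreateClient().GetStringAsync(uri);`. In a real application the client would normally be created once and reused.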
The payload returned by the first site (nasdaq.com) is 101,562 bytes long.
The payload returned by the second site (www.ariva.de) is 56,919 bytes long.
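The remark that WebClient could work with a derived class can be sketched as follows: overriding GetWebRequest gives access to the underlying HttpWebRequest, where the User-Agent, cookies and timeout can be set. The class name BrowserLikeWebClient is invented for illustration, and whether these settings alone are enough for the first site is an assumption based on the list of requirements above.

```csharp
using System;
using System.Net;

// Hypothetical WebClient subclass that applies the request settings discussed above.
public class BrowserLikeWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = cookies;
            httpRequest.Timeout = (int)TimeSpan.FromSeconds(15).TotalMilliseconds;
            httpRequest.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
            // Older (IE11) User-Agent, taken from the answer's code
            httpRequest.UserAgent =
                "Mozilla/5.0 (Windows NT 6.1; WOW32; Trident/7.0; rv:11.0) like Gecko";
        }
        return request;
    }
}
```

Usage would mirror the code in the question: set `ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;` once, then call `new BrowserLikeWebClient().DownloadString(url)`.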