Can't get content of a website with C#
Here is the code I use to get a website's content:
private string GetContent(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    var content = String.Empty;
    HttpStatusCode statusCode;
    using (var response = request.GetResponse())
    using (var stream = response.GetResponseStream())
    {
        var contentType = response.ContentType;
        Encoding encoding = null;
        if (contentType != null)
        {
            var match = Regex.Match(contentType, @"(?<=charset\=).*");
            if (match.Success)
                encoding = Encoding.GetEncoding(match.ToString());
        }
        encoding = encoding ?? Encoding.UTF8;
        statusCode = ((HttpWebResponse)response).StatusCode;
        using (var reader = new StreamReader(stream, encoding))
            content = reader.ReadToEnd();
    }
    return content;
}
I tried running this code with the link http://google.com and it worked. But when I run it with the link http://batdongsan.com.vn/, it doesn't work and shows "sorry! something went wrong.". I don't know why this happens. How can I get the content of the second link?
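The site appears to check the User-Agent header, and because HttpWebRequest does not set it by default, the site returns an error message. I added the value my browser sends and was able to get the content of the link. Just add the line that sets UserAgent, as shown below: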
// ...
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36";
var content = String.Empty;
HttpStatusCode statusCode;
// ...
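For reference, here is a minimal sketch of the whole method with the User-Agent line in place. The class name PageFetcher and the helper ResolveEncoding are hypothetical names introduced only for this illustration. The helper also narrows the charset match so that trailing parameters or quotes in the Content-Type header are not passed to Encoding.GetEncoding, which is an assumption about what the original regex intended. Note that HttpWebRequest is marked obsolete on newer .NET versions, so this may compile with a warning there.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class PageFetcher
{
    // Hypothetical helper: returns the encoding named in a Content-Type header,
    // falling back to UTF-8 when the charset is missing or not recognized.
    public static Encoding ResolveEncoding(string contentType)
    {
        if (!string.IsNullOrEmpty(contentType))
        {
            // Capture only the charset value, stopping at ';', a quote, or whitespace.
            var match = Regex.Match(
                contentType,
                @"charset\s*=\s*""?([^;""\s]+)",
                RegexOptions.IgnoreCase);
            if (match.Success)
            {
                try
                {
                    return Encoding.GetEncoding(match.Groups[1].Value);
                }
                catch (ArgumentException)
                {
                    // Unknown charset name: fall through to the default below.
                }
            }
        }
        return Encoding.UTF8;
    }

    public static string GetContent(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        // Browser-like User-Agent copied from the answer above.
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36";

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream, ResolveEncoding(response.ContentType)))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling PageFetcher.GetContent("http://batdongsan.com.vn/") should then send the header with the request. Whether the site accepts it still depends on the checks it runs on its side.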