xpath expression to select href value from link
I have HTML code like this:
<a class="cat" href="/Home/txtdata0/">txtdata0</a>
<a class="cat" href="/Home/txtdata1/">txtdata1</a>
<a class="cat" href="/Home/txtdata2/">txtdata2</a>
<a class="cat" href="/Home/txtdata3/">txtdata3</a>
To get the text of all the links, I used this XPath (written as a C# string literal in Visual Studio):
.//a[@class=\"cat\"]
To get the href values of all the links, I used this XPath (written as a C# string literal in Visual Studio):
.//a[@class=\"cat\"]/@href
Google Chrome's XPath Helper shows correct results for both expressions (.//a[@class="cat"] and .//a[@class="cat"]/@href):
txtdata0
txtdata1
txtdata2
txtdata3
and
/Home/txtdata0/
/Home/txtdata1/
/Home/txtdata2/
/Home/txtdata3/
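For comparison, the .NET System.Xml XPath engine, like the browser engine behind Chrome's XPath Helper, follows XPath 1.0 and returns attribute nodes for a path ending in /@href, so each match's Value is the URL. This is a minimal sketch, assuming the markup is well-formed XML wrapped in a hypothetical <div> root (System.Xml cannot load arbitrary real-world HTML, which is why HtmlAgilityPack is used in the first place); StandardXPathDemo and SelectHrefs are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical demo class: evaluates the href XPath with System.Xml.
// Assumes the input is well-formed XML (not arbitrary HTML).
public static class StandardXPathDemo
{
    public static List<string> SelectHrefs(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var hrefs = new List<string>();
        // In System.Xml each match of a trailing /@href is an XmlAttribute,
        // so node.Value is the attribute text, not the link text.
        XmlNodeList matches = doc.SelectNodes(".//a[@class='cat']/@href");
        foreach (XmlNode node in matches)
        {
            hrefs.Add(node.Value);
        }
        return hrefs;
    }

    public static void Main()
    {
        // Same links as in the question, wrapped in a hypothetical <div> root
        const string xml =
            "<div>" +
            "<a class=\"cat\" href=\"/Home/txtdata0/\">txtdata0</a>" +
            "<a class=\"cat\" href=\"/Home/txtdata1/\">txtdata1</a>" +
            "<a class=\"cat\" href=\"/Home/txtdata2/\">txtdata2</a>" +
            "<a class=\"cat\" href=\"/Home/txtdata3/\">txtdata3</a>" +
            "</div>";

        foreach (string href in SelectHrefs(xml))
        {
            Console.WriteLine(href); // prints /Home/txtdata0/ through /Home/txtdata3/
        }
    }
}
```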
In Visual Studio, the XPath .//a[@class=\"cat\"] shows:
txtdata0
txtdata1
txtdata2
txtdata3
and the XPath .//a[@class=\"cat\"]/@href shows:
txtdata0
txtdata1
txtdata2
txtdata3
Why is the second output the same as the first?
Program code:
// Requires: using System.IO; using System.Net; using System.Text; using System.Windows.Forms;
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
string data = string.Empty;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(seturl);
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode == HttpStatusCode.OK)
    {
        Stream receiveStream = response.GetResponseStream();
        // Use the charset from the response headers when the server sends one
        Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
            ? Encoding.UTF8
            : Encoding.GetEncoding(response.CharacterSet);
        using (StreamReader readStream = new StreamReader(receiveStream, encoding))
        {
            data = readStream.ReadToEnd();
        }
    }
}
doc.LoadHtml(data);

// Query 1: the <a class="cat"> elements
HtmlAgilityPack.HtmlNodeCollection bodynode = doc.DocumentNode.SelectNodes(".//a[@class=\"cat\"]");
// Query 2: intended to return the href attributes of those elements
HtmlAgilityPack.HtmlNodeCollection bodynod = doc.DocumentNode.SelectNodes(".//a[@class=\"cat\"]/@href");
MessageBox.Show(bodynode.Count.ToString());
MessageBox.Show(bodynod.Count.ToString());
for (int i = 0; i < bodynode.Count; i++)
{
    // Expected "txtdata0 - /Home/txtdata0/", but both sides show the link text
    MessageBox.Show(bodynode[i].InnerText + " - " + bodynod[i].InnerText);
}
If I remember correctly, in HAP (HtmlAgilityPack) an attribute value can be extracted like this:
string _tmpUrl = documentUrl.DocumentNode.SelectNodes("//a[@class='cat']")[i].Attributes["href"].Value;
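As far as I know, HtmlAgilityPack's SelectNodes can only return HtmlNode objects, and an attribute is not an HtmlNode, so a query ending in /@href comes back as the owning <a> element; that would explain why both loops print the link text. The sketch below follows the approach above, assuming the HtmlAgilityPack NuGet package is referenced: it selects the <a> elements and reads href from each node with GetAttributeValue, which returns the supplied default when the attribute is missing, whereas Attributes["href"] is null in that case and .Value would throw a NullReferenceException. CatLinks and Extract are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper: returns (text, href) for every <a class="cat"> link.
public static class CatLinks
{
    public static List<(string Text, string Href)> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<(string Text, string Href)>();
        // Select the elements themselves, not ".../@href"
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@class='cat']");
        if (links == null) // by default HAP returns null, not an empty collection, when nothing matches
        {
            return result;
        }

        foreach (HtmlNode link in links)
        {
            // Read the attribute from the node; "" if the link has no href
            string href = link.GetAttributeValue("href", string.Empty);
            result.Add((link.InnerText, href));
        }
        return result;
    }

    public static void Main()
    {
        const string html =
            "<a class=\"cat\" href=\"/Home/txtdata0/\">txtdata0</a>" +
            "<a class=\"cat\" href=\"/Home/txtdata1/\">txtdata1</a>";

        foreach (var (text, href) in Extract(html))
        {
            Console.WriteLine(text + " - " + href); // e.g. "txtdata0 - /Home/txtdata0/"
        }
    }
}
```

In the question's code this would mean dropping the second query and using bodynode[i].GetAttributeValue("href", "") inside the existing loop.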