Find a specific link in an HTML document in C# using HTML Agility Pack

Question

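I'm trying to parse an HTML document to retrieve a specific link from the page. I know this may not be the best approach, but I'm trying to find the HTML node I need by its inner text. However, that text occurs in two places in the HTML: the footer and the navigation bar. I need the link from the navigation bar. The footer comes first in the HTML. Here is my code: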

    public string findCollegeURL(string catalog, string college)
    {
        //Find college
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(catalog);
        var root = doc.DocumentNode;
        var htmlNodes = root.DescendantsAndSelf();

        // Search through fetched html nodes for relevant information
        int counter = 0;
        foreach (HtmlNode node in htmlNodes) {
            string linkName = node.InnerText;
            if (linkName == colleges[college] && counter == 0)
            {
                counter++;
                continue;
            }  
            else if(linkName == colleges[college] && counter == 1)
            {
                string targetURL = node.Attributes["href"].Value; //"found it!"; //
                return targetURL;
            }/* */
        }

        return "DID NOT WORK";
    }

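The program enters the else-if branch, but when it tries to retrieve the link I get a NullReferenceException. Why is that? How can I retrieve the link I need?

Here is the part of the HTML document I'm trying to access: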

    <tr class>
       <td id="acalog-navigation">
           <div class="n2_links" id="gateway-nav-current">...</div>
           <div class="n2_links">...</div>
           <div class="n2_links">...</div>
           <div class="n2_links">...</div>
           <div class="n2_links">...</div>
              <a href="/content.php?catoid=10&navoid=1210" class="navbar" tabindex="119">College of Science</a>
           </div>

This is the link I want: /content.php?catoid=10&navoid=1210

Answer 1

I find XPath easier to work with than writing a lot of code:

    var link = doc.DocumentNode.SelectSingleNode("//a[text()='College of Science']")
                  .Attributes["href"].Value;

If two links have the same text, select the second one:

    var link = doc.DocumentNode.SelectSingleNode("(//a[text()='College of Science'])[2]")
                  .Attributes["href"].Value;
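
Picking the second match depends on the footer staying ahead of the navigation bar in the markup. Since the HTML in the question puts the navigation links inside the element with id `acalog-navigation`, one option is to restrict the search to that container. A minimal sketch, assuming that id is stable and that the search strings contain no single quotes; `NavLinkFinder` and `FindLinkIn` are hypothetical names, not part of HTML Agility Pack:

```csharp
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class NavLinkFinder
{
    // Returns the href of the first <a> with the given text inside the
    // element whose id is containerId, or null when nothing matches.
    public static string FindLinkIn(string html, string containerId, string linkText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: containerId and linkText contain no single quotes,
        // otherwise the XPath string literal below would be malformed.
        string xpath = "//*[@id='" + containerId + "']" +
                       "//a[normalize-space(text())='" + linkText + "']";

        // SelectSingleNode returns null when no node matches.
        HtmlNode anchor = doc.DocumentNode.SelectSingleNode(xpath);

        // The attribute indexer returns null when the attribute is missing,
        // so both lookups are guarded with the null-conditional operator.
        return anchor?.Attributes["href"]?.Value;
    }
}
```

With the page from the question, `NavLinkFinder.FindLinkIn(catalog, "acalog-navigation", "College of Science")` would look only at links under the navigation cell, so the footer link no longer matters.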

The LINQ equivalent, which returns the href of every matching link:

    var links = doc.DocumentNode.Descendants("a")
                   .Where(a => a.InnerText == "College of Science")
                   .Select(a => a.Attributes["href"].Value)
                   .ToList();
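
As for the NullReferenceException in the question: `DescendantsAndSelf()` enumerates every node, text nodes included, and the text node inside an `<a>` has the same `InnerText` as the link itself. The second match is therefore most likely not the navigation link but a node with no `href`, so `Attributes["href"]` returns null and `.Value` throws. A sketch that only visits `<a>` elements and skips anchors without an `href`; `LinkQueries` and `FindHrefs` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class LinkQueries
{
    // Returns the href values of all <a> elements whose trimmed inner text
    // equals linkText, in document order. Anchors without href are skipped.
    public static List<string> FindHrefs(string html, string linkText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("a")                              // <a> elements only, never text nodes
            .Where(a => a.InnerText.Trim() == linkText)
            .Select(a => a.GetAttributeValue("href", string.Empty)) // default instead of null
            .Where(href => href.Length > 0)                // drop anchors that had no href
            .ToList();
    }
}
```

`LinkQueries.FindHrefs(catalog, "College of Science").ElementAtOrDefault(1)` then yields the second match, or null when there are fewer than two.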
