Programmatically open webpage from file and click link to handle response C#

Question

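The scenario is that an email arrives in an inbox. Attached to the email is an HTML file that the user clicks to open the page in a browser. They then click a link on the page, which opens a PDF file online.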

What I want to do programmatically in C# is save the attached HTML file to disk, open the file, find the link, click it, and save the file that opens to disk.

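I already open the email programmatically and save the attached HTML file to disk. But now I'm stuck on opening the file programmatically. I have created a FileWebRequest to open the file, but I don't know how to find the link (the "a" tag, the only one on the whole page) and click it programmatically (in C#) so that the PDF opens and I can download it and save it to disk.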

What do I need to do after the FileWebRequest?

FileWebRequest req = (FileWebRequest)WebRequest.Create(pathToHtmlFile);
FileWebResponse res = (FileWebResponse)req.GetResponse();
// What now..?

Answer 1

First, you should use a regex to extract the PDF URL from the HTML content, and then download it with WebClient:

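    using System;
    using System.IO;
    using System.Net;
    using System.Text.RegularExpressions;

    public static class Program
    {
        // Returns the first http(s) URL ending in ".pdf" found in an href attribute, or null.
        private static string FindPdfFileDownloadLink(string htmlContent)
        {
            Match match = Regex.Match(
                htmlContent,
                @"href\s*=\s*[""'](?<url>https?://[^""'\s>]+?\.pdf)[""']",
                RegexOptions.IgnoreCase);
            return match.Success ? match.Groups["url"].Value : null;
        }

        public static int Main(string[] args)
        {
            string htmlContent = File.ReadAllText("1.html");
            string pdfUrl = FindPdfFileDownloadLink(htmlContent);
            if (pdfUrl == null)
                return 1; // no PDF link found in the file

            using (WebClient wClient = new WebClient())
            {
                wClient.DownloadFile(pdfUrl, @"1.pdf");
            }

            Console.Read();
            return 0;
        }
    }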
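
The question says the page contains a single "a" tag, so a variant could take the href of the first anchor instead of relying on the URL ending in .pdf. The following is only a minimal sketch under that assumption: the class name AnchorHrefSketch, the helper name FindFirstAnchorHref, and the example.com URL are made up for illustration, and the pattern assumes the href value is quoted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorHrefSketch
{
    // Hypothetical helper: returns the href value of the first <a> tag,
    // or null when the HTML contains no anchor with a quoted href.
    public static string FindFirstAnchorHref(string htmlContent)
    {
        Match match = Regex.Match(
            htmlContent,
            @"<a\b[^>]*?\bhref\s*=\s*[""'](?<url>[^""']+)[""']",
            RegexOptions.IgnoreCase);
        return match.Success ? match.Groups["url"].Value : null;
    }

    public static void Main()
    {
        // Made-up sample markup; a real program would read the saved attachment instead.
        string html = "<html><body><a class=\"x\" href=\"https://example.com/doc?id=1\">Open</a></body></html>";
        Console.WriteLine(FindFirstAnchorHref(html));
    }
}
```

The returned string can be passed to WebClient.DownloadFile in the same way as pdfUrl. If the href contains HTML entities such as &amp;amp;, it would need decoding first, and for markup that is not this simple an HTML parser is likely a safer choice than a regex.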

If for any reason you really want to click the link, you can load the HTML in a hidden web browser, find the element you want, and click it.

Load the content into a WebBrowser control:

webBrowser1.Navigate(System.IO.Path.GetFullPath("1.html")); // pass an absolute path so the local file is resolved

Find the element and click it:

// Run this after the DocumentCompleted event has fired; Document is not ready before that.
HtmlElement link = webBrowser1.Document.GetElementById("link_id_58547");
link.InvokeMember("click");

.net

html

c#

click

hyperlink