How to fetch the link that is outside the href tag?

Question

private static final Pattern ptninhref = Pattern.compile(
            "(?:.*\\<[aA][^\\>]*(?i)href(?-i)=\"[^\"]*)([^\"]*)");

    public static List<String> captureValuesinhref(String largeText){
        Matcher mtchinhref = ptninhref.matcher(largeText);
        List<String> inHREF = new ArrayList<>();
        while(mtchinhref.find()){
           // group() returns the entire match, not only the captured group
           inHREF.add(mtchinhref.group());
        }
        return inHREF;
    }

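How can I get only "link is given here"?
When I use the regular expression (?:.*\<[aA][^\>]*(?i)href(?-i)=\"[^\"]*)([^\"]*)(?:[^\"]*\".*\</[aA]\>.*), it gives me the following output: <a href="link is given here">link is given here</a>.

But I only need this output: "link is given here"
I need the link that is outside the href tag.

There are two links:
1. The one inside the href tag.
2. The one outside the href tag, which is displayed in the browser.
I only need the second link.
How can I get it using Java in NetBeans?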

Answer 1

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {

    public static void main(String[] args) {

        String href = "<a href=\"w3schools.com\">Visit W3Schools.com!</a>";
        // Regex seen by the engine: (?<=[>])(\\?.)*?(?=[<])
        // It matches the text that follows a '>' and precedes the next '<'.
        String regexOr = "(?<=[>])(\\\\?.)*?(?=[<])";
        Pattern pattern = Pattern.compile(regexOr);
        Matcher matcher = pattern.matcher(href);
        if (matcher.find()) {
            String enrichedValue = matcher.group();
            System.out.print(enrichedValue);
        }
    }
}

This prints:

Visit W3Schools.com!

Note that a backslash (\) has to be escaped in a Java string literal, so it is written as \\.
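To make the escaping concrete, here is a minimal sketch. The class name EscapeSketch, the two constants, and the helper firstMatch are made up for this illustration. It compares a fully escaped literal with an under-escaped one: with four backslashes in the source the regex engine sees \\? (an optional literal backslash), while with only two it sees \? (a literal question mark) and no longer matches the sample text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and member names, used only for this illustration.
class EscapeSketch {
    // Four backslashes in the source become two in the string, so the regex
    // engine sees \\? (an optional literal backslash).
    static final String ESCAPED = "(?<=[>])(\\\\?.)*?(?=[<])";

    // Two backslashes in the source become one in the string, so the regex
    // engine sees \? (a literal question mark), which is a different pattern.
    static final String UNDER_ESCAPED = "(?<=[>])(\\?.)*?(?=[<])";

    // Returns the first match of regex in text, or null when nothing matches.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<a href=\"w3schools.com\">Visit W3Schools.com!</a>";
        System.out.println(ESCAPED);                          // the pattern as the engine receives it
        System.out.println(firstMatch(ESCAPED, html));        // Visit W3Schools.com!
        System.out.println(firstMatch(UNDER_ESCAPED, html));  // null
    }
}
```

Printing the pattern string, as the first line of main does, is a quick way to check what the regex engine actually receives.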

Complete example:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    private static final Pattern ptninhref;
    static{
        // Regex seen by the engine: (?<=[>])(\\?.)*?(?=[<])
        ptninhref = Pattern.compile("(?<=[>])(\\\\?.)*?(?=[<])");
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        String href= "<a href=\"paypal.com/signin/\">https://www.paypa1.com/signin/</a>";
        List<String> results = captureValuesinhref(href);
        for(String result:results){
            System.out.println(result);
        }
    }

    public static List<String> captureValuesinhref(String largeText){
        Matcher mtchinhref = ptninhref.matcher(largeText);
        List<String> inHREF = new ArrayList<String>();
        while(mtchinhref.find()){
           inHREF.add(mtchinhref.group());
        }
        return inHREF;
    }
}

Prints:

https://www.paypa1.com/signin/
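
The lookaround pattern above matches any text between a '>' and the next '<', whatever the surrounding element is. If only the text of <a> elements is wanted, one possible variation is to match the whole anchor and read a capture group. The sketch below is an illustration under assumptions: the names AnchorTextSketch and anchorTexts are hypothetical, and it assumes the anchor text contains no nested tags. It also shows why the code in the question returned the whole tag: group() is the entire match, while group(1) is only the captured part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and method names, for illustration only.
class AnchorTextSketch {
    // Opening <a ...> tag, then group 1 = text up to the next '<', then </a>.
    // Assumption: the anchor text itself contains no nested tags.
    private static final Pattern ANCHOR_TEXT =
            Pattern.compile("<a\\b[^>]*>([^<]*)</a>", Pattern.CASE_INSENSITIVE);

    static List<String> anchorTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = ANCHOR_TEXT.matcher(html);
        while (m.find()) {
            // m.group() would be the whole <a ...>...</a> match;
            // m.group(1) is only the text shown in the browser.
            texts.add(m.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<p>intro</p><a href=\"link is given here\">link is given here</a>";
        System.out.println(anchorTexts(html)); // [link is given here]
    }
}
```

Regular expressions remain fragile on real-world HTML, so this is best treated as a sketch for simple, well-formed input.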

Tags: html, java, netbeans, web-scraping