How to parse only HTML tags with regex in Java, without jsoup

Question

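Hi all, I need to parse only HTML tags using a regular expression, without jsoup. Anything that is not an HTML tag should be reported as "none".

For example, given this input: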

<h1> i love india <\h1>
<xyz> name <\xyz>
<html> hey i won! <\html>
<syd> like it <\syd>
<<<<<<
<br> love you <br>  
>>>>>>>>

The expected output is:

i love india
none
hey i won!
none
none
love you
none

I have tried a lot of things but could not get the exact result. Can anyone help me solve this? Thanks in advance.

Answer 1

Use a regular expression to remove all tags. Strings are immutable in Java, so assign the result back:

    s = s.replaceAll("<[^>]*>", "");
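To make the one-liner above concrete, here is a minimal runnable sketch. The class name `StripTagsDemo` and the helper `stripTags` are made up for illustration, and the `trim()` call is my own addition to drop the spaces left around the text. Note that this removes anything that looks like a tag, including non-HTML ones such as `<xyz>`, so on its own it does not produce the "none" lines from the question.

```java
import java.util.regex.Pattern;

class StripTagsDemo {
    // "<", then any run of characters other than ">", then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical helper: removes every tag-like token and trims the rest.
    static String stripTags(String s) {
        return TAG.matcher(s).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Backslashes are doubled because they sit inside Java string literals.
        System.out.println(stripTags("<h1> i love india <\\h1>")); // i love india
        System.out.println(stripTags("<xyz> name <\\xyz>"));       // name
        System.out.println(stripTags("<<<<<<"));                   // <<<<<< (no ">" to close a match)
    }
}
```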

Answer 2

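Try the following:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            // Backslashes must be escaped inside Java string literals.
            String[] array = { "<h1> i love india <\\h1>",
                               "<xyz> name <\\xyz>",
                               "<html> hey i won! <\\html>",
                               "<syd> like it <\\syd>"
                             };
            // Capture the text between a '>' and the next '<'.
            Pattern pattern = Pattern.compile(">(.[^><]+)<");
            for (String str : array) {
                Matcher m = pattern.matcher(str);
                if (m.find())
                    System.out.println(m.group(1).trim());
                else
                    System.out.println("none");
            }
        }
    }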
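
The loop above prints the inner text for any tag name, so `<xyz>` and `<syd>` would still yield text rather than "none". One possible way to get the exact output from the question is to check the tag name against an allow-list. The sketch below is only that: the class name `KnownTagText`, the method `extract`, and the contents of `HTML_TAGS` are assumptions of mine, not something from the question, and the list would need to be extended to cover the tags you care about.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KnownTagText {
    // Hypothetical allow-list; add whatever tag names should count as HTML.
    private static final Set<String> HTML_TAGS = new HashSet<>(Arrays.asList(
            "html", "head", "body", "h1", "h2", "h3", "p", "br", "div", "span"));

    // <name> text <name>, where the second tag may start with "/" or "\"
    // (the sample data uses "\") and must repeat the first tag's name.
    private static final Pattern LINE = Pattern.compile(
            "\\s*<(\\w+)>\\s*([^<>]*?)\\s*<[\\\\/]?\\1>\\s*");

    // Returns the inner text for an allow-listed tag, otherwise "none".
    static String extract(String line) {
        Matcher m = LINE.matcher(line);
        if (m.matches() && HTML_TAGS.contains(m.group(1).toLowerCase(Locale.ROOT))) {
            return m.group(2);
        }
        return "none";
    }

    public static void main(String[] args) {
        String[] lines = {
            "<h1> i love india <\\h1>", "<xyz> name <\\xyz>",
            "<html> hey i won! <\\html>", "<syd> like it <\\syd>",
            "<<<<<<", "<br> love you <br>  ", ">>>>>>>>"
        };
        for (String line : lines) {
            System.out.println(extract(line));
        }
    }
}
```

Run on the seven sample lines, this prints the text for `h1`, `html` and `br` and "none" for everything else. It only handles one tag pair per line with no nested tags or attributes; anything more complex is where a real HTML parser becomes the safer choice.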
