Extract words starting with a particular character from a string

Question

I have the following string:

 String line = "#food was testy. #drink lots of. #night was fab. #three #four";

I want to extract #food, #drink, #night, #three and #four from it.

I tried this code:

    String[] words = line.split("#");
    for (String word: words) {
        System.out.println(word);
    }

But it prints food was testy., drink lots of., night was fab., three and four.

Answer 1

split only cuts the whole string at every position where it finds a #. That explains your current result.
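To see this concretely, it can help to print exactly what split returns. The following is only an illustrative sketch (the class name SplitDemo and the helper splitOnHash are made up for this example). It shows that each piece keeps everything up to the next #, and that the array starts with an empty string because the input itself begins with #.

```java
class SplitDemo {
    // Hypothetical helper for illustration: cuts the line at every '#'.
    static String[] splitOnHash(String line) {
        return line.split("#");
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        // Brackets make the leading empty piece and the trailing spaces visible.
        for (String piece : splitOnHash(line)) {
            System.out.println("[" + piece + "]");
        }
    }
}
```

The # itself is consumed as the delimiter, which is why it never appears in any of the pieces.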

You could extract the first word of each piece, but the right tool for this task is a regular expression (regex).

Here is how to do it:

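    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String line = "#food was testy. #drink lots of. #night was fab. #three #four";

    // A '#' followed by one or more word characters (note the doubled backslash).
    Pattern pattern = Pattern.compile("#\\w+");

    Matcher matcher = pattern.matcher(line);
    // Each call to find() advances to the next match.
    while (matcher.find())
    {
        System.out.println(matcher.group());
    }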

The output is:

#food
#drink
#night
#three
#four

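The magic happens in "#\w+".

# the pattern starts with a #
\w matches any letter (a-z, A-Z), digit (0-9), or underscore.
+ matches one or more consecutive \w characters.

So we search for something that starts with # followed by one or more letters, digits or underscores.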
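The breakdown above can be checked on a couple of edge cases. This is a minimal sketch, not part of the original answer: the class HashtagPieces and the helper extractTags are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagPieces {
    // A '#', then one or more word characters (letters, digits, underscore).
    private static final Pattern TAG = Pattern.compile("#\\w+");

    // Hypothetical helper: returns every match in order of appearance.
    static List<String> extractTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = TAG.matcher(text);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // '!' and '.' are not word characters, so each match stops before them.
        System.out.println(extractTags("#night_owl2! #a.b"));
        // A '#' with no word character right after it cannot satisfy '+'.
        System.out.println(extractTags("# alone, or ## doubled"));
    }
}
```

The first call should yield #night_owl2 and #a, and the second an empty list, since + requires at least one word character after the #.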

Because of Escape Sequences, we write '\\' instead of '\' in the Java string literal.
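A small sketch of what the doubled backslash means in practice (EscapeDemo and its REGEX constant are names invented for this illustration). In Java source, "\\" denotes a single backslash character, whereas a literal containing a lone \w does not compile because \w is not a valid string escape.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Four characters once compiled by javac: '#', '\', 'w', '+'.
    static final String REGEX = "#\\w+";

    public static void main(String[] args) {
        // Prints the regex as the regex engine sees it.
        System.out.println(REGEX);
        // Prints the number of characters in the string.
        System.out.println(REGEX.length());
        // Pattern.matches requires the whole input to match the regex.
        System.out.println(Pattern.matches(REGEX, "#food"));
    }
}
```

Printing the string is a quick way to confirm which pattern the regex engine actually receives.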

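You can play with it here.

find and group are explained here:

The find method scans the input sequence looking for the next subsequence that matches the pattern.
group() returns the input subsequence matched by the previous match.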
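To illustrate how find and group work together, here is a hedged sketch (FindGroupDemo, describe and groupBeforeFindThrows are hypothetical names). It records where each match starts and ends, and shows that calling group() before any successful find() throws an IllegalStateException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindGroupDemo {
    // Hypothetical helper: lists each match as "text@start-end".
    static String describe(String text) {
        Matcher matcher = Pattern.compile("#\\w+").matcher(text);
        StringBuilder out = new StringBuilder();
        // After each successful find(), group(), start() and end() refer to that match.
        while (matcher.find()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(matcher.group())
               .append('@').append(matcher.start())
               .append('-').append(matcher.end());
        }
        return out.toString();
    }

    // group() has nothing to report until find() has succeeded at least once.
    static boolean groupBeforeFindThrows() {
        Matcher matcher = Pattern.compile("#\\w+").matcher("#three #four");
        try {
            matcher.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("#three #four"));
        System.out.println(groupBeforeFindThrows());
    }
}
```

Note that end() is exclusive, so the substring from start() to end() is exactly what group() returns.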

[EDIT]

If you need to detect accented or non-Latin characters, using \w can be a problem.

For example:

"Bonjour mon #bébé #chat."

The matches will be:

#b
#chat
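The truncated #b can be reproduced with the sketch below. As one possible workaround that the original answer does not mention, it also compiles the same regex with the standard Pattern.UNICODE_CHARACTER_CLASS flag, which makes \w Unicode-aware. AccentDemo and findAll are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccentDemo {
    // Hypothetical helper: collects every match of the given pattern.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The accented letters are written as Unicode escapes so the
        // source file encoding does not affect the result.
        String text = "Bonjour mon #b\u00e9b\u00e9 #chat.";

        // Default: \w covers only a-z, A-Z, 0-9 and underscore.
        Pattern ascii = Pattern.compile("#\\w+");
        // Assumption for this sketch: Unicode-aware \w is acceptable for your tags.
        Pattern unicode = Pattern.compile("#\\w+", Pattern.UNICODE_CHARACTER_CLASS);

        System.out.println(findAll(ascii, text));
        System.out.println(findAll(unicode, text));
    }
}
```

Whether the flag is appropriate depends on which characters you are willing to accept in a hashtag.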

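It depends on what you are willing to accept as a hashtag. But that is another question, and multiple discussions exist about it.

For example, if you want any letter from any language, #\p{L}+ looks good, but the underscore is not included in it...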
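The trade-off with \p{L} can be seen in a short sketch. The second pattern, which adds digits (\p{N}) and the underscore in a character class, is only one possible variant and an assumption of this example rather than something the answer prescribes. UnicodeTagDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeTagDemo {
    // Hypothetical helper: collects every match of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Accented letters are written as Unicode escapes to stay encoding-safe.
        String text = "#b\u00e9b\u00e9 #snake_case #abc123";

        // Letters only: stops at the underscore and at the first digit.
        System.out.println(findAll("#\\p{L}+", text));
        // Letters, digits and underscore from any script.
        System.out.println(findAll("#[\\p{L}\\p{N}_]+", text));
    }
}
```

With the letters-only pattern, #snake_case is cut to #snake and #abc123 to #abc, while the character-class variant keeps all three tags whole.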

Answer 2

Please go through the following program ==>

    String candidate = "#food was testy. #drink lots of. #night was fab. #three #four";

    // '#' followed by one or more word characters; the backslash is doubled in Java source.
    String regex = "#\\w+";
    Pattern p = Pattern.compile(regex);

    Matcher m = p.matcher(candidate);
    String val = null;

    System.out.println("INPUT: " + candidate);

    System.out.println("REGEX: " + regex + "\r\n");

    while (m.find()) {
        val = m.group();
        System.out.println("MATCH: " + val);
    }
    // val is still null only if find() never succeeded.
    if (val == null) {
        System.out.println("NO MATCHES: ");
    }

I ran and tested this program in my NetBeans IDE, and it gives the following output:

INPUT: #food was testy. #drink lots of. #night was fab. #three #four
REGEX: #\w+

MATCH: #food
MATCH: #drink
MATCH: #night
MATCH: #three
MATCH: #four

You will need the following imports:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
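As a possible alternative to the explicit while loop, and assuming Java 9 or later, Matcher.results() exposes the matches as a stream. This is only a sketch and not part of the original answer: HashtagStream and extractTags are hypothetical names.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class HashtagStream {
    // Hypothetical helper: collects all "#word" matches into a list.
    static List<String> extractTags(String text) {
        return Pattern.compile("#\\w+")
                .matcher(text)
                .results()                // Java 9+: one MatchResult per match
                .map(MatchResult::group)  // the matched text itself
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String candidate = "#food was testy. #drink lots of. #night was fab. #three #four";
        for (String tag : extractTags(candidate)) {
            System.out.println("MATCH: " + tag);
        }
    }
}
```

Returning a list instead of printing inside the loop makes the "no matches" case a simple isEmpty() check on the result.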
