Parsing a text and getting the occurrences of a given word using its root

Question

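I am developing a system that gives the exact sense of a given word (obviously a polysemous one) according to its context. This research area is called word sense disambiguation. For this I need the root (stem) of the given word and the text that contains it. I will parse the text and find all occurrences of the given word using its root.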

For example, if the given word is "love", the system parses the text and returns all occurrences of "love", such as "lovely, loved, beloved...".

Here is what I have tried, but unfortunately it does not give me what I want!

public class Partenn1 {

    public static void main(String[] args) {
        int c=0;
        String w = "tissue";

        try (BufferedReader br = new BufferedReader(new FileReader("D:/Sc46.txt")))
        {
            String line;
            while ((line = br.readLine()) != null)
            {
                String[] WrdsLine = line.split(" ");

                boolean findwrd = false;
                for( String WrdLine : WrdsLine )
                {
                    for (int a=0; a<WrdsLine.length; a++)
                    {
                        if ( WrdsLine[a].indexOf(w)!=0)
                        {
                            c++; // Just a counter to verify the number of occurrences.
                            findwrd = true; 
                        }
                    }

                }
            }
            System.out.println(c);
        }
        catch (IOException e) {}
    }
}

Answer 1

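If you treat the root of a word as its prefix, this can be done by calling the startsWith method on each string with that prefix. (A word such as "beloved", where the root is not at the start, will not match this way.)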

The following code correctly prints "2", because 'tissue2' and 'tissue3' both start with 'tissue'.

int count = 0;
final String prefix = "tissue";

try (BufferedReader br = new BufferedReader(new StringReader("tissue2 tiss tiss3 tissue3"))) {
    String line;
    while ((line = br.readLine()) != null) {
        // Get all the words on this line
        final String[] wordsInLine = line.split(" ");

        for (final String s : wordsInLine) {
            // Check that the word starts with the prefix.
            if (s.startsWith(prefix)) {
                count++;
            }
        }

    }
    System.out.println(count);
} catch (final IOException ignored) {
}
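
For reference, here is a self-contained sketch of the same prefix test. The class name `PrefixCounter` and the helper `countWithPrefix` are hypothetical names introduced for illustration, and the helper takes the text as a String instead of reading a file. The last two lines of `main` show why the `indexOf(w) != 0` test in the question miscounts: `indexOf` returns -1 when the substring is absent and 0 when it is at the start, so `!= 0` is true for the words that do not match.

```java
class PrefixCounter {

    // Counts the whitespace-separated words in text that start with prefix.
    // Hypothetical helper, not part of the original answer.
    static int countWithPrefix(String text, String prefix) {
        int count = 0;
        for (String word : text.trim().split("\\s+")) {
            if (word.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWithPrefix("tissue2 tiss tiss3 tissue3", "tissue")); // prints 2

        System.out.println("tiss".indexOf("tissue"));    // prints -1 (not found)
        System.out.println("tissue2".indexOf("tissue")); // prints 0 (found at the start)
    }
}
```

When reading from a file, each line returned by `readLine()` could be passed to the helper and the results summed.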

Answer 2

There is no need for the second for loop. Here w is the string to search for:

while ((line = br.readLine()) != null) {
    String[] WrdsLine = line.split(" "); // split the line into words

    for (String WrdLine : WrdsLine) {
        if (WrdLine.contains(w)) { // on a match, print the word
            System.out.println(WrdLine);
        }
    }
}
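
Unlike startsWith, contains also matches words where the root is not at the start, such as "beloved" from the question. Below is a minimal sketch of that behaviour. The class `RootMatcher` and the helper `findContaining` are hypothetical names, and the lowercasing and punctuation stripping are assumptions added for the example, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class RootMatcher {

    // Returns every word of text that contains root anywhere inside it.
    // Hypothetical helper, not part of the original answer.
    static List<String> findContaining(String text, String root) {
        List<String> matches = new ArrayList<>();
        String needle = root.toLowerCase(Locale.ROOT);
        for (String token : text.split("\\s+")) {
            // Assumption: drop leading/trailing non-letters such as commas or periods.
            String word = token.replaceAll("^\\P{L}+|\\P{L}+$", "").toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && word.contains(needle)) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "She loved the lovely, beloved glove.";
        System.out.println(findContaining(text, "love")); // prints [loved, lovely, beloved, glove]
    }
}
```

Note that "glove" is matched as well. Substring matching only approximates stemming, so unrelated words that happen to contain the root are counted too.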

java

parsing

text-files