Counting the number of times the articles "a", "an" are used in a text file

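I am trying to write a program that counts the number of words, lines, and sentences in a file, as well as the number of occurrences of the articles 'a', 'and', 'the'. So far I have the words, lines, and sentences working, but I have no idea how to count the articles. How can a program tell the difference between 'a' and 'and'?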

Here is my code so far.

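    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Scanner;
    import java.util.StringTokenizer;

    // The class name is a placeholder; the original post omitted the declaration.
    public class TextStats {
        public static void main(String[] args) throws IOException {
            // Backslashes inside a Java string literal must be escaped.
            String path = "C:\\Users\\nlstudent\\Downloads\\text.txt";
            int ch, sentence = 0, words = 0, chars = 0, lines = 0;

            // First pass: count sentence-ending punctuation, one character at a time.
            try (FileInputStream file = new FileInputStream(path)) {
                while ((ch = file.read()) != -1) {
                    if (ch == '?' || ch == '!' || ch == '.')
                        sentence++;
                }
            }

            // Second pass: count lines, characters, and words.
            try (Scanner sfile = new Scanner(new File(path))) {
                while (sfile.hasNextLine()) {
                    lines++;
                    String line = sfile.nextLine();
                    chars += line.length();
                    words += new StringTokenizer(line, " ,").countTokens();
                }
            }

            System.out.println("Number of words: " + words);
            System.out.println("Number of sentences: " + sentence);
            System.out.println("Number of lines: " + lines);
            System.out.println("Number of characters: " + chars);
        }
    }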

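The tokenizer splits each line into tokens. You can check each token (a whole word) to see whether it matches the string you are looking for. Because the comparison is against the whole token, "a" and "and" are never confused. Here is an example that counts a, and, the:
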
int a = 0, and = 0, the = 0, forCount = 0;

while (sfile.hasNextLine()) {
    lines++;
    String line = sfile.nextLine();
    chars += line.length();
    StringTokenizer tokenizer = new StringTokenizer(line, " ,");
    words += tokenizer.countTokens();

    while (tokenizer.hasMoreTokens()) {
        String element = tokenizer.nextToken();

        if ("a".equals(element)) {
            a++;
        } else if ("and".equals(element)) {
            and++;
        } else if ("for".equals(element)) {
            forCount++;
        } else if ("the".equals(element)) {
            the++;
        }
    }
}
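
One caveat with the loop above: `equals` is case-sensitive, and the delimiter string `" ,"` only splits on spaces and commas, so tokens such as "The" or "the." would not be counted. A minimal sketch of one way to handle that follows. The class name `TokenArticleCounter`, the helper `countArticles`, and the extended delimiter set are assumptions made for illustration, not part of the original answer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

class TokenArticleCounter {
    // Hypothetical helper: counts how often each target word occurs as a whole token.
    static Map<String, Integer> countArticles(List<String> lines, List<String> targets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String target : targets) {
            counts.put(target, 0);
        }
        for (String line : lines) {
            // Assumed delimiter set: whitespace plus common punctuation,
            // so that "dog." yields the token "dog".
            StringTokenizer tokenizer = new StringTokenizer(line, " \t,.;:?!\"()");
            while (tokenizer.hasMoreTokens()) {
                // Lower-case the token so "The" and "the" are counted together.
                String token = tokenizer.nextToken().toLowerCase(Locale.ROOT);
                if (counts.containsKey(token)) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A cat and a dog.", "The end, and an apple for Andy.");
        // prints {a=2, an=1, and=2, the=1}
        System.out.println(countArticles(lines, Arrays.asList("a", "an", "and", "the")));
    }
}
```

Keeping the counts in a map instead of one variable per word means adding another word to look for only changes the list of targets.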

How can a program tell the difference between 'a' and 'and'?

You can use a regular expression for this:

        String input = "A and Andy then the are a";
        // "\\b" is the regex word boundary; a single "\b" in a Java string is a backspace character.
        Matcher m = Pattern.compile("(?i)\\b((a)|(an)|(and)|(the))\\b").matcher(input);
        int count = 0;
        while(m.find()){
            count++;
        }
        //count == 4

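'\b' is a word boundary, '|' is OR, and '(?i)' is the case-insensitive flag. The full list of supported constructs is in the documentation for java.util.regex.Pattern, and it is probably worth reading up on regular expressions in general.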
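
The snippet above produces a single total. If a separate count per word is wanted, one possible variation is to read the matched text from the capturing group and tally it in a map. This is only a sketch: the class name `RegexArticleCounter` and the method `countArticles` are hypothetical, and the word list is the same one used in the pattern above.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexArticleCounter {
    // \b on both sides restricts matches to whole words,
    // so "a" is not counted inside "and" or "Andy".
    private static final Pattern ARTICLES =
            Pattern.compile("\\b(a|an|and|the)\\b", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns one count per matched word, keyed in lower case.
    static Map<String, Integer> countArticles(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = ARTICLES.matcher(text);
        while (m.find()) {
            counts.merge(m.group(1).toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {a=2, and=1, the=1}
        System.out.println(countArticles("A and Andy then the are a"));
    }
}
```

Words that never occur are simply absent from the map, so a caller that needs an explicit zero could use `getOrDefault(word, 0)`.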