Java StreamTokenizer taking a number and character without whitespace as separate tokens

I am writing a parser with StreamTokenizer. For input such as "8a", I need it to report an error saying that the number contains a character. Instead, it prints:

NUM: 8 ID: a

It seems to treat the character as a separate token, even though there is no whitespace between it and the number.

Is there a way around this?

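You can check whether the current token is StreamTokenizer.TT_WORD and report an error if it is. The snippet below takes a text that mixes digits and characters without whitespace and prints an error as soon as it reaches a word token.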

import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class StreamCharacterChecker {

    public static void main(String[] args) throws IOException {
        String text = "123458a787";
        // A StringReader avoids the platform-dependent charset of text.getBytes().
        Reader r = new StringReader(text);
        StreamTokenizer st = new StreamTokenizer(r);
        int token;
        while ((token = st.nextToken()) != StreamTokenizer.TT_EOF) {
            // Any word token means the input contained non-numeric characters.
            if (token == StreamTokenizer.TT_WORD) {
                System.out.println("Error characters detected!");
                break;
            }
        }
    }
}

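You can override the StreamTokenizer.parseNumbers method to disable the special handling of numeric characters. The override takes effect because the StreamTokenizer constructor itself calls parseNumbers() (at least in the OpenJDK implementation). Note that this can be quite risky or otherwise inappropriate, since it relies on an overridable method being invoked during construction.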

According to the javadoc, https://docs.oracle.com/javase/7/docs/api/java/io/StreamTokenizer.html#parseNumbers():

 * When the parser encounters a word token that has the format of a
 * double precision floating-point number, it treats the token as a
 * number rather than a word, by setting the {@code ttype}
 * field to the value {@code TT_NUMBER} and putting the numeric
 * value of the token into the {@code nval} field.
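
To make the default behavior described in the javadoc concrete, here is a minimal sketch that feeds "8a" to an unmodified StreamTokenizer. The class name DefaultTokenizerDemo and the helper tokenize are hypothetical names chosen for illustration, not part of the question's code.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class DefaultTokenizerDemo {

    // Hypothetical helper: lists the number and word tokens that a
    // default-configured StreamTokenizer produces for the given input.
    static List<String> tokenize(String input) throws IOException {
        List<String> tokens = new ArrayList<>();
        StreamTokenizer tk = new StreamTokenizer(new StringReader(input));
        while (tk.nextToken() != StreamTokenizer.TT_EOF) {
            if (tk.ttype == StreamTokenizer.TT_NUMBER) {
                tokens.add("TT_NUMBER " + tk.nval);
            } else if (tk.ttype == StreamTokenizer.TT_WORD) {
                tokens.add("TT_WORD " + tk.sval);
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // With the default settings, "8a" is split into a number and a word.
        for (String token : tokenize("8a")) {
            System.out.println(token);
        }
    }
}
```

With the default configuration this prints TT_NUMBER 8.0 followed by TT_WORD a, which matches the split reported in the question.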

Here is an example of that override. Note that I do not give the 'numeric' attribute to the characters typically used in numbers:

    final Reader rd = new StringReader("8a");
    final StreamTokenizer tk = new StreamTokenizer(rd) {
        @Override
        public void parseNumbers() {
            // Intentionally empty: by not calling super.parseNumbers(), the
            // special handling of numeric characters is disabled.
        }
    };

    tk.wordChars('a', 'z');
    tk.wordChars('0', '9');
    while ((tk.nextToken()) != StreamTokenizer.TT_EOF) {
        if (tk.ttype == StreamTokenizer.TT_WORD) {
            System.out.println("TT_WORD " + tk.sval);
        }
        if (tk.ttype == StreamTokenizer.TT_NUMBER) {
            System.out.println("TT_NUMBER " + tk.nval);
        }
    }

Output:

TT_WORD 8a

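With the configuration above you get the single String 8a, and you can then inspect it yourself to check whether it mixes digits with other characters. String.contains only tests for one fixed substring at a time, so a per-character check or a regular expression is usually more convenient.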
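
As a rough sketch of that last step, the code below classifies each word token produced by the overridden tokenizer. The class MixedTokenChecker, the helper classify, the labels NUM / ID / ERROR, and the rule "starts with a digit but is not all digits" are assumptions made for illustration. The original answer does not specify them.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class MixedTokenChecker {

    // Hypothetical rule: all digits is a number; a leading digit followed by
    // anything else is an error; everything else is an identifier.
    static String classify(String token) {
        if (token.chars().allMatch(Character::isDigit)) {
            return "NUM";
        }
        return Character.isDigit(token.charAt(0)) ? "ERROR" : "ID";
    }

    static List<String> scan(String input) throws IOException {
        StreamTokenizer tk = new StreamTokenizer(new StringReader(input)) {
            @Override
            public void parseNumbers() {
                // Intentionally empty so digits are not parsed as numbers.
            }
        };
        tk.wordChars('0', '9'); // treat digits as ordinary word characters
        List<String> result = new ArrayList<>();
        while (tk.nextToken() != StreamTokenizer.TT_EOF) {
            if (tk.ttype == StreamTokenizer.TT_WORD) {
                result.add(classify(tk.sval) + ": " + tk.sval);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (String line : scan("8a 42 abc")) {
            System.out.println(line);
        }
    }
}
```

Because number parsing is disabled, characters such as '.' and '-' are returned as ordinary single-character tokens here, so decimal or negative numbers would need extra handling.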