Taking an integer from a scanner token that may or may not include symbols

Question

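I have a function that takes a Scanner over a text file as input, and I need to extract the integer values from each line. The lines may not follow a strict syntax.

I tried using skip() to ignore specific non-integer characters, but I worry that I am using it for something it cannot do.

I also tried converting the token to a String and calling replaceAll(";", ""), but that quickly turned my code into a pile of if statements and String-to-int conversions. Given how many different variables I need to set here, it got ugly fast.

Is there a more elegant solution?

Here is my input file: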

pop 25; // my code must accept this
pop 25 ; // and also this
house 3.2, 1; // some lines will set multiple values
house 3.2 , 1 ; // so I will need to ignore both commas and semicolons

Here is my code:

static int population = -1;
static double median = -1;
static double scatter = -1;

private static void readCommunity(Scanner sc) {
    while (sc.hasNext()) {
        String input = sc.next();
        if ("pop".equals(input)) {
            sc.skip(";*"); // my guess is this wouldn't work unless the
                           // token had a ';' BEFORE the integer
            if (sc.hasNextInt()) {
                population = sc.nextInt();
            } else { /* throw an error. not important here */ }
            sc.nextLine();
        } else if ("house".equals(input)) {
            sc.skip(",*");
            if (sc.hasNextDouble()) {
                median = sc.nextDouble();
                sc.skip(";*");
                if (sc.hasNextDouble()) {
                    scatter = sc.nextDouble();
                } else { /* error */ }
            } else { /* error */ }
            sc.nextLine();
        }
    } 
}

Answer 1

A regular expression is probably a better fit here than nextInt or nextDouble. You can get each numeric value using:

// 'a' is the String to search, e.g. one line of the file
Pattern p = Pattern.compile("\\d+(\\.\\d+)?");
Matcher m = p.matcher(a);
while (m.find()) {
    System.out.println(m.group());
}

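The regular expression finds every decimal or non-decimal number that appears in the given string.

\d+ - one or more digits

(\.\d+) - followed by a decimal point and one or more digits

? - the parenthesized expression is optional, so a number may or may not have a fractional part.

For the data you provided, this finds 25 on each pop line, and 3.2 and 1 on each house line.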
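To try the pattern explained above in isolation, here is a minimal self-contained sketch. The class and method names (NumberExtractor, extractNumbers) are made up for illustration and are not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberExtractor {
    // Unsigned integer or decimal: digits, optionally followed by ".digits".
    private static final Pattern NUMBER = Pattern.compile("\\d+(\\.\\d+)?");

    // Returns every number found in the line, in order, as text.
    static List<String> extractNumbers(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extractNumbers("pop 25 ;"));         // prints [25]
        System.out.println(extractNumbers("house 3.2 , 1 ;"));  // prints [3.2, 1]
    }
}
```

Note that the pattern has no sign in it, so a value such as -5 would come back as 5.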

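Edit:

The trouble with commas and semicolons while parsing can be avoided by reading the whole line with nextLine() instead of next(). next() fetches only one token at a time from the input. With nextLine() and a regular expression, you can read the individual numbers as shown below.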

      Pattern p = Pattern.compile("\\d+(\\.\\d+)?");  // compile once, outside the loop
      int population = -1;
      double median = -1;
      double scatter = -1;
      while (sc.hasNextLine()) {
            String input = sc.nextLine();   // fetches the entire line
            Matcher m = p.matcher(input);
            if (input.contains("pop")) {
                // first number on the line is the population
                if (m.find()) {
                    population = Integer.parseInt(m.group());
                }
            } else if (input.contains("house")) {
                // first number is the median, second is the scatter
                if (m.find()) {
                    median = Double.parseDouble(m.group());
                }
                if (m.find()) {
                    scatter = Double.parseDouble(m.group());
                }
            }
        }
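If you want to try the line-based approach on the sample data from the question, the following sketch feeds the lines to a Scanner built from a String. The CommunityReader class and its read method are hypothetical names, and stripping everything after "//" is my own assumption so that digits inside a trailing comment are not picked up.

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommunityReader {
    private static final Pattern NUMBER = Pattern.compile("\\d+(\\.\\d+)?");

    int population = -1;
    double median = -1;
    double scatter = -1;

    // Reads every line from the scanner and keeps the last values seen.
    static CommunityReader read(Scanner sc) {
        CommunityReader result = new CommunityReader();
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            // Assumption: "//" starts a trailing comment; drop it before matching.
            int comment = line.indexOf("//");
            if (comment >= 0) {
                line = line.substring(0, comment);
            }
            Matcher m = NUMBER.matcher(line);
            if (line.contains("pop")) {
                if (m.find()) result.population = Integer.parseInt(m.group());
            } else if (line.contains("house")) {
                if (m.find()) result.median = Double.parseDouble(m.group());
                if (m.find()) result.scatter = Double.parseDouble(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "pop 25 ; // and also this\n"
                      + "house 3.2 , 1 ; // multiple values\n";
        CommunityReader r = read(new Scanner(sample));
        System.out.println(r.population + " " + r.median + " " + r.scatter); // prints 25 3.2 1.0
    }
}
```

One caveat: if a pop line ever carried a decimal such as 25.5, Integer.parseInt would throw a NumberFormatException, so you may want to validate the match first.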

Answer 2

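In my opinion it is easier to read the file one whole line at a time, split that line into the parts I need, and validate the values as they are read. For example: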

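// Requires: import java.io.File; import java.io.FileNotFoundException; import java.util.Scanner;
// Assumes the static fields population, median, and scatter from the question.
private static void readCommunity(String dataFilePath) {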
    File file = new File(dataFilePath);
    if (!file.exists()) {
        System.err.println("File Not Found! (" + dataFilePath + ")");
        return;
    }
    int lineCount = 0;   // For counting file lines.
    // 'Try With Resources' used here so as to auto-close reader.
    try (Scanner sc = new Scanner(file)) {
        while (sc.hasNextLine()) {
            String fileInput = sc.nextLine().trim();
            lineCount++;   // Increment line counter.
            // Skip blank lines (if any).
            if (fileInput.isEmpty()) {
                continue;
            }
            /* Remove comments from data line (if any). Your file 
               example shows comments at the end of each line. Yes, 
               I realize that your file most likely doesn't contain 
               these but it doesn't hurt to have this here in case 
               it does or if you want to have that option. Comments
               can start with // or /*. Comments must be at the end
               of a data line. This 'does not' support any Multi-line 
               comments. More code is needed for that.            */
            if (fileInput.contains("//") || fileInput.contains("/*")) {
                fileInput = fileInput.substring(0, fileInput.contains("//")
                        ? fileInput.indexOf("//") : fileInput.indexOf("/*"));
            }
            // Start parsing the data line into required parts...
            // Start with semicolon portions
            String[] lineMainParts = fileInput.split("\\s{0,};\\s{0,}");
            /* Iterate through all the main elemental parts on a 
               data line (if there is more than one), for example:
                  pop 30; house 4.3, 1; pop 32; house 3.3, 2   */
            for (int i = 0; i < lineMainParts.length; i++) {
                // Is it a 'pop' attribute?
                if (lineMainParts[i].toLowerCase().startsWith("pop")) {
                    //Yes it is... so validate, convert, and display the value.
                    String[] attributeParts = lineMainParts[i].split("\\s+");
                    if (attributeParts[1].matches("-?\\d+|\\+?\\d+")) {   // validate string numerical value (Integer).
                        population = Integer.valueOf(attributeParts[1]);  // convert to Integer
                        System.out.println("Population:\t" + population); // display...
                    }
                    else {
                        System.err.println("Invalid population value detected in file on line "
                                + lineCount + "! (" + lineMainParts[i] + ")");
                    }
                }
                // Is it a 'house' attribute?
                else if (lineMainParts[i].toLowerCase().startsWith("house")) {
                    /* Yes it is... so split all comma delimited attribute values
                   for 'house', validate each numerical value, convert each 
                   numerical value, and display each attribute and their 
                   respective values.  */
                    String[] attributeParts = lineMainParts[i].split("\\s{0,},\\s{0,}|\\s+");
                    if (attributeParts[1].matches("-?\\d+(\\.\\d+)?")) {   // validate median string numerical value (Double or Integer).
                        median = Double.valueOf(attributeParts[1]);        // convert to Double.
                        System.out.println("Median:     \t" + median);     // display median...
                    }
                    else {
                        System.err.println("Invalid Median value detected in file on line "
                                + lineCount + "! (" + lineMainParts[i] + ")");
                    }
                    if (attributeParts[2].matches("-?\\d+|\\+?\\d+")) {   // validate scatter string numerical value (Integer).
                        scatter = Integer.valueOf(attributeParts[2]);     // convert to Integer
                        System.out.println("Scatter:    \t" + scatter);   // display scatter...
                    }
                    else {
                        System.err.println("Invalid Scatter value detected in file on line "
                                + lineCount + "! (" + lineMainParts[i] + ")");
                    }
                }
                else {
                    System.err.println("Unhandled Data Attribute detected in data file on line " + lineCount + "! ("
                            + lineMainParts[i] + ")");
                }
            }
        }
    }
    catch (FileNotFoundException ex) {
        System.err.println(ex);
    }
}

Several regular expressions (regex) are used in the code above. Here is what each one means, in the order they appear in the code:

"\\s{0,};\\s{0,}"

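Used with the String#split() method to parse a semicolon (;) delimited line. This regex covers most of the bases when you need to split semicolon-delimited string data in which the semicolons may be spaced in several different ways, for example: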

"data;data ;data; data ; data;      data       ;data"

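\s{0,} 0 or more whitespace characters before the semicolon (equivalent to \s*).
; The literal semicolon delimiter itself.
\s{0,} 0 or more whitespace characters after the semicolon.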
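As a quick check of the semicolon split described above, here is a small sketch. SemicolonSplitDemo and splitOnSemicolons are illustrative names only.

```java
import java.util.Arrays;

final class SemicolonSplitDemo {
    // Splits on ';' and swallows any whitespace on either side of it.
    static String[] splitOnSemicolons(String line) {
        return line.trim().split("\\s{0,};\\s{0,}");
    }

    public static void main(String[] args) {
        String spaced = "data;data ;data; data ; data;      data       ;data";
        // Seven parts, each exactly "data".
        System.out.println(Arrays.toString(splitOnSemicolons(spaced)));

        // Two parts: "pop 30" and "house 4.3, 1".
        for (String part : splitOnSemicolons("pop 30; house 4.3, 1;")) {
            System.out.println("part: " + part);
        }
    }
}
```

String#split() drops trailing empty strings, which is why the final semicolon on a line does not produce an extra empty element.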

"\\s+"

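Used with the String#split() method to parse a whitespace-delimited line. This regex covers most of the bases when you need to split whitespace-delimited string data in which one or more spaces or tabs may separate the tokens, for example: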

"datadata"                      Split to: [datadata] (Need at least 1 space)
"data data"                     Split to: [data, data] 
"data   data"                   Split to: [data, data] 
"data        data       data"   Split to: [data, data, data]

"-?\\d+|\\+?\\d+"

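Used with the String#matches() method for numeric string validation. This regex checks whether the tested string is indeed a string representation of a signed or unsigned integer value (of any length). It is used in the code above to validate a numeric string before converting it to an Integer. The string representation can be: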

-1   1   324   +2   342345   -65379   74   etc.

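-? The string optionally starts with a hyphen, indicating a negative value.
\d+ The string contains 1 or more (+) digits from 0 to 9.
| Logical OR.
\+? The string optionally starts with a plus sign.
\d+ The string contains 1 or more (+) digits from 0 to 9.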
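Here is a rough sketch of validate-then-convert using the integer pattern above. IntegerValidation, looksLikeInteger, and parse are names I made up. Because the pattern accepts digits of any length, a string can pass the regex and still overflow an int, so the sketch also catches NumberFormatException.

```java
import java.util.OptionalInt;

final class IntegerValidation {
    // True for optionally signed whole numbers such as "-1", "+2", "342345".
    static boolean looksLikeInteger(String s) {
        return s.matches("-?\\d+|\\+?\\d+");
    }

    // Converts only when the text validates and fits in an int.
    static OptionalInt parse(String s) {
        if (!looksLikeInteger(s)) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(s));
        } catch (NumberFormatException outOfRange) {
            return OptionalInt.empty();   // e.g. "99999999999"
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("-65379"));   // value present
        System.out.println(parse("25;"));      // empty: trailing symbol
        System.out.println(parse("3.2"));      // empty: not a whole number
    }
}
```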
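
"\\s{0,},\\s{0,}|\\s+" (must be in this order)

Used with the String#split() method to parse a comma (,) delimited line. This regex covers most of the bases when you need to split comma-delimited string data in which the commas may be spaced in several different ways, for example: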

"my data,data"            Split to: [my, data, data] 
"my data ,data"           Split to: [my, data, data] 
"my data, data"           Split to: [my, data, data] 
"my data , data"          Split to: [my, data, data] 
"my   data,      data"    Split to: [my, data, data] 
"my    data      ,data"   Split to: [my, data, data]

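\s{0,} 0 or more whitespace characters before the comma.
, The literal comma delimiter itself.
\s{0,} 0 or more whitespace characters after the comma.
| Logical OR, split on...
\s+ Just one or more whitespace characters.

In other words, split on: a comma, OR a comma followed by whitespace, OR whitespace followed by a comma, OR a comma with whitespace on both sides, OR just one or more whitespace characters.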
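To see why the order of the two alternatives matters, the sketch below splits the same line with the alternatives in both orders. CommaSplitDemo and its two methods are hypothetical names used only for this comparison.

```java
import java.util.Arrays;

final class CommaSplitDemo {
    // Comma alternative first: whitespace around a comma is consumed with it.
    static String[] split(String line) {
        return line.split("\\s{0,},\\s{0,}|\\s+");
    }

    // Whitespace alternative first: the space before a comma is matched on
    // its own, and the comma then becomes a second, adjacent delimiter.
    static String[] splitReversed(String line) {
        return line.split("\\s+|\\s{0,},\\s{0,}");
    }

    public static void main(String[] args) {
        String line = "house 3.2 , 1";
        System.out.println(Arrays.toString(split(line)));          // prints [house, 3.2, 1]
        System.out.println(Arrays.toString(splitReversed(line)));  // prints [house, 3.2, , 1]
    }
}
```

With the reversed order, the two back-to-back delimiters leave an empty string between 3.2 and 1, which would then fail the numeric validation.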

"-?\\d+(\\.\\d+)?"

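Used with the String#matches() method for numeric string validation. This regex checks whether the tested string is indeed a string representation of a signed or unsigned integer or double value (of any length). It is used in the code above to validate a numeric string before converting it to a Double. The string representation can be: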

-1.34   1.34   324   2.54335   342345   -65379.7   74   etc.

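-? The string optionally starts with a hyphen, indicating a negative value.
\d+ The string contains 1 or more (+) digits from 0 to 9. [Up to this point the string would be considered an integer.]
( Start of a group.
\. The string contains a literal period (.) after the first set of digits.
\d+ The string contains 1 or more (+) digits from 0 to 9 after the period.
) End of the group.
? The data expressed in the group may or may not be present, making the group an optional group.

Hopefully the above will help get you started.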
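A small sketch of the same validate-then-convert idea for the decimal pattern follows. DecimalValidation, looksLikeDecimal, and parse are illustrative names, not anything from the code above.

```java
import java.util.OptionalDouble;

final class DecimalValidation {
    // True for "-1.34", "324", "2.54335"; false for "3.", ".5", "+1.5", "1e3".
    static boolean looksLikeDecimal(String s) {
        return s.matches("-?\\d+(\\.\\d+)?");
    }

    // Converts only text that passes the pattern.
    static OptionalDouble parse(String s) {
        return looksLikeDecimal(s)
                ? OptionalDouble.of(Double.parseDouble(s))
                : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("-65379.7"));  // value present
        System.out.println(parse("3."));        // empty: no digits after the period
        System.out.println(parse("+1.5"));      // empty: this pattern has no plus sign
    }
}
```

Unlike the integer pattern, this one does not allow a leading plus sign, so add \\+? as an alternative if your data can contain one.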


java

java.util.scanner