How to segment file input into portions in Java

I need to separate each rule in the file below. How can I do this in Java?

This is the file content:

rule apt_regin_2011_32bit_stage1 {
meta:
copyright = "Kaspersky Lab"
 description = "Rule to detect Regin 32 bit stage 1 loaders"
 version = "1.0"
 last_modified = "2014-11-18"
strings:
$key1={331015EA261D38A7}
$key2={9145A98BA37617DE}
$key3={EF745F23AA67243D}
$mz="MZ"
condition:
($mz at 0) and any of ($key*) and filesize < 300000
}


rule apt_regin_rc5key {
meta:
copyright = "Kaspersky Lab"
 description = "Rule to detect Regin RC5 decryption keys"
 version = "1.0"
 last_modified = "2014-11-18"
strings:
$key1={73 23 1F 43 93 E1 9F 2F 99 0C 17 81 5C FF B4 01}
$key2={10 19 53 2A 11 ED A3 74 3F C3 72 3F 9D 94 3D 78}
condition:
any of ($key*)
}



rule apt_regin_vfs {
meta:
copyright = "Kaspersky Lab"
 description = "Rule to detect Regin VFSes"
 version = "1.0"
 last_modified = "2014-11-18"
strings:
$a1={00 02 00 08 00 08 03 F6 D7 F3 52}
$a2={00 10 F0 FF F0 FF 11 C7 7F E8 52}
$a3={00 04 00 10 00 10 03 C2 D3 1C 93}
$a4={00 04 00 10 C8 00 04 C8 93 06 D8}
condition:
($a1 at 0) or ($a2 at 0) or ($a3 at 0) or ($a4 at 0)
}


rule apt_regin_dispatcher_disp_dll {
meta:
copyright = "Kaspersky Lab"
 description = "Rule to detect Regin disp.dll dispatcher"
 version = "1.0"
 last_modified = "2014-11-18"
strings:
$mz="MZ"
 $string1="shit"
 $string2="disp.dll"
 $string3="255.255.255.255"
 $string4="StackWalk64"
 $string5="imagehlp.dll"
condition:
($mz at 0) and (all of ($string*))
}

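As the file shows, I need to separate each of the 4 rules found in the file input. Any idea how I can do this? Please bear with me, I'm a beginner. Thanks in advance!

After separating all 4 rules, I need to put each rule into an ArrayList.

For example: arraylist[0]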

rule apt_regin_2011_32bit_stage1 {
meta:
copyright = "Kaspersky Lab"
 description = "Rule to detect Regin 32 bit stage 1 loaders"
 version = "1.0"
 last_modified = "2014-11-18"
strings:
$key1={331015EA261D38A7}
$key2={9145A98BA37617DE}
$key3={EF745F23AA67243D}
$mz="MZ"
condition:
($mz at 0) and any of ($key*) and filesize < 300000
}

arraylist[1]

rule apt_regin_rc5key {
meta:
copyright = "Kaspersky Lab"
 description = "Rule to detect Regin RC5 decryption keys"
 version = "1.0"
 last_modified = "2014-11-18"
strings:
$key1={73 23 1F 43 93 E1 9F 2F 99 0C 17 81 5C FF B4 01}
$key2={10 19 53 2A 11 ED A3 74 3F C3 72 3F 9D 94 3D 78}
condition:
any of ($key*)
}

arraylist[2]

rule apt_regin_vfs {
meta:
copyright = "Kaspersky Lab"
 description = "Rule to detect Regin VFSes"
 version = "1.0"
 last_modified = "2014-11-18"
strings:
$a1={00 02 00 08 00 08 03 F6 D7 F3 52}
$a2={00 10 F0 FF F0 FF 11 C7 7F E8 52}
$a3={00 04 00 10 00 10 03 C2 D3 1C 93}
$a4={00 04 00 10 C8 00 04 C8 93 06 D8}
condition:
($a1 at 0) or ($a2 at 0) or ($a3 at 0) or ($a4 at 0)
}

And so on.

How do I do this?

Just for the record: if your problem is to "segment" your input into "rules", then simply do:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// `file` is the File (or file name) of your input; the enclosing method
// must declare or catch IOException.
List<List<String>> sections = new ArrayList<>();
List<String> currentSection = null;

try (BufferedReader br = new BufferedReader(new FileReader(file))) {
  String line;
  while ((line = br.readLine()) != null) {
    if (line.startsWith("rule ")) {
      if (currentSection != null) {
        // we are finished with the previous section!
        sections.add(currentSection);
      }
      currentSection = new ArrayList<>();
      currentSection.add(line);
    } else if (currentSection != null && !line.trim().isEmpty()) {
      // any non-empty line goes into the current section;
      // lines before the first "rule " are skipped
      currentSection.add(line);
    }
  }
}
if (currentSection != null) {
  // make sure to add the final section, too!
  sections.add(currentSection);
}
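Since the question asks for each rule as one entry in a list, the same loop can also collect every rule as a single String instead of a list of lines. The following is only a sketch under a few assumptions: every rule starts on a line beginning with "rule ", anything before the first rule can be ignored, and blank lines are dropped. The class name RuleSplitter and the method splitRules are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any library.
class RuleSplitter {

    // Takes the lines of the input and returns one String per rule,
    // with the rule's non-empty lines joined by '\n'.
    static List<String> splitRules(List<String> lines) {
        List<String> rules = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (line.startsWith("rule ")) {
                if (current != null) {
                    rules.add(current.toString()); // previous rule is complete
                }
                current = new StringBuilder(line);
            } else if (current != null && !line.trim().isEmpty()) {
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            rules.add(current.toString()); // do not forget the last rule
        }
        return rules;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "rule a {", "condition:", "true", "}", "", "",
                "rule b {", "condition:", "false", "}");
        List<String> rules = splitRules(lines);
        System.out.println(rules.size() + " rules found");
        System.out.println(rules.get(0));
    }
}
```

For a real file, the lines could come from java.nio.file.Files.readAllLines(Paths.get("rules.txt")), where "rules.txt" is a placeholder for your own file name.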

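But: you are not very precise about what you really need. I am pretty sure that your real problem is not "segmenting" that input file.

Most likely, your actual task is to read that file and, for each section in it, to get some or all of its content for further processing.

In other words: you are actually asking "how do I parse/process" this input. We can't answer that, because you didn't tell us what exactly you want to do with the data.

Essentially, this is your option space: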

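  1. If the file really has such a fixed layout, then "parsing" boils down to understanding "first comes rule, then comes meta, which looks like ...". Meaning: you "hard-code" the structure of your data into your code. Example: you happen to "know" that the third line contains copyright = "some value". Then you use regular expressions (or simple String methods such as indexOf() and substring()) to extract the information you are interested in.
  2. If the file format is actually some kind of "standard" (for example XML, JSON, YAML, ...), then you can simply pick a 3rd party library that parses such files. As for this particular example ... I can't say; this is definitely not a format I am familiar with.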
  3. Worst case, you need to write your own parser. Writing parsers is a complex but "well researched" topic.
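To illustrate option 1, here is a minimal sketch that pulls the rule name and a meta value out of one rule's text with regular expressions. It assumes the layout shown in the question (a header line "rule <name> {" and meta lines of the form key = "value") and does not handle escaped quotes inside values. The class RuleFields and its methods are hypothetical names, not an existing API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class RuleFields {

    // Matches the header line, e.g. "rule apt_regin_vfs {".
    private static final Pattern NAME =
            Pattern.compile("^rule\\s+(\\w+)", Pattern.MULTILINE);

    // Returns the rule name, or null if no header line is found.
    static String ruleName(String ruleText) {
        Matcher m = NAME.matcher(ruleText);
        return m.find() ? m.group(1) : null;
    }

    // Returns the quoted value of a line like: key = "value",
    // or null if the key does not occur at the start of a line.
    static String metaValue(String ruleText, String key) {
        Pattern p = Pattern.compile(
                "^\\s*" + Pattern.quote(key) + "\\s*=\\s*\"([^\"]*)\"",
                Pattern.MULTILINE);
        Matcher m = p.matcher(ruleText);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String rule = "rule apt_regin_vfs {\n"
                + "meta:\n"
                + "copyright = \"Kaspersky Lab\"\n"
                + " description = \"Rule to detect Regin VFSes\"\n"
                + " version = \"1.0\"\n"
                + "condition:\n"
                + "($a1 at 0)\n"
                + "}";
        System.out.println(ruleName(rule));
        System.out.println(metaValue(rule, "description"));
    }
}
```

This kind of extraction only stays manageable while the layout is as regular as in the sample file; once rules contain nested braces or comments, option 3 becomes the more realistic route.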