OCaml parse large text

Question

In OCaml, how can I parse large multi-line text data with the Str module? The pattern ignores newline characters.

let get_info content =
  let re = Str.regexp "\\(.+?\\)" in
  match Str.string_match re content 0 with
    | true -> print_endline("-->"^(Str.matched_group 1 content)^"<--")
    | false -> print_endline("not found");;

This example returns only the first line, but I need to capture text that spans multiple lines.

Answer 1

According to http://pleac.sourceforge.net/pleac_ocaml/patternmatching.html:

Str's regexps lack a whitespace-matching pattern.

So here is the workaround suggested on that page:

#load "str.cma";;
...
let whitespace_chars =
  String.concat ""
    (List.map (String.make 1)
       [
         Char.chr 9;  (* HT *)
         Char.chr 10; (* LF *)
         Char.chr 11; (* VT *)
         Char.chr 12; (* FF *)
         Char.chr 13; (* CR *)
         Char.chr 32; (* space *)
       ])

Then:

let re = Str.regexp ("\\(\\([^" ^ whitespace_chars ^ "]\\|[" ^ whitespace_chars ^ "]\\)+\\)") in
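
For reference, here is a minimal sketch of how the pieces might fit together in the toplevel. It assumes Str's own syntax, where groups are written \( \), alternation is \|, and there is no (?:...) or non-greedy +? form, so the pattern is greedy and captures everything from the start of the input. The name any_text and the option return type are my own choices for illustration, not something from the linked page.

```ocaml
#load "str.cma";;

(* Same whitespace set as above: HT, LF, VT, FF, CR and space. *)
let whitespace_chars =
  String.concat ""
    (List.map (fun code -> String.make 1 (Char.chr code)) [9; 10; 11; 12; 13; 32])

(* Hypothetical helper name. One or more characters, each of which is
   either a non-whitespace or a whitespace character, so line breaks
   are matched as well. Group 1 is the outer group. *)
let any_text =
  Str.regexp
    ("\\(\\([^" ^ whitespace_chars ^ "]\\|[" ^ whitespace_chars ^ "]\\)+\\)")

(* Returns the captured text instead of printing it, so that the
   result can be inspected by the caller. *)
let get_info content =
  if Str.string_match any_text content 0
  then Some (Str.matched_group 1 content)
  else None

let () =
  match get_info "first line\nsecond line" with
  | Some text -> print_endline ("-->" ^ text ^ "<--")
  | None -> print_endline "not found"
```

Because the alternation covers every character, this version simply grabs the whole input. To stop at a delimiter, put that delimiter after the closing \) and narrow the character classes accordingly.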
