How to tokenize a sentence using JavaScript

I am trying to tokenize the following sentence using JavaScript's split function.

  CHRIS NISWANDEE,
   (SMALLSYS INC,
   795 E DRAGRAM),
   TUCSON AZ 85705,
   USA

My expected result is:

 "chris","niswandee",",","(","smallsys","inc",",","795","e","dragram",")",...
etc.

I can split at word boundaries using the following code:

"CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA".split(/\b\s+/)

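For context, here is a small sketch of what that call returns. Because `\b\s+` only matches whitespace that directly follows a word character, the whitespace after a comma or parenthesis is not a split point, so the punctuation stays glued to its neighbours. The variable names `input` and `parts` are only illustrative.

```javascript
const input = "CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA";

// Split on whitespace that is immediately preceded by a word boundary.
const parts = input.split(/\b\s+/);

console.log(parts);
// [ 'CHRIS', 'NISWANDEE, (SMALLSYS', 'INC, 795', 'E',
//   'DRAGRAM), TUCSON', 'AZ', '85705, USA' ]
```
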
Is there any way to get those commas and parentheses in my result?

It looks like you want to split on /\s+|\b/.

This means: "any sequence of whitespace (\s+) or (|) any word boundary (\b)".

"CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA".split(/\s+|\b/)

Output:

["CHRIS", "NISWANDEE", ",", "(", "SMALLSYS", "INC", ",", "795", "E", "DRAGRAM", "),", "TUCSON", "AZ", "85705", ",", "USA"]
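
Note that the output above keeps `),` as a single token, since there is no word boundary between two adjacent punctuation characters, and the tokens stay uppercase. If every punctuation mark should be its own token, and the tokens should be lowercase as in the question's expected result, one possible alternative is to use `match` instead of `split`. This is only a sketch, not part of the answer above; the `tokenize` helper and the variable names are hypothetical, and `\w` is assumed to be an acceptable definition of a "word" character (letters, digits, underscore).

```javascript
// Hypothetical helper: returns lowercase tokens, with every punctuation
// character emitted as its own token.
function tokenize(text) {
  // \w+      matches a run of word characters (letters, digits, underscore)
  // [^\w\s]  matches one character that is neither a word character nor whitespace
  const found = text.toLowerCase().match(/\w+|[^\w\s]/g);
  // match() returns null when nothing matches, so fall back to an empty array.
  return found === null ? [] : found;
}

const address = "CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA";
const tokens = tokenize(address);

console.log(tokens);
// [ 'chris', 'niswandee', ',', '(', 'smallsys', 'inc', ',', '795', 'e',
//   'dragram', ')', ',', 'tucson', 'az', '85705', ',', 'usa' ]
```

Matching the tokens directly, rather than describing the separators, avoids having to reason about where word boundaries fall between punctuation characters.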