How to tokenize a sentence using JavaScript
I am trying to tokenize the following sentence using JavaScript's split function.
CHRIS NISWANDEE,
(SMALLSYS INC,
795 E DRAGRAM),
TUCSON AZ 85705,
USA
My expected result is:
"chris","niswandee",",","(","smallsys","inc",",","795","e","dragram",")"...
etc.
I can split at word boundaries using the following code:
"CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA".split(/\b\s+/)
Is there any way to get those commas and parentheses in my result as well?
It looks like you want to split on /\s+|\b/.
This means: "any sequence of whitespace (\s+) or (|) any word boundary (\b)".
"CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA".split(/\s+|\b/)
Output:
["CHRIS", "NISWANDEE", ",", "(", "SMALLSYS", "INC", ",", "795", "E", "DRAGRAM", "),", "TUCSON", "AZ", "85705", ",", "USA"]
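Note that the output above keeps ")," together as one token, because there is no word boundary between two punctuation characters. If each punctuation mark should be its own token, one possible alternative is to match the tokens instead of splitting on separators. The following is only a sketch: the `tokenize` helper is a hypothetical name, and the lowercasing is an assumption based on the lowercase tokens in the expected result.

```javascript
// Hypothetical helper: tokenize by matching tokens rather than splitting.
function tokenize(text) {
  // \w+      matches a run of letters, digits, or underscores.
  // [^\w\s]  matches one character that is neither a word character
  //          nor whitespace, so each punctuation mark is its own token.
  const tokens = text.match(/\w+|[^\w\s]/g);
  // match() returns null when nothing matches, so fall back to [].
  // Lowercasing is an assumption taken from the expected result.
  return (tokens || []).map((token) => token.toLowerCase());
}

const address =
  "CHRIS NISWANDEE, (SMALLSYS INC, 795 E DRAGRAM), TUCSON AZ 85705, USA";
const result = tokenize(address);
console.log(result);
```

With this approach the closing parenthesis and the comma after DRAGRAM come out as two separate tokens. Because \w only covers ASCII letters, digits, and underscore, accented or non-Latin text would need a different character class.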