Tweak the NLTK sentence tokenizer to keep bracketed sentences intact

I have text in which the sentences inside square brackets should not be split. Any idea how to do this?

Example:

Today is Monday. [Money can buy this and this. But it can't buy love.]

Current output:

Today is Monday.

[Money can buy this and this.

But it can't buy love.]

Expected output:

Today is Monday.

[Money can buy this and this. But it can't buy love.]

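You need to preprocess the input. Split the text at the opening and closing brackets with a split() call (for example re.split(), which can split on both characters at once). The resulting elements can then be indexed: "normal" text and bracketed text alternate, so you can decide which elements should be sentence-split and which should not. Finally, join the elements back together, restoring the brackets where needed.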
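The steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function name split_keeping_brackets and its splitter parameter are made up for illustration, and it assumes the brackets are balanced and not nested. When no splitter is passed it falls back to NLTK's sent_tokenize, which assumes NLTK and its Punkt data are installed.

```python
import re


def split_keeping_brackets(text, splitter=None):
    """Split text into sentences, leaving [bracketed] spans unsplit.

    `splitter` is a hypothetical hook: any callable that maps a string
    to a list of sentences. It defaults to NLTK's sent_tokenize.
    """
    if splitter is None:
        # Assumption: NLTK and its Punkt tokenizer data are available.
        from nltk.tokenize import sent_tokenize
        splitter = sent_tokenize

    # Splitting at '[' and ']' yields alternating pieces: even indexes
    # hold normal text, odd indexes hold what was inside brackets
    # (assuming balanced, non-nested brackets).
    pieces = re.split(r"[\[\]]", text)

    sentences = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            # Bracketed text: keep it whole and restore the brackets.
            sentences.append("[" + piece + "]")
        elif piece.strip():
            # Normal text: hand it to the sentence splitter.
            sentences.extend(splitter(piece))
    return sentences
```

Calling split_keeping_brackets() on the example text from the question should give two elements, the first sentence and the whole bracketed span. Note that this sketch treats every bracketed span as a separate element, so a bracket in the middle of a sentence would also break that sentence in two; handling that case would need extra logic when rejoining the pieces.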