Tweak the NLTK sentence tokenizer to keep bracketed sentences together
I have sentences inside brackets that I don't want to be split. Any idea how to do this?
Example:
Today is Monday. [Money can buy this and this. But it can't buy love.]
Current output:
Today is Monday.
[Money can buy this and this.
But it can't buy love.]
Expected output:
Today is Monday.
[Money can buy this and this. But it can't buy love.]
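You need to preprocess the input. Use split() to break the text at the opening and closing brackets. The resulting elements can then be indexed, because "normal" text and bracketed text alternate. That lets you decide which elements should be split into sentences and which should not. Finally, rejoin the elements and restore the brackets where needed.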
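The preprocessing described above could be sketched roughly as follows. This is one possible variation, not a definitive implementation: it uses re.split with a capturing group instead of str.split, so the brackets are kept and do not have to be restored afterwards. The names split_keeping_brackets and naive_sent_tokenize are made up for this example, and the naive splitter is only a stand-in so the snippet runs without NLTK. Nested brackets are assumed not to occur.

```python
import re

# Matches one bracketed span such as "[... ...]". Nested brackets are not handled.
BRACKETED = re.compile(r"(\[[^\]]*\])")


def naive_sent_tokenize(text):
    """Stand-in splitter for the demo; swap in nltk.sent_tokenize in practice."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_keeping_brackets(text, sent_tokenize=naive_sent_tokenize):
    """Split text into sentences, leaving each bracketed span in one piece."""
    sentences = []
    # Because the pattern has a capturing group, re.split keeps the bracketed
    # spans, so the parts alternate: outside text at even indexes,
    # bracketed text at odd indexes.
    for i, part in enumerate(BRACKETED.split(text)):
        part = part.strip()
        if not part:
            continue
        if i % 2 == 1:
            sentences.append(part)  # bracketed span: keep whole
        else:
            sentences.extend(sent_tokenize(part))  # normal text: split
    return sentences


text = "Today is Monday. [Money can buy this and this. But it can't buy love.]"
result = split_keeping_brackets(text)
for sentence in result:
    print(sentence)
```

With NLTK installed and its Punkt data downloaded, passing sent_tokenize=nltk.sent_tokenize should give the same structure using the real tokenizer. Note that a bracketed span sitting in the middle of a sentence would come out as its own item, which may or may not be what you want.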