Remove Part of Speech Tags after chunking
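How can I remove the part-of-speech tags from a chunking result?
I am using NLTK for this. So far I can only iterate down to the chunks with the following code: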
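import nltk

# sent_list: a list of tokenized sentences, e.g. [["routing", "rework", "build"], ...]
ChunkGram = r"""Chunk: {<VB.?>+<JJ.?>*<NN.?>}"""
ChunkParser = nltk.RegexpParser(ChunkGram)  # compile the grammar once, outside the loop

for i in sent_list:
    tagged = nltk.pos_tag(i)
    chunked = ChunkParser.parse(tagged)
    for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Chunk'):
        print(subtree)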
Suppose my output looks like this:
(Chunk routing/VBG rework/NN build/NN)
(Chunk build/VBP instruction/NN schedule/NN lot/NN)
(Chunk based/VBN firm/NN plan/NN)
Expected result:
'routing','rework','build'
or
'routing rework build'
Is this possible? Otherwise, please tell me how to extract these phrases.
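For reference, calling `subtree.leaves()` on a chunk returns a list of `(word, tag)` tuples, so both expected forms can be produced by keeping only the first element of each pair. Below is a minimal plain-Python sketch. `strip_tags` is a hypothetical helper name, and the hard-coded list stands in for the leaves of the first chunk shown above.

```python
def strip_tags(leaves):
    """Drop the POS tag from each (word, tag) pair, e.g. the output of Tree.leaves()."""
    return [word for word, _tag in leaves]


# Hypothetical stand-in for subtree.leaves() of "(Chunk routing/VBG rework/NN build/NN)"
leaves = [("routing", "VBG"), ("rework", "NN"), ("build", "NN")]

words = strip_tags(leaves)   # ['routing', 'rework', 'build']
phrase = " ".join(words)     # 'routing rework build'
print(words)
print(phrase)
```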
I found this code, which gives me the result I wanted:
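verblist = []  # create once, before looping over sent_list
for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Chunk'):  # label must match the grammar
    # leaves() returns (word, tag) pairs; keep only the word
    verblist.append(" ".join(word for (word, tag) in subtree.leaves()))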
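The following is a minimal end-to-end sketch that puts these pieces together. It assumes pre-tagged input so it runs without downloading the `nltk.pos_tag` model. The `tagged_sentences` list is hypothetical and mirrors the chunks in the question; in real use you would pass `nltk.pos_tag(tokens)` for each sentence. The sketch also assumes the grammar `<NN.?>+` (one or more nouns). With the question's `<NN.?>`, a chunk ends after the first noun, so `routing/VBG rework/NN build/NN` would come out as only `routing rework`. `extract_chunks` is a hypothetical helper name.

```python
import nltk

# Assumption: <NN.?>+ instead of the question's <NN.?>, so that multi-noun
# chunks such as "routing rework build" are captured whole.
CHUNK_GRAMMAR = r"Chunk: {<VB.?>+<JJ.?>*<NN.?>+}"
chunk_parser = nltk.RegexpParser(CHUNK_GRAMMAR)  # compiled once


def extract_chunks(tagged_sentences, label="Chunk"):
    """Return every chunk with the given label as a plain string, POS tags removed."""
    phrases = []
    for tagged in tagged_sentences:
        tree = chunk_parser.parse(tagged)
        for subtree in tree.subtrees(filter=lambda t: t.label() == label):
            # leaves() yields (word, tag) pairs; keep only the word
            phrases.append(" ".join(word for word, _tag in subtree.leaves()))
    return phrases


# Hypothetical pre-tagged input, so no tagger model download is needed
tagged_sentences = [
    [("routing", "VBG"), ("rework", "NN"), ("build", "NN")],
    [("build", "VBP"), ("instruction", "NN"), ("schedule", "NN"), ("lot", "NN")],
    [("based", "VBN"), ("firm", "NN"), ("plan", "NN")],
]

print(extract_chunks(tagged_sentences))
```

If you want the `'routing','rework','build'` form instead, append the list of words rather than joining it into a single string.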