How to identify mentions in a text?

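I am looking for a rule-based method, or any other method, to identify all the mentions in a text. I have found several libraries that provide coreference resolution, but none with an explicit option to return only the mentions. What I want looks like this: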

Input text:

[This painter]'s indulgence of visual fantasy, and appreciation of different historic architectural styles can be seen in his 1840 Architect's Dream. After a series of paintings on The Last of the Mohicans, [he] made a three year trip to Europe in 1829, but [he] is better known for a trip four years earlier in which [he] journeyed up the Hudson River to the Catskill Mountains. FTP, name [this painter of The Oxbow] and The Voyage of Life series.

*Square brackets mark the mentions.

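How can I find the mentions? Also, how does this differ from coreference resolution? If anyone can post links to relevant papers, that would be very helpful.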

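I think you can get what you want from the standard dcoref annotator. Look at the annotation this annotator sets, CorefChainAnnotation. It is a map from the entities in the document to "coref chains."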

Each CorefChain can give you the list of mentions of its entity in textual order.
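To make the shape of that data concrete, here is a minimal plain-Python sketch. It does not call CoreNLP. The `Mention` class, the `coref_chains` dictionary, and both helper functions are hypothetical stand-ins, and the chains are written by hand from the sample text rather than produced by any tool. It only illustrates the idea: each chain lists the mentions of one entity, and flattening every chain drops the grouping and leaves just the mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    sentence: int  # 1-based sentence number (assigned by hand here)
    position: int  # order of appearance within that sentence (assigned by hand)
    text: str      # surface string of the mention


# Hypothetical stand-in for the map from entities to coref chains.
# Hand-written from the sample text and deliberately left unsorted.
coref_chains = {
    1: [
        Mention(2, 2, "he"),
        Mention(1, 1, "This painter"),
        Mention(3, 1, "this painter of The Oxbow"),
        Mention(2, 1, "he"),
        Mention(2, 3, "he"),
    ],
    2: [Mention(2, 4, "the Hudson River")],
}


def mentions_in_textual_order(mentions):
    """Sort mentions by sentence, then by position within the sentence."""
    return sorted(mentions, key=lambda m: (m.sentence, m.position))


def all_mentions(chains):
    """Flatten every chain into one list of mentions, ignoring the grouping."""
    flat = [m for chain in chains.values() for m in chain]
    return mentions_in_textual_order(flat)


if __name__ == "__main__":
    for mention in all_mentions(coref_chains):
        print(f"sentence {mention.sentence}: [{mention.text}]")
```

With the real annotator you would iterate over the values of the CorefChainAnnotation map in the same way and collect each chain's mentions; the sorting above just mimics the textual order the answer refers to.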