Summarization-Text rank algorithm

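What are the advantages of using the TextRank algorithm for summarization over BERT summarization? Even though both can be used as extractive summarization methods, does TextRank have any particular advantage?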

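TextRank implementations tend to be lightweight and can run fast even with limited memory resources, while transformer models such as BERT tend to be rather large and require lots of memory. While the TinyML community has done outstanding work on techniques to make DL models run within limited resources, TextRank may still have a resource advantage for some use cases.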
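To give a rough sense of how lightweight the approach can be, here is a minimal sketch of extractive summarization in the TextRank style, using only the Python standard library. It is not the code of any particular open source implementation: the function names (`tokenize`, `similarity`, `textrank_summary`) are hypothetical, the tokenizer is a crude regex stand-in for real NLP preprocessing, and the ranking is a plain PageRank-style power iteration over a sentence graph weighted by word overlap normalized by log sentence lengths.

```python
import math
import re


def tokenize(sentence):
    """Lowercased word tokens; a crude stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(a, b):
    """Shared-word count normalized by the log lengths of both sentences."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank_summary(sentences, top_n=2, damping=0.85, iterations=50):
    """Return (top_n sentences in document order, score of every sentence)."""
    n = len(sentences)
    if n == 0:
        return [], []
    tokens = [tokenize(s) for s in sentences]

    # Undirected weighted graph: nodes are sentences, edges are similarities.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = similarity(tokens[i], tokens[j])
    out_sum = [sum(row) for row in weights]

    # PageRank-style power iteration with a uniform restart term.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)], scores
```

Calling `textrank_summary(list_of_sentences, top_n=3)` returns the selected sentences together with the per-sentence scores. The whole model is an n-by-n weight matrix, so memory scales with the number of sentences rather than with a pretrained parameter count.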

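Some TextRank implementations can be "directed" by adding semantic relations, which one can consider as a priori structure to enrich the graph used, or in some cases as a means of incorporating human-in-the-loop approaches. Those can provide advantages over supervised learning models that have been trained purely on data. Even so, there are similar efforts for DL in general (e.g., variations on the theme of transfer learning) from which transformers may benefit.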
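One simple way to picture this kind of "directing" is to tilt the restart distribution of the ranking toward sentences that mention terms supplied by a person or a knowledge source. The sketch below is an assumption for illustration, not how any specific implementation does it: `biased_rank`, `focus_terms`, and `bias` are hypothetical names, edges use plain Jaccard overlap, and the prior is just a count of focus terms per sentence.

```python
import re


def words(text):
    """Set of lowercased word tokens; a crude stand-in for real preprocessing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def biased_rank(sentences, focus_terms, damping=0.85, iterations=50, bias=5.0):
    """Score sentences with a restart distribution tilted toward focus_terms."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [words(s) for s in sentences]
    focus = {t.lower() for t in focus_terms}

    # Edge weight: Jaccard overlap between the word sets of two sentences.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            union = bags[i] | bags[j]
            if union:
                w[i][j] = w[j][i] = len(bags[i] & bags[j]) / len(union)
    out_sum = [sum(row) for row in w]

    # A priori structure: sentences containing focus terms get more restart mass.
    prior = [1.0 + bias * len(bags[i] & focus) for i in range(n)]
    total = sum(prior)
    prior = [p / total for p in prior]

    # Personalized PageRank-style iteration using the tilted prior.
    scores = prior[:]
    for _ in range(iterations):
        scores = [
            (1 - damping) * prior[i]
            + damping * sum(
                w[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

With an empty `focus_terms` list this reduces to an unbiased ranking, so comparing the two score lists shows how much the added prior shifts a given sentence.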

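Another potential benefit is that TextRank approaches tend to be more transparent, while transformer models can be challenging in terms of explainability. There are tools that help greatly, but this concern becomes important in the context of model bias and fairness, data ethics, regulatory compliance, and so on.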

Based on personal experience, while I'm the lead committer for one of the popular TextRank open source implementations, I only use its extractive summarization features for use cases where a "cheap and fast" solution is needed. Otherwise I'd recommend considering more sophisticated approaches to summarization. For example, I recommend keeping watch on the ongoing research by the author of TextRank, Rada Mihalcea, and her graduate students at the University of Michigan.

In terms of comparing "Which text summarization methods work better?", I'd point toward work on abstractive summarization, particularly the recent work by John Bohannon, et al., at Primer. For excellent examples, check the "Daily Briefings" of CV19 research which their team generates using natural language understanding, knowledge graph, abstractive summarization, etc. Amy Heineike discusses their approach in "Machines for unlocking the deluge of COVID-19 papers, articles, and conversations".

Tags: python, nlp, machine-learning, bert-language-model