Apache Solr for indexing translated documents

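Does Apache Solr allow the following:

Is it possible to return to the user, in addition to the document translated into French, the original text and the context in which the terms are used in the original text?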

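The documents to be indexed are PDF files.

Edit: adding an example

I have the original document doc_eng.pdf and its translation doc_fr.pdf.

When doc_fr.pdf is returned in a query response, I would like to get doc_eng.pdf as well, with context (highlighting) if possible.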

My suggestions

1- Map doc_fr.pdf and doc_eng.pdf to the same ID (if that is possible) and add a boolean field isOriginal = true|false.

2- Use nested documents (but I don't know how this would work with PDF files).
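A minimal sketch of suggestion 1, under some assumptions: because Solr replaces an existing document when a new one arrives with the same uniqueKey value, the two files get distinct `id` values here and are linked through a shared field instead. The field names `pair_id`, `language`, and `content` are hypothetical, and the text is assumed to have been extracted from the PDFs already.

```python
from urllib.parse import urlencode


def build_pair(pair_id, original_text, translated_text):
    """Build two Solr-style documents that share pair_id but keep unique ids.

    pair_id, language and content are hypothetical field names.
    """
    return [
        {"id": f"{pair_id}_eng", "pair_id": pair_id, "isOriginal": True,
         "language": "en", "content": original_text},
        {"id": f"{pair_id}_fr", "pair_id": pair_id, "isOriginal": False,
         "language": "fr", "content": translated_text},
    ]


def original_lookup_query(pair_id, terms):
    """Build a query string that fetches the original of a pair with highlighting on."""
    return urlencode({
        "q": f"content:({terms})",
        # Filter queries restrict the result to the original document of this pair.
        "fq": [f"pair_id:{pair_id}", "isOriginal:true"],
        "hl": "true",
        "hl.fl": "content",
    }, doseq=True)


docs = build_pair("doc", "text of doc_eng.pdf", "texte de doc_fr.pdf")
query = original_lookup_query("doc", "text")
```

The query string could be appended to a collection's `/select` URL once a hit on doc_fr.pdf is known. Highlight snippets would only appear where the query terms actually occur in the original text, so a French query may need to be reformulated for the English document.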

Yes, Solr can do this. I suggest you use the Apache Tika mechanism.

Solr can identify languages and map text to language-specific fields during indexing using the langid UpdateRequestProcessor.

Solr supports two implementations of this feature:

Tika’s language detection feature

[LangDetect language detection](https://github.com/shuyo/language-detection)

[Solr Reference Guide: Language Analysis](https://lucene.apache.org/solr/guide/7_2/language-analysis.html)
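To make the field mapping concrete, here is a rough Python imitation of how a detected language could translate into a language-specific field name. The parameter names follow the ones documented for the langid update processor, but the values and the field name `content` are assumptions, and the function is only a simplified stand-in: the real detection and mapping happen inside Solr at index time.

```python
# Illustrative values only; the parameter names are the langid ones,
# while the field names "content" and "language" are assumptions.
LANGID_PARAMS = {
    "langid.fl": "content",          # field(s) whose text is inspected
    "langid.langField": "language",  # field that receives the detected code
    "langid.map": "true",            # map text to language-specific fields
    "langid.whitelist": "en,fr",     # languages the schema has fields for
    "langid.fallback": "en",         # used when detection gives no usable code
}


def mapped_field_name(field, detected_lang, params=LANGID_PARAMS):
    """Simplified imitation of the mapping: content + fr -> content_fr."""
    allowed = params["langid.whitelist"].split(",")
    # Languages outside the whitelist fall back to the configured default.
    lang = detected_lang if detected_lang in allowed else params["langid.fallback"]
    if params["langid.map"] != "true":
        return field
    return f"{field}_{lang}"


french_field = mapped_field_name("content", "fr")
unknown_field = mapped_field_name("content", "de")
```

With a setup along these lines, the text of doc_fr.pdf would end up in a field such as `content_fr` and the text of doc_eng.pdf in `content_en`, so each can be analyzed with the appropriate language-specific field type.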