MongoDB error $substrBytes: Invalid range, ending index is in the middle of a UTF-8 character

Question

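I am writing a query that extracts the distinct first letters of the "titlu" property from my book collection, so that I can group the books by the first letter of "titlu". Some titles begin with multi-byte UTF-8 characters such as Ç, Ş, etc., and for those the query fails with the $substrBytes error quoted in the title.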

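The obvious question is: how do I get rid of the error? Two outcomes are acceptable:

1. Ideally, I should be able to display Ç, Ş, etc. as they are.
2. If that is not possible, I could display I, S, T instead, and group titles starting with Ç under I, titles starting with ş under S, and so on.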

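However, the conversion has to happen inside the mongo query, because I also need the count of titles per letter. (For solution #2, the letter I, for example, would need the number of titles starting with I plus the number of titles starting with È.)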

Answer 1

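You should use $substrCP instead of $substr. $substrCP was introduced in MongoDB 3.4 to solve exactly this kind of problem. Since 3.4, $substr is an alias for $substrBytes, which indexes by UTF-8 bytes rather than characters, so it is only safe for ASCII text: any range that ends inside a multi-byte character raises this error.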

From the MongoDB documentation:

$substrCP

Returns the substring of a string. The substring starts with the character at the specified UTF-8 code point (CP) index (zero-based) in the string for the number of code points specified.
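To make the byte versus code point distinction concrete, here is a small sketch in plain JavaScript (Node.js), not MongoDB code. The sample title "Şapte" is a made-up value; the point is only that "Ş" occupies two bytes in UTF-8, so a one-byte range ends in the middle of the character, while a one-code-point range does not.

```javascript
// Illustrative only: plain Node.js, mimicking what the two operators measure.
const title = "Şapte"; // hypothetical sample title

// "Ş" (U+015E) is encoded as two bytes in UTF-8: 0xC5 0x9E
const bytes = Buffer.from(title, "utf8");
const firstCharByteLength = Buffer.byteLength("Ş", "utf8");

// Byte-based range of length 1 (what $substrBytes is asked to do):
// it stops after 0xC5, i.e. inside the character "Ş".
const firstByte = bytes.subarray(0, 1);

// Code-point-based range of length 1 (what $substrCP does):
// Array.from iterates a string by code point, so this is the whole "Ş".
const firstCodePoint = Array.from(title)[0];

console.log(firstCharByteLength);
console.log(firstByte);
console.log(firstCodePoint);
```

MongoDB refuses to return the half character and reports the error instead, which is why switching to a code point based operator fixes the query.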

So your query would be:

db.carte.aggregate([
  {$project: {
      preview: {$substrCP: ["$titlu", 0, 1]}
    }
  }
])

You can try it online: mongoplayground.net/p/X6Mo1yEhJoI
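Since the question also asks for a count of titles per letter, the same $substrCP expression could feed a $group stage. The following is a minimal sketch, assuming the collection is called carte as in the query above; the field names letter and count are illustrative and not from the original post. It covers the first option (keeping the accented letters as separate groups), not the folding of accented letters into their base letters.

```javascript
// Sketch: group titles by their first code point and count each group.
// `letter` and `count` are hypothetical field names chosen for this example.
const pipeline = [
  // Take the first code point of the title instead of the first byte.
  { $project: { letter: { $substrCP: ["$titlu", 0, 1] } } },
  // One output document per distinct first letter, with its number of titles.
  { $group: { _id: "$letter", count: { $sum: 1 } } },
  // Order the groups by letter.
  { $sort: { _id: 1 } }
];

// In the mongo shell this would be run as:
//   db.carte.aggregate(pipeline)
```

Note that without a collation the $sort stage uses a simple binary comparison, so accented letters may not appear next to their base letters in the output.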

utf-8

mongodb

aggregation-framework

spring-data-mongodb