How to manipulate strings in Go to reverse them?

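I'm trying to reverse a string in Go, but I'm having trouble handling the characters. Unlike C, Go treats a string as a vector of bytes rather than of characters, which are called runes here. I tried a few type conversions to make the assignments work, but so far I haven't managed it.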

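The idea is to generate 5 strings of random characters with sizes 100, 200, 300, 400 and 500, and then reverse their characters. I got this working in C with ease, but in Go the compiler reports an error saying that the assignment cannot be performed.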

 func inverte() {
    var c = "A"
    var strs, aux string

    rand.Seed(time.Now().UnixNano())
    // Generates 5 strings of 100, 200, 300, 400 and 500 characters
    for i := 1; i < 6; i++ {
        strs = randomString(i * 100)
        fmt.Print(strs)

        for i2, j := 0, len(strs); i2 < j; i2, j = i+1, j-1 {
           aux = strs[i2]
           strs[i2] = strs[j]
           strs[j] = aux
       }
   }
}

As you correctly identified, Go strings are immutable, so you cannot assign to a rune/character value at a given index.

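Instead of reversing the string in place, you must create a copy of the runes in the string, reverse that copy, and then return the resulting string.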

For example (Go Playground):

package main

import "fmt"

// reverse returns s with its runes (not its bytes) in reverse order.
func reverse(s string) string {
  rs := []rune(s)
  for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
    rs[i], rs[j] = rs[j], rs[i]
  }
  return string(rs)
}

func main() {
  fmt.Println(reverse("Hello, World!"))
  // !dlroW ,olleH
  fmt.Println(reverse("Hello, 世界!"))
  // !界世 ,olleH
}
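
Applied to the scenario in the question, this could look roughly like the sketch below. The `randomString` helper is not shown in the question, so the version here (and its `letters` alphabet) is only a hypothetical stand-in that produces ASCII letters.

```go
package main

import (
	"fmt"
	"math/rand"
)

// letters is an assumed alphabet; the question does not say which
// characters its randomString helper draws from.
const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// randomString is a hypothetical stand-in for the helper used in the question.
func randomString(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letters[rand.Intn(len(letters))]
	}
	return string(b)
}

// reverse returns s with its runes in reverse order.
func reverse(s string) string {
	rs := []rune(s)
	for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
		rs[i], rs[j] = rs[j], rs[i]
	}
	return string(rs)
}

func main() {
	// Generate 5 strings of 100, 200, 300, 400 and 500 characters.
	for i := 1; i <= 5; i++ {
		s := randomString(i * 100)
		r := reverse(s)
		// Reversing twice should give back the original string.
		fmt.Println(len(s), len(r), reverse(r) == s)
	}
}
```

The key difference from the code in the question is that the swap happens on a `[]rune` copy, which is mutable, and the result is converted back to a `string` at the end.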

This approach has some problems because of the intricacies of Unicode (e.g. combining diacritical marks), but it will get you started.
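
To make that caveat concrete, here is a small sketch of how the rune-by-rune reversal misplaces a combining mark. The input string is just an illustrative choice: an `e` followed by U+0301 (COMBINING ACUTE ACCENT) and then an `a`.

```go
package main

import "fmt"

// reverse returns s with its runes in reverse order.
func reverse(s string) string {
	rs := []rune(s)
	for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
		rs[i], rs[j] = rs[j], rs[i]
	}
	return string(rs)
}

func main() {
	s := "e\u0301a" // 'e', COMBINING ACUTE ACCENT, 'a'
	r := reverse(s)
	// %+q escapes non-ASCII runes, which makes the rune order visible.
	fmt.Printf("%+q\n", s) // "e\u0301a"
	fmt.Printf("%+q\n", r) // "a\u0301e"
}
```

After the reversal the accent follows the `a`, so it is rendered on the wrong letter.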

If you want to take Unicode combining characters into account (characters that are intended to modify other characters, like an acute accent ´ + e = é), Andrew Sellers has an interesting take in this gist.

It starts by listing the Unicode block ranges for all combining diacritical marks (CDM), the Unicode blocks containing the most common combining characters:

var combining = &unicode.RangeTable{
    R16: []unicode.Range16{
        {0x0300, 0x036f, 1}, // combining diacritical marks
        {0x1ab0, 0x1aff, 1}, // combining diacritical marks extended
        {0x1dc0, 0x1dff, 1}, // combining diacritical marks supplement
        {0x20d0, 0x20ff, 1}, // combining diacritical marks for symbols
        {0xfe20, 0xfe2f, 1}, // combining half marks
    },
}
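
As a quick check of how such a table is used, the sketch below tests individual runes against it with `unicode.In`. The `isCombining` wrapper is a hypothetical helper added only for this illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

var combining = &unicode.RangeTable{
	R16: []unicode.Range16{
		{0x0300, 0x036f, 1}, // combining diacritical marks
		{0x1ab0, 0x1aff, 1}, // combining diacritical marks extended
		{0x1dc0, 0x1dff, 1}, // combining diacritical marks supplement
		{0x20d0, 0x20ff, 1}, // combining diacritical marks for symbols
		{0xfe20, 0xfe2f, 1}, // combining half marks
	},
}

// isCombining (hypothetical helper) reports whether r falls in one of
// the ranges listed in the combining table.
func isCombining(r rune) bool {
	return unicode.In(r, combining)
}

func main() {
	for _, r := range "e\u0301" {
		fmt.Printf("%U combining=%t\n", r, isCombining(r))
	}
	// U+0065 combining=false
	// U+0301 combining=true
}
```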

You can then read your initial string rune by rune:

sv := []rune(s)

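But if you walk it in reverse order, you encounter the combining diacritical marks (CDMs) first, and those need to keep their order rather than being reversed: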

for ix := len(sv) - 1; ix >= 0; ix-- {
        r := sv[ix]
        if unicode.In(r, combining) {
            // Prepend, so that several marks on the same base rune keep their original order.
            cv = append([]rune{r}, cv...)
            fmt.Printf("Detect combining diacritical mark ' %c'\n", r)
        }

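(Note the space around the %c for the combining rune: '%c' without a space would combine the mark with the opening quote, giving 'ͤ' instead of ' ͤ '. I tried to use the CGJ Combining Grapheme Joiner \u034F, but that does not work.)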

When you finally encounter a regular rune, you need to combine it with those CDMs before adding it to your reversed final rune array.

        } else {
            rrv := make([]rune, 0, len(cv)+1)
            rrv = append(rrv, r)
            rrv = append(rrv, cv...)
            fmt.Printf("regular mark '%c' (with '%d' combining diacritical marks '%s') => '%s'\n", r, len(cv), string(cv), string(rrv))
            rv = append(rv, rrv...)
            cv = make([]rune, 0)
        }
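
Put together, the fragments above could be assembled into one function along the following lines. This is only a sketch: the name `reverseKeepingMarks` is made up here, the debug `Printf` calls are dropped, and the marks are prepended to `cv` so that several marks on one base rune keep their original relative order.

```go
package main

import (
	"fmt"
	"unicode"
)

var combining = &unicode.RangeTable{
	R16: []unicode.Range16{
		{0x0300, 0x036f, 1}, // combining diacritical marks
		{0x1ab0, 0x1aff, 1}, // combining diacritical marks extended
		{0x1dc0, 0x1dff, 1}, // combining diacritical marks supplement
		{0x20d0, 0x20ff, 1}, // combining diacritical marks for symbols
		{0xfe20, 0xfe2f, 1}, // combining half marks
	},
}

// reverseKeepingMarks (hypothetical name) reverses s while keeping each
// combining mark attached to the base rune it originally followed.
func reverseKeepingMarks(s string) string {
	sv := []rune(s)
	rv := make([]rune, 0, len(sv))
	var cv []rune // marks waiting for their base rune
	for ix := len(sv) - 1; ix >= 0; ix-- {
		r := sv[ix]
		if unicode.In(r, combining) {
			// Walking backwards meets the last mark first, so prepend.
			cv = append([]rune{r}, cv...)
			continue
		}
		rv = append(rv, r)     // the base rune
		rv = append(rv, cv...) // followed by its marks, in original order
		cv = cv[:0]
	}
	// Marks at the very start of s have no base rune; keep them anyway.
	rv = append(rv, cv...)
	return string(rv)
}

func main() {
	// 'e' with two combining marks, followed by 'a'.
	fmt.Printf("%+q\n", reverseKeepingMarks("e\u0301\u0302a"))
	// "ae\u0301\u0302"
}
```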

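Where it gets even more complicated is with emoji and, more recently, with modifiers such as the Medium-Dark Skin Tone, the type 5 on the Fitzpatrick Scale.
If these are ignored, reversing '‍‍⚖️' gives '️⚖‍‍', losing the skin tone on the last two emoji.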

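And don't get me started on the ZERO WIDTH JOINER (200D), which, from Wisdom/Awesome-Unicode, forces adjacent characters to be joined together (e.g. Arabic characters or supported emoji). It can be used to compose sequentially combined emoji.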

Here are two examples of composed emoji whose inner elements should stay in the same order when "reversed":

‍ alone is (from the Unicode to code points converter):

Those should stay in the exact same order.
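
If you want to inspect such a sequence yourself, a few lines of Go are enough to list its code points. The sequence used below (MAN, ZERO WIDTH JOINER, SCALES, VARIATION SELECTOR-16) is an assumed example chosen for this sketch, and `codePoints` is a hypothetical helper.

```go
package main

import "fmt"

// codePoints (hypothetical helper) lists the code points of s in U+XXXX notation.
func codePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	// MAN, ZERO WIDTH JOINER, SCALES, VARIATION SELECTOR-16
	judge := "\U0001F468\u200D\u2696\uFE0F"
	fmt.Println(codePoints(judge))
	// [U+1F468 U+200D U+2696 U+FE0F]
}
```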

The "character" "judge" (meaning the abstract concept for the semantic value of "judge") can be represented with several glyphs or with one glyph.

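‍⚖️ is actually a composed glyph (here composed of two emoji) representing a judge. That sequence should not be inverted.
The program below correctly detects the "zero width joiner" and does not invert the emoji it combines.
If you inspect that emoji, you will find it is composed of: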

Again, that order needs to be preserved.
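
One way to sketch this idea is to group the runes into clusters first and then reverse the clusters instead of the individual runes. The version below is a rough heuristic and not a full implementation of Unicode grapheme segmentation: the names `sticks` and `reverseClusters` are made up, and it uses the standard `unicode.Mark` table in place of the hand-written `combining` table, plus explicit checks for skin tone modifiers and the zero width joiner. Flags, Hangul jamo and other cases are not handled.

```go
package main

import (
	"fmt"
	"unicode"
)

const zwj = '\u200D' // ZERO WIDTH JOINER

// sticks (hypothetical helper) reports whether r attaches to the rune before it.
func sticks(r rune) bool {
	switch {
	case unicode.Is(unicode.Mark, r): // combining marks and variation selectors
		return true
	case r >= 0x1F3FB && r <= 0x1F3FF: // Fitzpatrick skin tone modifiers
		return true
	case r == zwj:
		return true
	}
	return false
}

// reverseClusters (hypothetical name) reverses s cluster by cluster, so the
// runes inside one cluster keep their original order.
func reverseClusters(s string) string {
	var clusters [][]rune
	joinNext := false // true right after a ZWJ: the next rune joins the cluster
	for _, r := range s {
		if len(clusters) > 0 && (joinNext || sticks(r)) {
			last := len(clusters) - 1
			clusters[last] = append(clusters[last], r)
		} else {
			clusters = append(clusters, []rune{r})
		}
		joinNext = r == zwj
	}
	out := make([]rune, 0, len(s))
	for i := len(clusters) - 1; i >= 0; i-- {
		out = append(out, clusters[i]...)
	}
	return string(out)
}

func main() {
	// MAN, medium-dark skin tone, ZWJ, SCALES, VARIATION SELECTOR-16
	judge := "\U0001F468\U0001F3FE\u200D\u2696\uFE0F"
	// The emoji sequence stays intact; only its position changes.
	fmt.Println(reverseClusters("a"+judge+"b") == "b"+judge+"a") // true
	fmt.Println(reverseClusters("e\u0301a") == "ae\u0301")       // true
}
```

For production use, a library that implements the Unicode text segmentation rules would be a safer choice than a hand-rolled heuristic like this one.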

Note: the actual judge emoji ‍⚖️ uses a MAN (1F468), instead of an Adult (1F9D1) (plus the other characters listed above: dark skin, ZWJ, scale), and is therefore represented as one glyph, instead of a cluster of graphemes.

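Meaning: the official single-glyph emoji for "judge" needs to combine "man" with "scale" (resulting in one glyph ‍⚖️) instead of "adult" + "scale".
The latter, "adult" + "scale", is still considered "one character": you cannot select just the scale, because of the ZWJ (zero width joiner).
But that "character" is represented as a composed glyph ‍⚖️: two glyphs, each one a concrete written representation of a corresponding grapheme (through code points + font).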

Obviously, using the first combination ("man" + "scale") results in a more expressive character ‍⚖️.

See "The relationship between graphemes and abstract characters for textual representation":

Graphemes and orthographic characters are fairly concrete objects, in the sense that they are familiar to common users—non-experts, who are typically taught to work in terms of them from the time they first learn their “ABCs” (or equivalent from their writing system, of course).

In the domain of information systems, however, we have a different sense of character: abstract characters which are minimal units of textual representation within a given system.
These are, indeed, abstract in two important senses:

  • first, some of these abstract characters may not correspond to anything concrete in an orthography, as we saw above in the case of HORIZONTAL TAB.
  • Secondly, the concrete objects of writing (graphemes and orthographic characters) can be represented by abstract characters in more than one way, and not necessarily in a one-to-one manner, as we saw above in the case of “ô” being represented by a sequence <O, CIRCUMFLEX>.

Then, "From grapheme to codepoint to glyph":

  • Graphemes are the units in terms of which users are usually accustomed to thinking.
  • Within the computer, however, processes are done in terms of characters.

We don’t make any direct connection between graphemes and glyphs.
As we have defined these two notions here, there is no direct connection between them. They can only be related indirectly through the abstract characters.
This is a key point to grasp: the abstract characters are the element in common through which the others relate.

Full example in the Go playground.

Reverse 'Hello, World' => 'dlroW ,olleH'
Reverse '⃠' => '⃠'
Reverse '‍‍⚖️' => '‍⚖️‍'
Reverse 'aͤoͧiͤ  š́ž́ʟ́' => 'ʟ́ž́š́  iͤoͧaͤ'
Reverse 'H̙̖ell͔o̙̟͚͎̗̹̬ ̯W̖͝ǫ̬̞̜rḷ̦̣̪d̰̲̗͈' => 'd̰̲̗͈ḷ̦̣̪rǫ̬̞̜W̖͝ ̯o̙̟͚͎̗̹̬l͔leH̙̖'