Tkinter and 32-bit Unicode duplicating: any fix?

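I only want to show Chip, but I get Chip AND Dale. It doesn't seem to matter which 32-bit character I put in; tkinter seems to duplicate all of them, not just chipmunks.

I'm thinking I may have to render them to PNG and then place them as images, but that seems a bit ... heavy-handed.

Are there any other solutions? Is tkinter planning to fix this?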

import tkinter as tk

# Python 3.8.3
class Application(tk.Frame):
    def __init__(self, master=None):
        self.canvas = None
        self.quit_button = None
        tk.Frame.__init__(self, master)
        self.grid()
        self.create_widgets()

    def create_widgets(self):
        self.canvas = tk.Canvas(self, width=500, height=420, bg='yellow')
        self.canvas.create_text(250, 200, font="* 180", text='\U0001F43F')
        self.canvas.grid()

        self.quit_button = tk.Button(self, text='Quit', command=self.quit)
        self.quit_button.grid()

app = Application()
app.master.title('Emoji')
app.mainloop()

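One of the Python contributors believes that Tcl/Tk cannot or will not support variable-width encodings (it always converts to a fixed-width encoding internally), which suggests that Tcl/Tk is not suitable for general UTF-8 development.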

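The fundamental problem is that Tcl and Tk are not very happy with non-BMP characters (characters outside the Unicode Basic Multilingual Plane). Prior to 8.6.10, what happens is anyone's guess; the implementation simply assumed such characters didn't exist and was known to be buggy when they actually occurred (there are several tickets on various aspects of this). 8.7 will have stronger fixes in place (see TIP #389 for the details). The basic aim is that if you feed non-BMP characters in, they can be got out the other side, so they can be written to a UTF-8 file or displayed by Tk if the font engine deigns to support them. Some operations will still be wrong, though, because the string implementation will still be using surrogates. 9.0 will fix things properly (by changing the fundamental character storage unit to be large enough to accommodate any Unicode code point), but that is a disruptive change.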
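To check whether a given string is affected by this at all, here is a minimal sketch in plain Python (no tkinter needed). The helper names `utf16_code_units` and `non_bmp_characters` are made up for illustration; they are not part of any library.

```python
def utf16_code_units(text):
    """Return how many 16-bit code units `text` occupies in UTF-16."""
    # utf-16-le writes no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


def non_bmp_characters(text):
    """Return the characters of `text` that lie outside the BMP (above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


chipmunk = "\U0001F43F"
print(len(chipmunk))                        # number of code points in Python
print(utf16_code_units(chipmunk))           # number of 16-bit units in UTF-16
print(non_bmp_characters("A" + chipmunk))   # only the chipmunk is non-BMP
```

Any character reported by `non_bmp_characters` needs a surrogate pair in UTF-16, which is the case the older Tcl/Tk releases described above do not handle reliably.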

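With released versions, if you can get the surrogates over the wall from Python to Tcl, they will probably end up in the GUI engine, which might do the right thing in some cases (not including any build I currently have, FWIW, but I have strange builds, so don't read too much into that). With 8.7, sending the text over as UTF-8 will work; that is part of the functionality profile that will be guaranteed. (The encoding functions exist in older versions, but in 8.6 releases they do the wrong thing with non-BMP UTF-8, and they break in weird ways in versions older than that.)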

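The problem

A couple of things could be happening:
  • That is simply how the emoji looks. There is no way around it other than changing the source emoji.
  • Tk and/or Tcl is confused by the emoji. It is not sure which emoji to draw, so it draws two chipmunks. When I tried that emoji on my Linux computer, I got an error.

The solution

The only solution may be to save the emoji as a file and then create an image from it. There may be other, slightly hackier ways, though. For example, you could place a Frame rectangle over the second chipmunk to hide it.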

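As you noted, your code works fine on Windows (tested on Windows 10), but on macOS the following workaround should work:

  1. Convert the emoji's encoding from UTF-32 to UTF-16. UTF-16 is a variable-length encoding, so any code point that can be represented in UTF-32 can also be converted to UTF-16. For modern emoji the UTF-16 encoded value simply uses 32 bits, the same as UTF-32, which means it should support Unicode v11 characters.
  2. Pass the resulting string to the embedded Tcl/Tk interpreter.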

From "UTF-16" in Programming with Unicode:

> In UTF-16, characters in ranges U+0000–U+D7FF and U+E000–U+FFFD are stored as a single 16 bits unit. Non-BMP characters (range U+10000–U+10FFFF) are stored as "surrogate pairs", two 16 bits units: a high surrogate (in range U+D800–U+DBFF) followed by a low surrogate (in range U+DC00–U+DFFF).
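The surrogate pair described in that quote can be computed by hand. The following is a small sketch of the standard UTF-16 arithmetic; the function name `to_surrogate_pair` is only an illustrative choice.

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the non-BMP range")
    offset = code_point - 0x10000          # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F43F)     # U+1F43F CHIPMUNK
print("\\u{:04x}\\u{:04x}".format(high, low))  # prints \ud83d\udc3f
```

The result agrees with what Python's own `'utf-16-be'` codec produces for the same character.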

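For Tcl to perform substitution on a Unicode-escaped string (and turn it into its character/emoji representation), the string itself must have the form "\uXXXX" or "\uXXXX\uXXXX".

The chipmunk emoji must therefore be converted to UTF-16 => "\ud83d\udc3f"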


    # The tcl/tk code
    set chipmunk "\ud83d\udc3f"
    
    pack [set c [canvas .c -highlightcolor blue -highlightbackground black -background yellow]] -padx 4cm -pady 4cm -expand 1 -fill both
    
    set text_id [$c create text 0 0 -text $chipmunk -font [list * 180]]
    
    $c moveto $text_id 0 0

The equivalent code in Python, which at some points bypasses tkinter and issues Tcl commands directly to the embedded/linked interpreter:

import tkinter as tk

# the top-level window
top = tk.Tk()

# the canvas
c = tk.Canvas(top, highlightcolor = 'blue', highlightbackground = 'black', background = 'yellow')

# create the text item, with placeholder text
text_id = c.create_text(0,0, font = '* 180', text = 'to be replaced')

# pack it
c.pack(side = 'top', fill = 'both' , expand = 1, padx = '4c' , pady = '4c')

# The 'Bypassing' aka issuing tcl/tk calls directly
# For Tk calls use => c.tk.call(...), we will not use this.
# For bare Tcl => c.tk.eval(...)

# chipmunk in UTF-16 (in this instance it is using 32-bits to represent the codepoint)
# as a raw string

chipmunk = r"\ud83d\udc3f"

# create another variable in tcl/tk
c.tk.eval('set the_tcl_chipmunk {}'.format(chipmunk))

# set the text_id item's -text property/option as the value of variable the_tcl_chipmunk, gotten by calling the tcl's set command

c.tk.eval( '{} itemconfig {} -text [set the_tcl_chipmunk]'.format( str(c), text_id ) )

# Apparently a hack to get the chipmunk in position
c.tk.eval( '{} moveto {} 0 0'.format( str(c), text_id ) )

# the main gui event loop
top.mainloop()
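Since the script above builds Tcl source by string formatting, anything other than plain \uXXXX escapes in the text could be interpreted by Tcl as code. A possible guard, sketched here under the assumption that the text is always passed in already escaped, is to validate it before building the command. `build_itemconfigure_command` is a hypothetical helper, not a tkinter API; it only returns a string.

```python
import re

# One or more \uXXXX escapes and nothing else.
_ESCAPED_TEXT = re.compile(r"(?:\\u[0-9a-fA-F]{4})+")


def build_itemconfigure_command(widget_path, item_id, escaped_text):
    """Build the Tcl command that sets a canvas item's -text option.

    `escaped_text` must consist solely of \\uXXXX escapes, so that nothing
    in it can be interpreted by Tcl as a command or variable substitution.
    """
    if not _ESCAPED_TEXT.fullmatch(escaped_text):
        raise ValueError("text must contain only \\uXXXX escapes")
    return '{} itemconfigure {} -text "{}"'.format(
        widget_path, int(item_id), escaped_text
    )


print(build_itemconfigure_command(".!canvas", 1, r"\ud83d\udc3f"))
```

With the canvas `c` and `text_id` from the script above, the result would be passed on as `c.tk.eval(build_itemconfigure_command(str(c), text_id, chipmunk))`. Whether the emoji then renders correctly still depends on the Tcl/Tk build and platform.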

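Getting the chipmunk in UTF-16

There are two routes you can take:

  1. Get it from a website. I always use fileformat.info (see the chipmunk page on fileformat.info) and copy the value shown for "C/C++/Java source code".

  2. Convert from UTF-32 to UTF-16 in Python: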


# A UTF-32 string, since it's of the form "\UXXXX_XXXX" ( _ is not part of the syntax, merely a visual aid for illustrative purposes)
chipmunk_utf_32 = '\U0001F43F'

# convert/encode it to UTF-16 (big endian), to get a bytes object

chipmunk_utf_16 = chipmunk_utf_32.encode('utf-16-be')

# obtain the hex representation
chipmunk_utf_16 = chipmunk_utf_16.hex()

# format it to be an escaped UTF-16 tcl string
# (the backslashes are doubled so Python keeps them as literal characters)
chipmunk = '\\u{}\\u{}'.format(chipmunk_utf_16[0:4], chipmunk_utf_16[4:8])
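The conversion above handles exactly one non-BMP character. A possible generalization to arbitrary strings, sketched below, escapes every UTF-16 code unit, so BMP characters yield one \uXXXX and non-BMP characters yield two. The name `to_tcl_escapes` is made up for this example.

```python
def to_tcl_escapes(text):
    """Return `text` as a sequence of Tcl \\uXXXX escapes, one per UTF-16 unit."""
    data = text.encode("utf-16-be")        # big endian, no byte-order mark
    units = [data[i:i + 2].hex() for i in range(0, len(data), 2)]
    return "".join("\\u" + unit for unit in units)


print(to_tcl_escapes("\U0001F43F"))        # prints \ud83d\udc3f
print(to_tcl_escapes("A"))                 # prints \u0041
```

Because every character is escaped, the result contains no spaces, brackets, or dollar signs, so it can be substituted into a Tcl `set` command the same way as the single-emoji string.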

Edit: the whole script

import tkinter as tk

# A UTF-32 string, since it's of the form "\UXXXX_XXXX" ( _ is not part of the syntax, merely a visual aid for illustrative purposes)
chipmunk_utf_32 = '\U0001F43F'

# convert/encode it to UTF-16 (big endian), to get a bytes object

chipmunk_utf_16 = chipmunk_utf_32.encode('utf-16-be')

# obtain the hex representation
chipmunk_utf_16 = chipmunk_utf_16.hex()

# format it to be an escaped UTF-16 tcl string
# (the backslashes are doubled so Python keeps them as literal characters)
chipmunk = '\\u{}\\u{}'.format(chipmunk_utf_16[0:4], chipmunk_utf_16[4:8])

# the top-level window
top = tk.Tk()

# the canvas
c = tk.Canvas(top, highlightcolor = 'blue', highlightbackground = 'black', background = 'yellow')

# create the text item, with placeholder text
text_id = c.create_text(0,0, font = '* 180', text = 'to be replaced')

# pack it
c.pack(side = 'top', fill = 'both' , expand = 1, padx = '4c' , pady = '4c')

# The 'Bypassing' aka issuing tcl/tk calls directly
# For Tk calls use => c.tk.call(...), we will not use this.
# For bare Tcl => c.tk.eval(...)

# chipmunk in UTF-16 (in this instance it is using 32-bits to represent the codepoint)
# as a raw string

#print(chipmunk)
#chipmunk = r"\ud83d\udc3f"

# create another variable in tcl/tk
c.tk.eval('set the_tcl_chipmunk {}'.format(chipmunk))

# set the text_id item's -text property/option as the value of variable the_tcl_chipmunk, gotten by calling the tcl's set command

c.tk.eval( '{} itemconfig {} -text [set the_tcl_chipmunk]'.format( str(c), text_id ) )

# Apparently a hack to get the chipmunk in position
c.tk.eval( '{} moveto {} 0 0'.format( str(c), text_id ) )

top.mainloop()