How to ignore tashkeel characters when counting in text wrapping in Corona SDK?

I am doing text wrapping (splitting a long string into multiple lines) in Corona with the following function:
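function wrap(str, limit, indent, indent1)
  indent = indent or ""
  indent1 = indent1 or indent
  limit = limit or 72
  -- byte position where the current line starts, adjusted for the first-line indent
  local here = 1-#indent1

  -- replacePartOfString is a helper of my own (not shown); it turns "*" into a newline
  str = replacePartOfString(str,"*","\n")

  -- st and fi are byte positions, so the width is measured in bytes, not characters
  return indent1..str:gsub("(%s+)()(%S+)()",
                          function(sp, st, word, fi)
                            if fi-here > limit then
                              -- the word would end past the limit: start a new line
                              here = st - #indent
                              return "\n"..indent..word
                            end
                          end)
end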

local someString = " This is intended for strings without newlines in them (i.e. after reflowing the text and breaking it into paragraphs.)  This is intended for strings without newlines in them (i.e. after reflowing the text and breaking it into paragraphs.)  This is intended for strings without newlines in them (i.e. after reflowing the text and breaking it into paragraphs.)  This is intended for strings without newlines in them (i.e. after reflowing the text and breaking it into paragraphs.) "

print_r(string.split(wrap(someString,70,"",""),"\n"))

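It works for English and Arabic, but the one problem is that it counts the tashkeel (Arabic diacritics) as letters. What is the best way to ignore these characters so that they are not counted? I want to keep them in the text, but not have them count toward the wrap width.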

string.gsub() operates on the bytes of a string, not on its characters. The two differ when the string contains Unicode text. Use a utf8 library to get the characters.
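To make the byte/character difference concrete, here is a minimal sketch in plain Lua that does not depend on any utf8 library. The helper name utf8Length is hypothetical, and the sketch assumes the input is valid UTF-8.

```lua
-- Count UTF-8 characters by counting every byte that is NOT a
-- continuation byte (continuation bytes are in the range 0x80-0xBF).
-- Assumes the string is valid UTF-8.
local function utf8Length(s)
  local count = 0
  for i = 1, #s do
    local b = s:byte(i)
    if b < 0x80 or b >= 0xC0 then
      count = count + 1
    end
  end
  return count
end

-- The Arabic word "salam" (seen, lam, alef, meem) written as UTF-8 bytes.
local word = "\216\179\217\132\216\167\217\133"
print(#word)             -- 8 (bytes, which is what the wrap function measures)
print(utf8Length(word))  -- 4 (characters)
```

Each Arabic letter takes two bytes in UTF-8, so a byte-based width is already doubled before any tashkeel is added.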

The Unicode code points for tashkeel in Arabic are:

[\x{064B}-\x{0650}], [\x{0618}-\x{061A}], [\x{0652}-\x{0653}]

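You can strip them with any code you like, for example from a copy of each word that is used only for counting, so that the displayed text keeps them.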
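One possible way to put this together is sketched below, again in plain Lua with no utf8 library. All function names (codepoints, isTashkeel, visibleLength, wrapVisible) are hypothetical. The sketch assumes valid UTF-8 input, and it adds U+0651 (shadda) to the ranges listed above as an assumption, since it is also a combining mark. It is a simplified word-by-word wrap rather than a drop-in replacement for the wrap function in the question: it collapses runs of whitespace to a single space and has no indent parameters.

```lua
-- Code point ranges treated as tashkeel (zero-width for wrapping purposes).
local TASHKEEL_RANGES = {
  {0x064B, 0x0650},  -- from the list above
  {0x0651, 0x0651},  -- shadda: not in the list above, added as an assumption
  {0x0652, 0x0653},  -- from the list above
  {0x0618, 0x061A},  -- from the list above
}

local function isTashkeel(cp)
  for _, r in ipairs(TASHKEEL_RANGES) do
    if cp >= r[1] and cp <= r[2] then return true end
  end
  return false
end

-- Iterate over the code points of a string. Assumes valid UTF-8.
local function codepoints(s)
  local i, n = 1, #s
  return function()
    if i > n then return nil end
    local b = s:byte(i)
    local cp, len
    if b < 0x80 then cp, len = b, 1
    elseif b < 0xE0 then cp, len = b - 0xC0, 2
    elseif b < 0xF0 then cp, len = b - 0xE0, 3
    else cp, len = b - 0xF0, 4 end
    for j = i + 1, i + len - 1 do
      cp = cp * 64 + (s:byte(j) - 0x80)
    end
    i = i + len
    return cp
  end
end

-- Number of characters in s, not counting tashkeel.
local function visibleLength(s)
  local count = 0
  for cp in codepoints(s) do
    if not isTashkeel(cp) then count = count + 1 end
  end
  return count
end

-- Greedy wrap: words are kept intact (tashkeel included) and only
-- their visible length counts toward the limit.
local function wrapVisible(str, limit)
  limit = limit or 72
  local lines, current, width = {}, {}, 0
  for word in str:gmatch("%S+") do
    local w = visibleLength(word)
    if #current > 0 and width + 1 + w > limit then
      lines[#lines + 1] = table.concat(current, " ")
      current, width = {}, 0
    end
    if #current > 0 then width = width + 1 end  -- separating space
    current[#current + 1] = word
    width = width + w
  end
  if #current > 0 then lines[#lines + 1] = table.concat(current, " ") end
  return table.concat(lines, "\n")
end

-- "marhaban" with full tashkeel: 5 letters and 4 diacritics, 18 bytes.
local marhaban = "\217\133\217\142\216\177\217\146\216\173\217\142\216\168\217\139\216\167"
print(#marhaban)                -- 18
print(visibleLength(marhaban))  -- 5
```

The returned string can be passed to string.split(..., "\n") exactly as in the question. Because the diacritics are only skipped while counting and never removed, they still appear in the wrapped output.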
