Replace Unicode characters in a string using C#
string str = "our guests will experience \u001favor in an area";
bool exists = str.IndexOf("\u001", StringComparison.CurrentCultureIgnoreCase) > -1;
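I want to find the \u001 character in the string and replace it. I have tried hard but could not solve it.
Please help me solve this problem. Thanks in advance for your valuable help.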
Somewhere deep in the C# specification you can find the following:
[Note: The use of the \x hexadecimal-escape-sequence production can be
error-prone and hard to read due to the variable number of hexadecimal
digits following the \x. For example, in the code:
string good = "\x9Good text";
string bad = "\x9Bad text";
it might appear at first that the leading character is the same (U+0009, a tab character) in
both strings. In fact the second string starts with U+9BAD as all
three letters in the word "Bad" are valid hexadecimal digits. As a
matter of style, it is recommended that \x is avoided in favour of
either specific escape sequences (\t in this example) or the
fixed-length \u escape sequence. end note]
And also:
unicode-escape-sequence::
\u hex-digit hex-digit hex-digit hex-digit
\U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit
hex-digit hex-digit
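To simplify further: \u is followed by exactly 4 hexadecimal digits (and \U by exactly 8), not 3. Your string is interpreted as "our guests will experience \u001favor in an area", where \u001f is read as the single character U+001F and is followed by "avor".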
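To make the parsing concrete, here is a minimal sketch that inspects the string from the question. The class name EscapeDemo and the helper FindUnitSeparator are hypothetical names introduced only for this illustration.

```csharp
using System;

static class EscapeDemo
{
    // Hypothetical helper: index of U+001F in the input, or -1 if absent.
    public static int FindUnitSeparator(string s) => s.IndexOf('\u001f');

    static void Main()
    {
        // The compiler consumes the four hex digits "001f" after \u,
        // so the 'f' of "favor" becomes part of the escape sequence.
        string str = "our guests will experience \u001favor in an area";

        int i = FindUnitSeparator(str);
        Console.WriteLine(i);                        // 27
        Console.WriteLine($"U+{(int)str[i]:X4}");    // U+001F
        Console.WriteLine(str.Substring(i + 1, 4));  // avor
    }
}
```

The character at that index is U+001F, and the text that follows it is "avor", not "favor".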
If we look at the C# language specification, ECMA-334, in section 7.4.2 "Unicode character escape sequences", we find:
A Unicode escape sequence represents a Unicode code point. Unicode escape sequences are processed in identifiers (§7.4.3), character literals (§7.4.5.5), and regular string literals (§7.4.5.6). A Unicode escape sequence is not processed in any other location (for example, to form an operator, punctuator, or keyword).
unicode-escape-sequence:: \u hex-digit hex-digit hex-digit hex-digit
\U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit
So you must use four hexadecimal digits with \u.
In your example, it takes "001f" as those four hexadecimal digits.
The "\u001" in your example should therefore produce an "Unrecognized escape sequence." error in Visual Studio.
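A minimal sketch of the search and replace with a complete four-digit escape, assuming the character you are after is U+001F. The class and method names are hypothetical, and the replacement text "fl" is only an illustrative guess at what the data was meant to contain. Ordinal comparison is used on the assumption that you want to match the raw character value rather than apply culture rules.

```csharp
using System;

static class ReplaceDemo
{
    // Hypothetical helper: true if the input contains U+001F.
    public static bool ContainsUnitSeparator(string input) =>
        input.IndexOf("\u001f", StringComparison.Ordinal) > -1;

    // Hypothetical helper: replaces every U+001F with the given text.
    // string.Replace(string, string) performs an ordinal search.
    public static string ReplaceUnitSeparator(string input, string replacement) =>
        input.Replace("\u001f", replacement);

    static void Main()
    {
        string str = "our guests will experience \u001favor in an area";

        Console.WriteLine(ContainsUnitSeparator(str));      // True
        // "fl" is an assumed replacement, chosen only for illustration.
        Console.WriteLine(ReplaceUnitSeparator(str, "fl")); // our guests will experience flavor in an area
    }
}
```

Passing string.Empty as the replacement removes the character instead of substituting it.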
Use a regular expression:
var unicodeRegexp = new Regex(@"\x1f");
var testWord = "our guests will experience \u001favor in an area";
var newWord = unicodeRegexp.Replace(testWord, "text for replacement");
\x1f is an alternative to \u001f; the leading zeros are dropped.
https://www.regular-expressions.info/unicode.html#codepoint
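As a quick check of that equivalence, the sketch below runs both pattern spellings against the same input. The class name RegexDemo, the helper ReplaceWith, and the "_" replacement are assumptions made for this illustration. In .NET regular expressions, \x takes exactly two hexadecimal digits and \u takes exactly four.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // Hypothetical helper: replaces every match of pattern in input.
    public static string ReplaceWith(string pattern, string input, string replacement) =>
        Regex.Replace(input, pattern, replacement);

    static void Main()
    {
        string testWord = "our guests will experience \u001favor in an area";

        // Verbatim strings pass the backslash through to the regex engine,
        // which then interprets \x1f and \u001f itself.
        string viaX = ReplaceWith(@"\x1f", testWord, "_");
        string viaU = ReplaceWith(@"\u001f", testWord, "_");

        Console.WriteLine(viaX);         // our guests will experience _avor in an area
        Console.WriteLine(viaX == viaU); // True
    }
}
```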