How to trim a String by bytes?
I have a UTF-8 text, and I want to trim/truncate it by bytes so that I get a new string of the specified byte length.
    public static String trimByBytes(String text, int longitudBytes) throws Exception {
        byte bytes_text[] = text.getBytes("UTF-8");
        int negativeBytes = 0;
        byte byte_trimmed[] = new byte[longitudBytes];
        if (byte_trimmed.length <= bytes_text.length) {
            //copy array manually and count negativeBytes
            for (int i = 0; i < byte_trimmed.length; i++) {
                byte_trimmed[i] = bytes_text[i];
                if (byte_trimmed[i] < 0) {
                    negativeBytes++;
                }
            }
            //if negativeBytes are odd
            if (negativeBytes % 2 != 0 && byte_trimmed[byte_trimmed.length - 1] < 0) {
                byte_trimmed[byte_trimmed.length - 1] = 0;//delete last
            }
        } else {
            for (int i = 0; i < bytes_text.length; i++) {
                byte_trimmed[i] = bytes_text[i];
            }
        }
        return new String(byte_trimmed);
    }
For example:
- Signature: String trimByBytes(String str, int lengthOfBytes); trimByBytes(Gómez, 1)
- Gómez is 6 bytes long (but 5 characters long)
- Gómez trimmed at 3 is Gó (OK). Gómez trimmed at 2 is G� but I want G (removing the broken character)
- Gómez trimmed at 1 is G (OK). Gómez trimmed at 8 is Gómez
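The G� result follows from the encoding itself: in UTF-8, ó takes two bytes (0xC3 0xB3), so a cut after byte 2 lands in the middle of that character. A minimal sketch that prints the encoded bytes of Gómez; the class name `Utf8Bytes` and the helper `hexDump` are made up for illustration and are not part of the question.

```java
import java.nio.charset.StandardCharsets;

class Utf8Bytes {
    // Hypothetical helper: renders each UTF-8 byte of the text as two hex digits.
    static String hexDump(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF because Java bytes are signed.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "G\u00F3mez" is Gómez, escaped so the source file encoding does not matter.
        System.out.println(hexDump("G\u00F3mez")); // 47 C3 B3 6D 65 7A
    }
}
```

Every byte of a multi-byte UTF-8 sequence is 0x80 or above, which Java's signed `byte` reports as negative. Counting negative bytes and checking parity can therefore only work while every non-ASCII character is exactly two bytes long; three- and four-byte characters break that assumption.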
Create an explicit CharsetDecoder, and set CodingErrorAction.IGNORE on it.
Since a CharsetDecoder works with ByteBuffers, applying a length limit is as easy as calling the ByteBuffer's limit method:
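    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;

    String trimByBytes(String str, int lengthOfBytes) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        // Only shrink the buffer; a limit beyond the data keeps the whole string.
        if (lengthOfBytes < buffer.limit()) {
            buffer.limit(lengthOfBytes);
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // Silently drop an incomplete multi-byte sequence left at the cut point.
        decoder.onMalformedInput(CodingErrorAction.IGNORE);
        try {
            return decoder.decode(buffer).toString();
        } catch (CharacterCodingException e) {
            // We will never get here.
            throw new RuntimeException(e);
        }
    }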
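A quick check of the method above against the examples from the question, as a sketch. The wrapper class `TrimDemo` and the extra euro-sign case are assumptions added here for illustration; the euro sign is three bytes in UTF-8, so it exercises a cut that leaves more than one dangling byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class TrimDemo {
    // Same logic as the method above, made static so main can call it.
    static String trimByBytes(String str, int lengthOfBytes) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        if (lengthOfBytes < buffer.limit()) {
            buffer.limit(lengthOfBytes);
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        decoder.onMalformedInput(CodingErrorAction.IGNORE);
        try {
            return decoder.decode(buffer).toString();
        } catch (CharacterCodingException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String name = "G\u00F3mez"; // Gómez, 6 bytes in UTF-8
        System.out.println(trimByBytes(name, 1).equals("G"));         // true
        System.out.println(trimByBytes(name, 2).equals("G"));         // true: the half-cut ó is dropped
        System.out.println(trimByBytes(name, 3).equals("G\u00F3"));   // true
        System.out.println(trimByBytes(name, 8).equals(name));        // true: limit exceeds the data
        System.out.println(trimByBytes("a\u20AC", 3).equals("a"));    // true: two of the euro sign's three bytes are dropped
    }
}
```

Note that a negative `lengthOfBytes` is not handled: `ByteBuffer.limit` throws an `IllegalArgumentException` for it, so callers may want to validate the argument first.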