A more compact representation than BASE64 for byte arrays?
For debugging, I often find it useful to visualize byte arrays (for example, hashed passwords) as BASE64 strings.
public override string ToString()
{
return Convert.ToBase64String(this.Hash);
}
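But for large hashes (say, more than 32 bytes), the BASE64 encoding produces a very long string. That makes it hard to compare them quickly just by looking at them.
BASE64 uses only 64 printable characters. I wonder whether there is another encoding technique that uses more than 64 characters (but still only printable ones) to reduce the length needed to represent 32 bytes. It seems to me we could improve considerably, since I can already see 94 easily distinguishable printable keys on my keyboard.
Of course, making byte arrays easy for humans to compare is not what BASE64 was originally designed for. But whatever works, right? ;)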
You can use Ascii85. Wikipedia states:
Ascii85, also called Base85, is a form of binary-to-text encoding developed by Paul E. Rutter for the btoa utility. By using five ASCII characters to represent four bytes of binary data (making the encoded size ¹⁄₄ larger than the original, assuming eight bits per ASCII character), it is more efficient than uuencode or Base64, which use four characters to represent three bytes of data (¹⁄₃ increase, assuming eight bits per ASCII character).
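To put numbers on the ratios quoted above, here is a rough sketch that computes the theoretical minimum number of characters needed for 32 bytes with alphabets of different sizes. The helper name `chars_needed` is made up for illustration, and the calculation treats the whole array as one big number, which real encoders do not do (Base64 with padding emits 44 characters for 32 bytes, Ascii85 emits 40).

```python
def chars_needed(num_bytes: int, alphabet_size: int) -> int:
    """Smallest n such that alphabet_size**n >= 256**num_bytes (exact integer math)."""
    target = 256 ** num_bytes
    n, capacity = 0, 1
    while capacity < target:
        capacity *= alphabet_size
        n += 1
    return n


# Hex, Base64, Base85, and all 94 printable ASCII keys, for a 32-byte value.
for size in (16, 64, 85, 94):
    print(f"{size:>2} symbols -> {chars_needed(32, size)} chars")
```

Under these assumptions, an alphabet of 85 symbols and one of all 94 printable keys both need 40 characters for 32 bytes, so going beyond 85 symbols would not shorten a hash of that size any further.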
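You will find a C# implementation on GitHub. It was written by Jeff Atwood, who accompanied the code with a post on his blog.
Since you only need the encoder part, I used Jeff's code as a starting point and created an implementation that contains only the encoder: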
using System.Text; // StringBuilder

class Ascii85
{
private const int _asciiOffset = 33;
private const int decodedBlockLength = 4;
private byte[] _encodedBlock = new byte[5];
private uint _tuple;
/// <summary>
/// Encodes binary data into a plaintext ASCII85 format string
/// </summary>
/// <param name="ba">binary data to encode</param>
/// <returns>ASCII85 encoded string</returns>
public string Encode(byte[] ba)
{
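// Reserve 5 output chars per (possibly partial) 4-byte block; multiply before dividing to avoid integer truncation.
StringBuilder sb = new StringBuilder((ba.Length + decodedBlockLength - 1) / decodedBlockLength * _encodedBlock.Length);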
int count = 0;
_tuple = 0;
foreach (byte b in ba)
{
if (count >= decodedBlockLength - 1)
{
_tuple |= b;
if (_tuple == 0)
{
sb.Append('z');
}
else
{
EncodeBlock(_encodedBlock.Length, sb);
}
_tuple = 0;
count = 0;
}
else
{
_tuple |= (uint)(b << (24 - (count * 8)));
count++;
}
}
// if we have some bytes left over at the end..
if (count > 0)
{
EncodeBlock(count + 1, sb);
}
return sb.ToString();
}
private void EncodeBlock(int count, StringBuilder sb)
{
for (int i = _encodedBlock.Length - 1; i >= 0; i--)
{
_encodedBlock[i] = (byte)((_tuple % 85) + _asciiOffset);
_tuple /= 85;
}
for (int i = 0; i < count; i++)
{
sb.Append((char)_encodedBlock[i]);
}
}
}
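For readers who want to check the logic of the encoder above without compiling it, the same steps can be sketched in Python. This is an illustrative re-implementation, not Jeff Atwood's code: the function name `ascii85_encode` is made up, and it is compared against the standard library's `base64.a85encode` (without Adobe `<~ ~>` framing) as a sanity check.

```python
import base64


def ascii85_encode(data: bytes) -> str:
    """Encode bytes as Ascii85: 4 input bytes -> 5 chars in the range '!'..'u'."""
    out = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        pad = 4 - len(chunk)  # 0 for a full block, 1..3 for the final partial block
        value = int.from_bytes(chunk + b"\0" * pad, "big")
        if value == 0 and pad == 0:
            out.append("z")  # shorthand for a full block of four zero bytes
            continue
        digits = []
        for _ in range(5):
            value, rem = divmod(value, 85)
            digits.append(chr(rem + 33))  # 33 is the ASCII offset ('!')
        digits.reverse()  # most significant base-85 digit first
        out.append("".join(digits[:5 - pad]))  # a partial block emits len(chunk) + 1 chars
    return "".join(out)


sample = bytes(range(32))
print(ascii85_encode(sample))
print(ascii85_encode(sample) == base64.a85encode(sample).decode("ascii"))
```

As in the C# version, a leftover block of n bytes is zero-padded, encoded, and truncated to n + 1 characters, and the `z` shorthand is only used for complete all-zero blocks.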
Here is the required attribution:
/// <summary>
/// adapted from the Jeff Atwood code to only have the encoder
///
/// C# implementation of ASCII85 encoding.
/// Based on C code from http://www.stillhq.com/cgi-bin/cvsweb/ascii85/
/// </summary>
/// <remarks>
/// Jeff Atwood
/// http://www.codinghorror.com/blog/archives/000410.html
/// </remarks>
To show that base85 is more compact than base64, convert a base64 string to base85 in Python:
import base64
a = "cDINXkoEWkwPIZMJNMyblaL6RY4/8W7edopyZkqop6I="
b = base64.b64decode(a)
c = base64.b85encode(b)
c == b'a54>EN(5R=4<VBYG|ZcoqWVRSKk;tfc8YRlN~ouz'
Side by side:
cDINXkoEWkwPIZMJNMyblaL6RY4/8W7edopyZkqop6I= base64: 44 chars
a54>EN(5R=4<VBYG|ZcoqWVRSKk;tfc8YRlN~ouz base85: 40 chars
The catch? base85 is slower than base64.
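The speed difference is easy to measure for yourself. The snippet below is a rough timing sketch using `timeit` on a random 32-byte payload; the run count is an arbitrary choice and the absolute numbers will depend on your machine and Python version. In CPython, `b64encode` is backed by the C `binascii` module, while `b85encode` has, as far as I know, historically been implemented in pure Python, which would account for much of the gap.

```python
import base64
import os
import timeit

payload = os.urandom(32)  # stand-in for a 32-byte hash
runs = 2000               # arbitrary; increase for more stable numbers

t64 = timeit.timeit(lambda: base64.b64encode(payload), number=runs)
t85 = timeit.timeit(lambda: base64.b85encode(payload), number=runs)

print(f"base64: {t64 / runs * 1e6:.2f} us per call, {len(base64.b64encode(payload))} chars")
print(f"base85: {t85 / runs * 1e6:.2f} us per call, {len(base64.b85encode(payload))} chars")
```

For a debugging `ToString()` on a single hash this cost is unlikely to matter; it becomes relevant only if you encode large volumes of data.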