A more compact representation than BASE64 for byte arrays?

For debugging, I often find it useful to visualize byte arrays (for example, hashed passwords) as BASE64 strings.

        public override string ToString()
        {
            return Convert.ToBase64String(this.Hash);      
        }

But for large hashes (say, more than 32 bytes), BASE64 encoding produces a very long string, which makes it hard to compare them quickly just by looking at them.

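BASE64 uses only 64 printable characters. I wonder whether other encoding techniques use more than 64 characters (but still only printable ones) to reduce the length needed to represent 32 bytes. It seems to me that we could improve substantially, since I can already see 94 easily distinguishable printable keys on my keyboard.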
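As a rough sanity check on how much a larger alphabet could help, the Python sketch below counts the fewest characters that can represent every possible value of a byte array for a given alphabet size. The helper name `min_chars` is hypothetical, and it assumes the whole array is treated as one big number, which practical encoders that work in small fixed-size groups do not do.

```python
def min_chars(num_bytes: int, alphabet_size: int) -> int:
    """Smallest n such that alphabet_size**n covers every num_bytes-byte value."""
    target = 256 ** num_bytes   # number of distinct byte arrays of this length
    capacity = 1                # number of distinct strings of length n
    n = 0
    while capacity < target:
        capacity *= alphabet_size
        n += 1
    return n


for size in (64, 85, 94):
    print(size, min_chars(32, size))
# 64 43
# 85 40
# 94 40
```

Under these assumptions, 32 bytes need at least 43 characters with 64 symbols and 40 characters with either 85 or 94 symbols, so at this length a 94-character alphabet gains nothing over an 85-character one.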

Of course, making byte arrays easy for humans to compare is not what BASE64 was originally meant for. But whatever works, right? ;)

You can use Ascii85. Wikipedia states:

Ascii85, also called Base85, is a form of binary-to-text encoding developed by Paul E. Rutter for the btoa utility. By using five ASCII characters to represent four bytes of binary data (making the encoded size ¹⁄₄ larger than the original, assuming eight bits per ASCII character), it is more efficient than uuencode or Base64, which use four characters to represent three bytes of data (¹⁄₃ increase, assuming eight bits per ASCII character).
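The size difference described in the quote can be checked quickly with Python's standard `base64` module, which provides both `b64encode` and `a85encode`. This is only an illustration: the 32 input bytes are arbitrary stand-ins for a hash, not real data.

```python
import base64

# 32 arbitrary non-zero bytes standing in for a hash value (illustrative only)
data = bytes(range(1, 33))

b64 = base64.b64encode(data)   # 4 characters per 3 bytes, padded with '='
a85 = base64.a85encode(data)   # 5 characters per 4 bytes

print(len(data), len(b64), len(a85))
# 32 44 40
```

Note that Ascii85 abbreviates a group of four zero bytes to a single `z`, so inputs containing such groups encode even shorter.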

You'll find a C# implementation on GitHub, written by Jeff Atwood, who accompanied that code with a post on his blog.

Since you only need the encoder part, I used Jeff's code as a starting point and created an implementation that contains only the encoder:

using System.Text;

class Ascii85
{

    private const int _asciiOffset = 33;
    private const int decodedBlockLength = 4;

    private byte[] _encodedBlock = new byte[5];
    private uint _tuple;

    /// <summary>
    /// Encodes binary data into a plaintext ASCII85 format string
    /// </summary>
    /// <param name="ba">binary data to encode</param>
    /// <returns>ASCII85 encoded string</returns>
    public string Encode(byte[] ba)
    {
        // Multiply before dividing: 5 / 4 is 1 in integer arithmetic, which would under-size the buffer.
        StringBuilder sb = new StringBuilder(ba.Length * _encodedBlock.Length / decodedBlockLength + 1);

        int count = 0;
        _tuple = 0;
        foreach (byte b in ba)
        {
            if (count >= decodedBlockLength - 1)
            {
                _tuple |= b;
                if (_tuple == 0)
                {
                    sb.Append('z');
                }
                else
                {
                    EncodeBlock(_encodedBlock.Length, sb);
                }
                _tuple = 0;
                count = 0;
            }
            else
            {
                _tuple |= (uint)b << (24 - (count * 8));
                count++;
            }
        }

        // if we have some bytes left over at the end..
        if (count > 0)
        {
            EncodeBlock(count + 1, sb);
        }

        return sb.ToString();
    }

    private void EncodeBlock(int count, StringBuilder sb)
    {
        for (int i = _encodedBlock.Length - 1; i >= 0; i--)
        {
            _encodedBlock[i] = (byte)((_tuple % 85) + _asciiOffset);
            _tuple /= 85;
        }

        for (int i = 0; i < count; i++)
        {
            sb.Append((char)_encodedBlock[i]);
        }

    }
}
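For readers who want to cross-check the encoder's output, here is a minimal Python sketch of the same block logic: four bytes packed big-endian into a 32-bit value, five base-85 digits offset by 33, `z` for an all-zero group, and a short final group emitting one character more than it has bytes. The function name `ascii85_encode` is my own, and the comparison against the standard library's `base64.a85encode` assumes its default settings (no framing, no space folding).

```python
import base64


def ascii85_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        # Pad a short final chunk with zero bytes to form the 32-bit tuple
        value = int.from_bytes(chunk.ljust(4, b"\0"), "big")
        if value == 0 and len(chunk) == 4:
            out.append("z")  # shorthand for a full group of zero bytes
            continue
        digits = []
        for _ in range(5):
            value, rem = divmod(value, 85)
            digits.append(chr(rem + 33))  # 33 is the ASCII offset ('!')
        digits.reverse()
        # A partial group of n bytes is written as n + 1 characters
        out.append("".join(digits[:len(chunk) + 1]))
    return "".join(out)


sample = bytes(range(1, 33))
print(ascii85_encode(sample) == base64.a85encode(sample).decode("ascii"))
```

If the two outputs ever disagree for some input, that is a good hint to recheck the handling of the final partial group or of zero groups.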

Here is the required attribution:

/// <summary>
/// adapted from the Jeff Atwood code to only have the encoder
/// 
/// C# implementation of ASCII85 encoding. 
/// Based on C code from http://www.stillhq.com/cgi-bin/cvsweb/ascii85/
/// </summary>
/// <remarks>
/// Jeff Atwood
/// http://www.codinghorror.com/blog/archives/000410.html
/// </remarks>

Showing that base85 is more compact than base64:

Converting base64 to base85 in Python:

import base64
a = "cDINXkoEWkwPIZMJNMyblaL6RY4/8W7edopyZkqop6I="
b = base64.b64decode(a)
c = base64.b85encode(b)
c == b'a54>EN(5R=4<VBYG|ZcoqWVRSKk;tfc8YRlN~ouz'

Side by side:

cDINXkoEWkwPIZMJNMyblaL6RY4/8W7edopyZkqop6I=  base64: 44 chars
a54>EN(5R=4<VBYG|ZcoqWVRSKk;tfc8YRlN~ouz      base85: 40 chars

Any problem? base85 compared to base64