How the Google protobuf format reduces the size of an object after encoding

Question

package sample;

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang.SerializationUtils;

import sample.ProtoObj.Attachment;

public class Main {

    public static void main(String[] args) {
        // Plain Java object, serialized with standard Java serialization.
        POJO pojo = new POJO();
        pojo.setContent("content");
        List<sample.POJO.Attachment> att = new ArrayList<POJO.Attachment>();
        sample.POJO.Attachment attach = pojo.new Attachment();
        attach.setName("Attachment Name");
        attach.setId("0e068652dbd9");
        attach.setSize(1913558);
        att.add(attach);
        pojo.setAttach(att);
        byte[] pojoBytes = SerializationUtils.serialize(pojo);
        System.out.println("Size of the POJO ::: " + pojoBytes.length);

        // Same data, encoded with Protobuf.
        ProtoObj tc = new ProtoObj();
        List<Attachment> attachList = new ArrayList<ProtoObj.Attachment>();
        Attachment attach1 = tc.new Attachment();
        attach1.setName("Attachment Name");
        attach1.setId("0e068652dbd9");
        attach1.setSize(1913558);
        attachList.add(attach1);
        tc.setContent("content");
        tc.setAttach(attachList);

        byte[] protoBytes = tc.getProto(tc);
        System.out.println("Size of the PROTO ::: " + protoBytes.length);
    }

}

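I used the program above to measure the size of the encoded/serialized object, once with Protobuf and once with a POJO. Both objects hold the same data, but the output shows a huge difference in size.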

Output:

Size of the POJO ::: 336
Size of the PROTO ::: 82

I also read the link below to understand how the Google protobuf format affects the size of the encoded object.

https://developers.google.com/protocol-buffers/docs/encoding

But I could not understand it. Please explain it in simple terms so that I can follow.

Answer 1

Protobuf does not send the schema along with the data, so both sides need to have the schema in order to deserialize the transmitted data.

Because of that, the encoder can optimize and place the field values right next to each other, roughly like this:

AttachmentName0e068652dbd91913558

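All of this is in binary form, with only a small numeric tag (plus a length for strings) in front of each value. In JSON, the same data would look like this: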

{"name": "AttachmentName", "id": "0e068652dbd9", "size": "1913558"}

As you can see, in JSON the schema (the field names) is encoded in the serialized message itself.
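
To make the comparison concrete, here is a minimal sketch that hand-encodes one attachment the way the encoding guide linked in the question describes, without using the protobuf library. It assumes a hypothetical schema, `message Attachment { string name = 1; string id = 2; int32 size = 3; }`, since the asker's .proto file is not shown; `WireFormatSketch` and its helper names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: hand-rolled protobuf-style encoding for an assumed schema
//   message Attachment { string name = 1; string id = 2; int32 size = 3; }
class WireFormatSketch {

    // Varint: 7 payload bits per byte, high bit set while more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A tag is the field number shifted left by 3, OR-ed with the wire type.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // Strings use wire type 2: tag, byte length as a varint, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeTag(out, fieldNumber, 2);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Field numbers 1, 2, 3 come from the assumed schema above.
    static byte[] encodeAttachment(String name, String id, int size) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, name);
        writeString(out, 2, id);
        writeTag(out, 3, 0);   // wire type 0: varint
        writeVarint(out, size); // assumes a non-negative size
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] proto = encodeAttachment("Attachment Name", "0e068652dbd9", 1913558);
        String json = "{\"name\": \"Attachment Name\", \"id\": \"0e068652dbd9\", \"size\": 1913558}";
        System.out.println("protobuf-style bytes: " + proto.length);
        System.out.println("JSON bytes:           " + json.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

With these values the protobuf-style encoding comes to 35 bytes: a one-byte tag and a one-byte length in front of each string, then a one-byte tag and a three-byte varint for the size. The JSON text is larger mainly because it spells out every field name and writes the number as decimal digits.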

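I am not very familiar with Java's SerializationUtils, but I think it also passes or encodes the schema along with the data, which is why you see this difference in size.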
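
One way to check that guess is to look inside a standard Java serialization stream. The sketch below uses `java.io.ObjectOutputStream` directly, on the assumption that `SerializationUtils.serialize` relies on the same built-in mechanism; the nested `Attachment` class is a hypothetical stand-in, not the asker's POJO.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class JavaSerializationSketch {

    // Hypothetical stand-in for the attachment data in the question.
    static class Attachment implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "Attachment Name";
        String id = "0e068652dbd9";
        int size = 1913558;
    }

    // Standard Java serialization into an in-memory byte array.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Attachment());
        // ISO-8859-1 maps each byte to one char, so ASCII names stay searchable.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println("serialized size: " + data.length);
        System.out.println("contains class name: " + text.contains(Attachment.class.getName()));
        System.out.println("contains field name 'size': " + text.contains("size"));
    }
}
```

The stream carries a class descriptor with the class name, the field names, and their types in addition to the values themselves, which is the kind of per-message schema overhead that Protobuf avoids.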


java

protocol-buffers