How does protobuf judge whether a value belongs to an optional field or to another object?

Question

For example, suppose I define a proto like this:

$cat 30.proto
message hello
{
    required int32 f1=1;
    required int32 f2=2;
    optional int32 f3=3;
}

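I doubt whether protobuf can handle something like this:

(1) I declare 3 objects, none of which sets the f3 field.
(2) I write them to the output.
(3) Then, on the reader side, how does the reader know whether these 6 values belong to 3 objects (2 fields each) or to 2 objects (3 fields each)?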

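In other words, how are "required"/"optional" reflected in the encoded bytes? If they are not reflected in the byte stream, how does protobuf determine where a new object starts? As far as we know, protobuf has no "delimiter" bits.

I ran a quick test of this: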

$cat 30.cpp
#include "30.pb.h"
#include<fstream>
using namespace std;
int main()
{
    fstream f("./log30.data",ios::binary|ios::out);
    hello p1,p2,p3,p4,p5;
    p1.set_f1(1);
    p1.set_f2(2);
    p2.set_f1(3);
    p2.set_f2(4);
    p3.set_f1(5);
    p3.set_f2(6);
    p1.SerializeToOstream(&f);
    p2.SerializeToOstream(&f);
    p3.SerializeToOstream(&f);

    p4.set_f1(7);
    p4.set_f2(8);
    p4.set_f3(9);
    p5.set_f1(0xa);
    p5.set_f2(0xb);
    p5.set_f3(0xc);
    p4.SerializeToOstream(&f);
    p5.SerializeToOstream(&f);
    return 0;
}

$g++ 30.cpp 30.pb.cc -lprotobuf && ./a.out && xxd log30.data
00000000: 0801 1002 0803 1004 0805 1006 0807 1008  ................
00000010: 1809 080a 100b 180c                      ........

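My guess is that the byte stream always starts with the smallest tag number and that tag numbers increase as the stream is written, so that when the reader meets a smaller tag number it treats it as the start of a new object. This is only a rough guess.

I would appreciate an explanation!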

Answer 1

(3) Then, on the reader side, how does the reader know whether these 6 values belong to 3 objects (2 fields each) or to 2 objects (3 fields each)?

In other words, how are "required"/"optional" reflected in the encoded bytes? If they are not reflected in the byte stream, how does protobuf determine where a new object starts? As far as we know, protobuf has no "delimiter" bits.

Protobuf doesn't. It is up to you, the programmer, to split the messages before you feed them to protobuf.

For example, run this program:

#include "30.pb.h"
#include <fstream>
#include <iostream>
using namespace std;
int main()
{
    fstream f("./log30.data",ios::binary|ios::out);
    hello p1,p2,p3,p4,p5;
    p1.set_f1(1);
    p1.set_f2(2);
    p2.set_f1(3);
    p2.set_f2(4);
    p3.set_f1(5);
    p3.set_f2(6);
    p1.SerializeToOstream(&f);
    p2.SerializeToOstream(&f);
    p3.SerializeToOstream(&f);

    p4.set_f1(7);
    p4.set_f2(8);
    p4.set_f3(9);
    p5.set_f1(0xa);
    p5.set_f2(0xb);
    p5.set_f3(0xc);
    p4.SerializeToOstream(&f);
    p5.SerializeToOstream(&f);
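    f.close();
    // Reopen the same file for reading.
    f.open("./log30.data", ios::binary|ios::in);

    hello hin;
    // ParseFromIstream consumes the stream up to EOF, so all five
    // concatenated messages are parsed as if they were one message.
    hin.ParseFromIstream(&f);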

    cout << "f1: " << hin.f1() << ", f2: " << hin.f2() << ", f3: " << hin.f3() << "\n";
    return 0;
}

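You should see only the values of the last serialized hello object, because protobuf reads the whole stream and overwrites older values with newer ones.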
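
One common way to do that splitting yourself is to write each message's byte length in front of its payload. The following is only a sketch of the idea on raw byte buffers, with no dependency on the protobuf library. The names Bytes, appendFramed and splitFramed are made up for illustration, and the choice of a varint length prefix is an assumption of this sketch, not something the wire format imposes. As far as I know, the C++ library also ships ready-made helpers for this in google/protobuf/util/delimited_message_util.h (for example SerializeDelimitedToOstream), which are preferable in real code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;  // hypothetical alias for a raw buffer

// Appends n as a base-128 varint (7 payload bits per byte, high bit = "more").
void writeVarint(Bytes& out, std::uint64_t n) {
    while (n >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((n & 0x7F) | 0x80));
        n >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(n));
}

// Reads a varint at pos and advances pos. Returns false on truncated or
// oversized input.
bool readVarint(const Bytes& in, std::size_t& pos, std::uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) return false;
        std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}

// Writer side: length prefix first, then the already serialized message bytes.
void appendFramed(Bytes& stream, const Bytes& message) {
    writeVarint(stream, message.size());
    stream.insert(stream.end(), message.begin(), message.end());
}

// Reader side: recovers the individual message payloads.
// Returns false if the stream is malformed.
bool splitFramed(const Bytes& stream, std::vector<Bytes>& messages) {
    std::size_t pos = 0;
    while (pos < stream.size()) {
        std::uint64_t len = 0;
        if (!readVarint(stream, pos, len)) return false;
        if (len > stream.size() - pos) return false;  // payload is cut short
        auto first = stream.begin() + static_cast<std::ptrdiff_t>(pos);
        auto last = first + static_cast<std::ptrdiff_t>(len);
        messages.emplace_back(first, last);
        pos += static_cast<std::size_t>(len);
    }
    return true;
}
```

With this framing, the writer would call appendFramed once per serialized hello object, and the reader would hand each buffer returned by splitFramed to a separate parse call, so the 2-field and 3-field objects can no longer be confused.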

Answer 2

From the documentation:

As you know, a protocol buffer message is a series of key-value pairs. The binary version of a message just uses the field's number as the key – the name and declared type for each field can only be determined on the decoding end by referencing the message type's definition (i.e. the .proto file).

When a message is encoded, the keys and values are concatenated into a byte stream. When the message is being decoded, the parser needs to be able to skip fields that it doesn't recognize. This way, new fields can be added to a message without breaking old programs that do not know about them. To this end, the "key" for each pair in a wire-format message is actually two values – the field number from your .proto file, plus a wire type that provides just enough information to find the length of the following value.

...

If a proto2 message definition has repeated elements (without the [packed=true] option), the encoded message has zero or more key-value pairs with the same tag number.

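So optional elements may simply be left out of the output stream, while required ones must be included. The schema must be known for both serialization and deserialization (in contrast with Avro, where the schema must be embedded with the data), so validation of required/optional fields happens after deserialization, when the parser checks whether all required fields have values.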
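
To make the key layout concrete, here is a small sketch that decodes the 24 bytes from the xxd dump in the question by hand. It handles only wire type 0 (varint), which is all that dump contains, and the names Field, decodeFields, mergeLastWins and kDump are invented for this illustration. It is not how the library is implemented, just the same arithmetic: each key is (field_number << 3) | wire_type.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// One decoded key/value pair from the wire format.
struct Field {
    std::uint32_t number;     // field number from the .proto file
    std::uint32_t wire_type;  // 0 = varint, the only type handled here
    std::uint64_t value;
};

// Reads one base-128 varint starting at pos and advances pos past it.
std::uint64_t readVarint(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= buf.size()) throw std::runtime_error("truncated varint");
        std::uint8_t byte = buf[pos++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    throw std::runtime_error("varint too long");
}

// Splits a buffer into key/value pairs without knowing any message boundary.
std::vector<Field> decodeFields(const std::vector<std::uint8_t>& buf) {
    std::vector<Field> fields;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        std::uint64_t key = readVarint(buf, pos);
        Field f;
        f.number = static_cast<std::uint32_t>(key >> 3);
        f.wire_type = static_cast<std::uint32_t>(key & 0x7);
        if (f.wire_type != 0) {
            throw std::runtime_error("only varint fields are handled in this sketch");
        }
        f.value = readVarint(buf, pos);
        fields.push_back(f);
    }
    return fields;
}

// Mimics parsing the whole buffer as ONE message: for a non-repeated scalar
// field, a later occurrence replaces an earlier one.
std::map<std::uint32_t, std::uint64_t> mergeLastWins(const std::vector<Field>& fields) {
    std::map<std::uint32_t, std::uint64_t> merged;
    for (const Field& f : fields) merged[f.number] = f.value;
    return merged;
}

// The 24 bytes shown by xxd in the question.
const std::vector<std::uint8_t> kDump = {
    0x08, 0x01, 0x10, 0x02, 0x08, 0x03, 0x10, 0x04, 0x08, 0x05, 0x10, 0x06,
    0x08, 0x07, 0x10, 0x08, 0x18, 0x09, 0x08, 0x0a, 0x10, 0x0b, 0x18, 0x0c};
```

Calling decodeFields(kDump) yields twelve pairs with keys 0x08, 0x10 and 0x18 (fields 1, 2 and 3, all wire type 0), and nothing among them marks where one object ends. Feeding that list to mergeLastWins leaves f1 = 10, f2 = 11 and f3 = 12, the values of the last object written.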
