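Protobuf (pb) is pleasant to use. It is not as fast as a raw binary format, but in most cases it is still much better than JSON. Forward and backward compatibility of fields in particular is probably why most people choose pb. And then I finally fell into a trap, one that seriously affected a production service.
While putting the test program together I noticed that the official Python guide never mentions the MergeFrom interface at all. It looks as if it exists specifically to trap the C++ folks.
Here is an example to illustrate.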

    package mergefrom_trap;

    message Person {
      required string name = 1;
      required int32 age = 2;
      // optional int32 new1 = 3;
      // optional int32 new2 = 4[default = 0];
    }

    message AddressBook {
      required string host_name = 1;
      repeated Person person = 2;
      // optional int32 new3 = 3;
      // optional int32 new4 = 4[default = 0];
    }

1. First, with the proto that does not yet have the new1-4 fields, build the C++ binary:

    [carl pb_mergefrom_trap]$ protoc -I=./ --cpp_out=./ mergefrom_trap.proto
    [carl pb_mergefrom_trap]$ g++ read_new_write_old.cpp mergefrom_trap.pb.cc -lprotobuf -o read_new_write_old.bin

2. Then add the new1-4 fields to the proto and generate mergefrom_trap_pb2.py:

    [carl pb_mergefrom_trap]$ protoc -I=./ --python_out=./ mergefrom_trap.proto

3. Use write_new.py to create an address_book object and write it to a file. The output is:

    [carl pb_mergefrom_trap]$ python write_new.py addressbook
    address_book: host_name: "carl"
    person {
      name: "person1"
      age: 30
      new1: 1
      new2: 2
    }
    new3: 3
    new4: 4

4. Run the C++ binary built in step 1 (each message dump is shown on one line here so the growth is easy to see):

    [carl pb_mergefrom_trap]$ ./read_new_write_old.bin addressbook
    book: host_name: "carl" person { name: "person1" age: 30 3: 1 4: 2 } 3: 3 4: 4
    book: host_name: "carl_copyfrom_update" person { name: "person1" age: 100 3: 1 4: 2 } 3: 3 4: 4
    book: host_name: "carl_copyfrom_update" person { name: "person1" age: 100 3: 1 4: 2 } 3: 3 4: 4
    book: host_name: "carl_copyfrom_update" person { name: "person1" age: 100 3: 1 4: 2 } 3: 3 4: 4
    book: host_name: "carl_copyfrom_update" person { name: "person1" age: 100 3: 1 4: 2 } 3: 3 4: 4
    book: host_name: "carl_copyfrom_update" person { name: "person1" age: 100 3: 1 4: 2 } 3: 3 4: 4
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    a big trap is coming ....
    book: host_name: "carl_copyfrom_update" person { name: "person1" age: 100 3: 1 4: 2 } 3: 3 4: 4
    book: host_name: "carl_update_mergefrom" person { name: "person1" age: 100 3: 1 4: 2 3: 1 4: 2 } 3: 3 4: 4
    book: host_name: "carl_update_mergefrom" person { name: "person1" age: 100 3: 1 4: 2 3: 1 4: 2 } 3: 3 4: 4
    book: host_name: "carl_update_mergefrom" person { name: "person1" age: 100 3: 1 4: 2 3: 1 4: 2 3: 1 4: 2 3: 1 4: 2 } 3: 3 4: 4
    book: host_name: "carl_update_mergefrom" person { name: "person1" age: 100 3: 1 4: 2 3: 1 4: 2 3: 1 4: 2 3: 1 4: 2 } 3: 3 4: 4
    book: host_name: "carl_update_mergefrom" person { name: "person1" age: 100 3: 1 4: 2 3: 1 4: 2 3: 1 4: 2 3: 1 4: 2 3: 1 4: 2 3: 1 4: 2 3: 1 4: 2 3: 1 4: 2 } 3: 3 4: 4

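To see why the old binary can only print `3: 1` and `4: 2`, it helps to look at the bytes themselves. What follows is a minimal sketch of a hand-rolled wire-format reader in Python, written purely for illustration: the helpers `encode_varint`, `read_varint`, `encode_person` and `split_fields` are made up here and are not part of the protobuf library, and only the two wire types that `Person` needs (varint and length-delimited) are handled. A reader that knows only fields 1 and 2 still has to keep fields 3 and 4 somewhere, and it can label them only by number.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def read_varint(buf, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_person(name, age, new1, new2):
    """Hypothetical helper: bytes a writer built from the NEW proto could emit."""
    raw = name.encode("utf-8")
    return (
        encode_varint((1 << 3) | 2) + encode_varint(len(raw)) + raw  # name
        + encode_varint((2 << 3) | 0) + encode_varint(age)           # age
        + encode_varint((3 << 3) | 0) + encode_varint(new1)          # new1
        + encode_varint((4 << 3) | 0) + encode_varint(new2)          # new2
    )


def split_fields(buf, known_numbers):
    """Split serialized bytes into known fields and unknown (number, value) pairs."""
    known, unknown = {}, []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:        # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:      # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type not handled in this sketch")
        if number in known_numbers:
            known[number] = value
        else:
            unknown.append((number, value))  # kept, but only by field number
    return known, unknown


data = encode_person("person1", 30, 1, 2)
# An "old" reader knows only field numbers 1 (name) and 2 (age).
known, unknown = split_fields(data, known_numbers={1, 2})
print(data.hex())
print(known)
print(unknown)
```

In this sketch the two new fields take four bytes in total (`18 01 20 02`), and the old reader ends up with them in a plain list of numbered values, with no field name it could use to match them against an existing value.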
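Summary
As the test results above show, when a C++ program built against the old proto calls MergeFrom, the two unknown fields are not written over the copies that already exist; they are appended as new ones. Call it several times in a row and the growth is exponential. The serialized file then grows exponentially as well, and the process suffers: Parse and Serialize calls block for tens of seconds or even minutes, and network traffic surges. For a production server this can be fatal. A painful lesson.
The files above are hosted at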
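The doubling can be reproduced with a toy model in plain Python. This is only a sketch under assumptions: `ToyMessage` is a made-up stand-in, not the protobuf API, and the "copy the message, change a field, write the copy back" update pattern in `update_age` is a guess that happens to match the output above, not necessarily what read_new_write_old.cpp does. The model keeps known singular fields in a dict, where a merge overwrites them, and unknown fields in a list, where a merge can only append.

```python
import copy


class ToyMessage:
    """Toy stand-in for a parsed message: known fields plus unknown fields."""

    def __init__(self, known=None, unknown=None):
        self.known = dict(known or {})      # field name -> value
        self.unknown = list(unknown or [])  # (field number, value) pairs

    def copy_from(self, other):
        # Replace everything: the unknown list is overwritten, not extended.
        self.known = dict(other.known)
        self.unknown = list(other.unknown)

    def merge_from(self, other):
        # Known singular fields take the incoming value...
        self.known.update(other.known)
        # ...but unknown fields cannot be matched up, so they are appended.
        self.unknown.extend(other.unknown)


def update_age(person, age, use_merge):
    """Assumed update pattern: copy, modify the copy, write it back."""
    draft = copy.deepcopy(person)
    draft.known["age"] = age
    if use_merge:
        person.merge_from(draft)
    else:
        person.copy_from(draft)


person = ToyMessage({"name": "person1", "age": 30}, [(3, 1), (4, 2)])

# Writing back with copy_from: the unknown list keeps its size.
for _ in range(5):
    update_age(person, 100, use_merge=False)
after_copy = len(person.unknown)
print(after_copy)

# Writing back with merge_from: the unknown list doubles on every update.
sizes = []
for _ in range(10):
    update_age(person, 100, use_merge=True)
    sizes.append(len(person.unknown))
print(sizes)
```

In this model, n merge-back updates leave 2^n copies of the original unknown fields. Since the `3: 1 4: 2` pair takes 4 bytes on the wire, 30 such updates would already amount to about 4 GiB of unknown data for a single Person.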
github/demo/tree/master/testcode/pb_mergefrom_trap