The protobuf MergeFrom trap

Protobuf is convenient: its performance may not match a hand-rolled binary format, but in most cases it still beats JSON by a wide margin. Forward/backward compatibility of fields in particular is probably why most of us pick it. And then I finally fell into a trap, one that seriously disrupted a production service.
While putting the test programs together I noticed that the official Python tutorial does not even mention the MergeFrom interface; it seems this trap is reserved specifically for the C++ crowd.

Here is an example to illustrate.

// mergefrom_trap.proto
package mergefrom_trap;

message Person
{
    required string name = 1;
    required int32 age = 2;
    // optional int32 new1 = 3;
    // optional int32 new2 = 4 [default = 0];
}

message AddressBook
{
    required string host_name = 1;
    repeated Person person = 2;
    // optional int32 new3 = 3;
    // optional int32 new4 = 4 [default = 0];
}
1. First, compile the C++ binary using the proto without the new1-4 fields (a sketch of read_new_write_old.cpp follows the commands):
[carl pb_mergefrom_trap]$ protoc -I=./ --cpp_out=./ mergefrom_trap.proto
[carl pb_mergefrom_trap]$ g++ read_new_write_old.cpp mergefrom_trap.pb.cc -lprotobuf -o read_new_write_old.bin
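read_new_write_old.cpp itself is not listed in this post (it lives in the repo linked at the end). A minimal sketch of the kind of thing it does, reconstructed from the step-4 output, might look like the following; the loop counts, variable names and the person-level merge are assumptions, and only the CopyFrom/MergeFrom contrast is the point:

// read_new_write_old.cpp (sketch, built against the OLD proto)
#include <fstream>
#include <iostream>
#include <string>
#include "mergefrom_trap.pb.h"

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <addressbook file>" << std::endl;
        return 1;
    }

    // Parse the file written by write_new.py. Because this binary only knows
    // the old proto, new1-4 end up in the messages' unknown field sets.
    mergefrom_trap::AddressBook book;
    std::fstream in(argv[1], std::ios::in | std::ios::binary);
    if (!book.ParseFromIstream(&in)) {
        std::cerr << "parse failed" << std::endl;
        return 1;
    }
    std::cout << "book: " << book.DebugString() << std::endl;

    // CopyFrom round-trips: the unknown fields are replaced each time,
    // so the message stays the same size.
    book.set_host_name("carl_copyfrom_update");
    book.mutable_person(0)->set_age(100);
    for (int i = 0; i < 5; ++i) {
        std::string buf;
        book.SerializeToString(&buf);
        mergefrom_trap::AddressBook tmp;
        tmp.ParseFromString(buf);
        book.CopyFrom(tmp);
        std::cout << "book: " << book.DebugString() << std::endl;
    }

    for (int i = 0; i < 15; ++i) {
        std::cout << "a big trap is coming ...." << std::endl;
    }
    std::cout << "book: " << book.DebugString() << std::endl;

    // MergeFrom round-trips: the unknown fields of the merged-in person are
    // appended to the ones already there, so they pile up on every call.
    book.set_host_name("carl_update_mergefrom");
    for (int i = 0; i < 5; ++i) {
        std::string buf;
        book.SerializeToString(&buf);
        mergefrom_trap::AddressBook tmp;
        tmp.ParseFromString(buf);
        book.mutable_person(0)->MergeFrom(tmp.person(0));
        std::cout << "book: " << book.DebugString() << std::endl;
    }

    // "write_old": dump the (now bloated) book back to the same file.
    std::fstream out(argv[1], std::ios::out | std::ios::trunc | std::ios::binary);
    book.SerializeToOstream(&out);
    return 0;
}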
2. Then add the new1-4 fields (i.e. uncomment them in the proto) and generate mergefrom_trap_pb2.py:
[carl pb_mergefrom_trap]$ protoc -I=./ --python_out=./ mergefrom_trap.proto
3. Use the write_new.py program to create an address_book object and write it to a file; the output looks like this:
[carl pb_mergefrom_trap]$ python write_new.py addressbook
address_book: host_name: "carl"
person {
name: "person1"
age: 30
new1: 1
new2: 2
}
new3: 3
new4: 4
4. Run the C++ binary built above:
[carl pb_mergefrom_trap]$ ./read_new_write_old.bin addressbook
book: host_name: "carl"
person {
name: "person1"
age: 30
3: 1
4: 2
}
3: 3
4: 4
book: host_name: "carl_copyfrom_update"
person {
name: "person1"
age: 100
3: 1
4: 2
}
3: 3
4: 4
book: host_name: "carl_copyfrom_update"
person {
name: "person1"
age: 100
3: 1
4: 2
}
3: 3
4: 4
book: host_name: "carl_copyfrom_update"
person {
name: "person1"
age: 100
3: 1
4: 2
}
3: 3
4: 4
book: host_name: "carl_copyfrom_update"
person {
name: "person1"
age: 100
3: 1
4: 2
}
3: 3
4: 4
book: host_name: "carl_copyfrom_update"
person {
name: "person1"
age: 100
3: 1
4: 2
}
3: 3
4: 4
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
a big trap is coming ....
book: host_name: "carl_copyfrom_update"
person {
name: "person1"
age: 100
3: 1
4: 2
}
3: 3
4: 4
book: host_name: "carl_update_mergefrom"
person {
name: "person1"
age: 100
3: 1
4: 2
3: 1
4: 2
}
3: 3
4: 4
book: host_name: "carl_update_mergefrom"
person {
name: "person1"
age: 100
3: 1
4: 2
3: 1
4: 2
}
3: 3
4: 4
book: host_name: "carl_update_mergefrom"
person {
name: "person1"
age: 100
3: 1
4: 2
3: 1
4: 2
3: 1
4: 2
3: 1
4: 2
}
3: 3
4: 4
book: host_name: "carl_update_mergefrom"
person {
name: "person1"
age: 100
3: 1
4: 2
3: 1
4: 2
3: 1
4: 2
3: 1
4: 2
}
3: 3
4: 4
book: host_name: "carl_update_mergefrom"
person {
name: "person1"
age: 100
3: 1
4: 2
3: 1
4: 2
3: 1
4: 2
3: 1
4: 2
3: 1
4: 2
3: 1
4: 2
3: 1
4: 2
3: 1
4: 2
}
3: 3
4: 4

Summary

From the test results above you can see that when a C++ program built against the old proto calls MergeFrom, the two unknown fields are not overwritten onto the existing entries; they are appended as new ones. Call it a few times in a row and they multiply, roughly doubling each round. The serialized output therefore grows exponentially, which then hits the process itself: a single Parse or Serialize call can block for tens of seconds or even minutes, and network traffic balloons along with it. For a production server this is essentially fatal. A painful lesson, learned the hard way.
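If replace semantics are what you actually want, the straightforward fix, as far as I can tell, is to use CopyFrom, or to Clear() the target before merging: CopyFrom clears the target first, so the old unknown fields are discarded instead of being appended to. A minimal sketch (buf and fresh are placeholder names):

// Replace instead of merge: CopyFrom behaves like Clear() + MergeFrom(), so the
// unknown fields new1-4 are carried over exactly once instead of accumulating.
mergefrom_trap::AddressBook fresh;
fresh.ParseFromString(buf);   // buf: freshly serialized data, e.g. from the new proto
book.CopyFrom(fresh);

// Equivalent, if MergeFrom has to stay in the call path:
book.Clear();
book.MergeFrom(fresh);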
The files above are hosted at
github/demo/tree/master/testcode/pb_mergefrom_trap