Protobuf Reflection
I talked about types in Type system and language bindings. Today I go deep into one specific type system: Google Protocol Buffers (GPB). GPB supports introspection, reflection, dynamic types, and dynamic data. However, I do not consider GPB a true dynamic type system, and I will explain why after a close look at how GPB works.
GPB Workflow
The main workflow starts from .proto definition files: the protoc compiler compiles them into C++ types, which the user then uses directly. The process can be illustrated as follows:
The main design pattern of GPB is static generation plus type erasure in C++. All user-defined message types inherit from the Message base type. All user-defined type information, such as field names, field types, message names, and field offsets, is statically generated and stored in protoc-generated C++ files. At runtime, before main starts, this information is registered into the global DescriptorPool.
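To make the pattern concrete, here is a minimal sketch in Python. It is a toy model, not the protobuf API: Descriptor, DescriptorPool, GENERATED_POOL, and the Message base class below are hypothetical stand-ins I made up for illustration. It shows a "generated" class that carries its type information, registers it in a global pool when the module is loaded (playing the role of "before main"), and is then used through the type-erased base class.

```python
# Toy model only: these classes are hypothetical stand-ins, not the protobuf API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Immutable type information: message name plus (field, type) pairs."""
    full_name: str
    fields: tuple


class DescriptorPool:
    """Registry mapping a full message name to its descriptor."""

    def __init__(self):
        self._by_name = {}

    def register(self, descriptor):
        if descriptor.full_name in self._by_name:
            raise ValueError(f"{descriptor.full_name} is already registered")
        self._by_name[descriptor.full_name] = descriptor

    def find(self, full_name):
        return self._by_name.get(full_name)


GENERATED_POOL = DescriptorPool()


class Message:
    """Type-erased base class: callers can handle any message through it."""
    DESCRIPTOR = None

    def type_name(self):
        return self.DESCRIPTOR.full_name


# What a code generator might emit for one message type.
class MyMessage(Message):
    DESCRIPTOR = Descriptor("test.MyMessage", (("id", "int32"), ("name", "string")))

    def __init__(self):
        self.id = 0
        self.name = ""


# Runs at import time, which plays the role of "before main" in C++.
GENERATED_POOL.register(MyMessage.DESCRIPTOR)

erased = MyMessage()  # from here on, only the Message interface is used
found = GENERATED_POOL.find(erased.type_name())
print(found is MyMessage.DESCRIPTOR)
```

The point of the sketch is only the division of labor: the concrete class is fixed ahead of time, while the pool lets code that knows nothing about MyMessage look up its type information by name.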
How reflection works
Message creation is based on type erasure. Member access is based on static offset information and type descriptor information. Dynamic type creation is based on the DynamicMessage type.
When the user calls GetPrototype(descriptor)->New() on a message factory, for example MessageFactory::generated_factory()->GetPrototype(descriptor)->New(), GPB returns a pointer to the Message base type. Underneath this virtual interface, there are two possibilities:
- A protoc-generated C++ type, which is a concrete C++ class; the returned Message pointer can be dynamically cast to it. This is what generated_factory() produces for descriptors from the generated pool.
- The DynamicMessage type, which is constructed at runtime by the protobuf library. This is what a DynamicMessageFactory produces.
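The two possibilities can be sketched with a toy prototype factory in Python. Again, this is not the protobuf API: Message, MyMessage, DynamicMessage, and Factory are hypothetical names, and for brevity the toy merges the generated factory and the dynamic factory into one object, which the real library keeps separate.

```python
# Toy model only: hypothetical names, not the protobuf API.
class Message:
    def new(self):
        """Create a fresh, empty message of the same concrete type."""
        raise NotImplementedError


class MyMessage(Message):
    """Stands in for a generated class with fixed, compiled-in fields."""
    FULL_NAME = "test.MyMessage"

    def __init__(self):
        self.id = 0
        self.name = ""

    def new(self):
        return MyMessage()


class DynamicMessage(Message):
    """Built at runtime from a field list; values live in a dict."""

    def __init__(self, full_name, field_names):
        self.full_name = full_name
        self.values = {name: None for name in field_names}

    def new(self):
        return DynamicMessage(self.full_name, list(self.values))


class Factory:
    def __init__(self):
        # Prototypes of "generated" types are known up front.
        self._prototypes = {MyMessage.FULL_NAME: MyMessage()}

    def get_prototype(self, full_name, field_names):
        # Known generated type: hand back its prototype.
        # Otherwise fall back to a dynamic message built from the field list.
        proto = self._prototypes.get(full_name)
        if proto is None:
            proto = DynamicMessage(full_name, field_names)
            self._prototypes[full_name] = proto
        return proto


factory = Factory()
a = factory.get_prototype("test.MyMessage", ["id", "name"]).new()
b = factory.get_prototype("mypackage.MyDynamicMessage", ["my_field"]).new()
print(type(a).__name__, isinstance(a, MyMessage))
print(type(b).__name__, isinstance(b, MyMessage))
```

Both results are handled through the same base interface, but only the first can be downcast to the concrete generated class, which mirrors the dynamic_cast check in the C++ demo.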
Code demo
Here is a demonstration of advanced Protocol Buffers usage; the whole code can be found at protobuf reflection code demo.
#include "my_message.pb.h" // Generated by protoc
#include <absl/strings/string_view.h>
#include <google/protobuf/compiler/importer.h>
#include <google/protobuf/compiler/parser.h>
#include <google/protobuf/descriptor.pb.h>
#include <google/protobuf/dynamic_message.h>
#include <google/protobuf/text_format.h>
#include <iomanip>
#include <iostream>
#include <memory>
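#include <cstdint> // int32_t
#include <cstdio>  // printf
#include <string>
// ArrayInputStream is declared here; include it explicitly instead of relying
// on transitive includes.
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>
using namespace std;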
// Custom SourceTree for in-memory .proto files
class MemorySourceTree : public google::protobuf::compiler::SourceTree {
public:
MemorySourceTree(const std::string &name, const std::string &content)
: filename_(name), content_(content) {}
google::protobuf::io::ZeroCopyInputStream *
Open(absl::string_view filename) override {
if (filename != filename_)
return nullptr;
return new google::protobuf::io::ArrayInputStream(
content_.data(), static_cast<int>(content_.size()));
}
private:
std::string filename_;
std::string content_;
};
int main() {
GOOGLE_PROTOBUF_VERIFY_VERSION;
// ============================================
// Section 1: Static Message Creation
// Demonstrates creating a message using the generated C++ class
// This is the most common and straightforward way to use protobuf
// ============================================
test::MyMessage static_msg;
static_msg.set_id(123);
static_msg.set_name("Static Message");
cout << "\n=== Section 1: Static Message Creation ===\n";
cout << "----------------------------------------\n";
cout << "Message Content:\n" << static_msg.DebugString();
cout << "Type: " << static_msg.GetTypeName() << "\n\n";
// ============================================
// Section 2: Dynamic Message Creation
// Shows how to create a message using the descriptor and reflection APIs
// This is useful when you don't have the generated C++ class at compile time
// ============================================
const google::protobuf::Descriptor *descriptor =
google::protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName(
"test.MyMessage");
if (!descriptor) {
cerr << "Descriptor not found!" << endl;
return 1;
}
google::protobuf::Message *dynamic_msg =
google::protobuf::MessageFactory::generated_factory()
->GetPrototype(descriptor)
->New();
// ============================================
// Section 3: Type Identity Verification
// Proves that static and dynamic messages are of the same type
// Demonstrates that dynamic messages can be cast to their static counterparts
// ============================================
cout << "\n=== Section 3: Type Identity Verification ===\n";
cout << "-------------------------------------------\n";
cout << "Are descriptors identical? "
<< (descriptor == test::MyMessage::descriptor() ? "YES" : "NO") << "\n";
test::MyMessage *converted_msg = dynamic_cast<test::MyMessage *>(dynamic_msg);
cout << "Dynamic cast successful? " << (converted_msg ? "YES" : "NO")
<< "\n\n";
// ============================================
// Section 4: Memory Compatibility Test
// Shows that dynamic messages can be modified and accessed just like static
// ones. Demonstrates the memory layout compatibility between static and
// dynamic messages
// ============================================
cout << "\n=== Section 4: Memory Compatibility Test ===\n";
cout << "------------------------------------------\n";
// Set the same values in dynamic_msg as in static_msg
const google::protobuf::Reflection *compat_reflection =
dynamic_msg->GetReflection();
const google::protobuf::Descriptor *compat_desc =
dynamic_msg->GetDescriptor();
const google::protobuf::FieldDescriptor *compat_id_field =
compat_desc->FindFieldByName("id");
const google::protobuf::FieldDescriptor *compat_name_field =
compat_desc->FindFieldByName("name");
compat_reflection->SetInt32(dynamic_msg, compat_id_field, static_msg.id());
compat_reflection->SetString(dynamic_msg, compat_name_field,
static_msg.name());
cout << "Static message content:\n" << static_msg.DebugString() << "\n";
cout << "Dynamic message content:\n" << dynamic_msg->DebugString() << "\n";
// Compare serialized values
string compat_static_serialized, compat_dynamic_serialized;
static_msg.SerializeToString(&compat_static_serialized);
dynamic_msg->SerializeToString(&compat_dynamic_serialized);
cout << "Serialized values identical? "
<< (compat_static_serialized == compat_dynamic_serialized ? "YES" : "NO")
<< "\n";
if (compat_static_serialized != compat_dynamic_serialized) {
cout << "Static serialized size: " << compat_static_serialized.size()
<< "\n";
cout << "Dynamic serialized size: " << compat_dynamic_serialized.size()
<< "\n";
}
cout << "\n";
// ============================================
// Section 5: Basic Reflection API Usage
// Demonstrates how to use the reflection API to access message fields
// Shows the basic operations for getting and setting field values
// ============================================
test::MyMessage message;
const google::protobuf::Reflection *reflection = message.GetReflection();
const google::protobuf::Descriptor *descriptor_message =
message.GetDescriptor();
const google::protobuf::FieldDescriptor *id_field =
descriptor_message->FindFieldByName("id");
int32_t id_value = reflection->GetInt32(message, id_field);
reflection->SetInt32(&message, id_field, 42);
cout << "\n=== Section 5: Basic Reflection API Usage ===\n";
cout << "-------------------------------------------\n";
cout << "Message ID after reflection: " << message.id() << "\n\n";
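delete dynamic_msg; // Must manage dynamic allocation
// ShutdownProtobufLibrary() frees the library's global state, and protobuf
// must not be used after it. Sections 6-8 still use protobuf, so the call is
// not made here; if needed, call it right before the final return.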
// ============================================
// Section 6: Dynamic Message Type Creation
// Shows how to create a new message type programmatically
// Useful for creating message types at runtime without .proto files
// ============================================
cout << "\n=== Section 6: Dynamic Message Type Creation ===\n";
cout << "----------------------------------------------\n";
google::protobuf::DescriptorPool pool(
google::protobuf::DescriptorPool::generated_pool());
google::protobuf::FileDescriptorProto file_proto;
file_proto.set_name("my_dynamic.proto");
file_proto.set_package("mypackage");
google::protobuf::DescriptorProto *message_proto =
file_proto.add_message_type();
message_proto->set_name("MyDynamicMessage");
google::protobuf::FieldDescriptorProto *field = message_proto->add_field();
field->set_name("my_field");
field->set_number(1);
field->set_type(google::protobuf::FieldDescriptorProto::TYPE_STRING);
const google::protobuf::FileDescriptor *file_desc =
pool.BuildFile(file_proto);
if (!file_desc) {
std::cerr << "Failed to build file descriptor!" << std::endl;
return 1;
}
const google::protobuf::Descriptor *message_desc = file_desc->message_type(0);
if (!message_desc) {
std::cerr << "Failed to get message descriptor!" << std::endl;
return 1;
}
google::protobuf::DynamicMessageFactory factory;
google::protobuf::Message *message_dyn =
factory.GetPrototype(message_desc)->New();
const google::protobuf::Reflection *dyn_reflection =
message_dyn->GetReflection();
const google::protobuf::FieldDescriptor *field_desc =
message_desc->FindFieldByName("my_field");
dyn_reflection->SetString(message_dyn, field_desc,
"Hello from dynamic message!");
std::string value = dyn_reflection->GetString(*message_dyn, field_desc);
std::cout << "Dynamic message field value: " << value << std::endl;
// Without this, there will be a seg fault. The underlying derived type is no
// longer a protoc-generated C++ class; it is DynamicMessage. The root cause
// is unknown.
google::protobuf::UnknownFieldSet *unknown =
message_dyn->GetReflection()->MutableUnknownFields(message_dyn);
if (unknown) {
unknown->AddVarint(999999, 0);
}
std::cout << message_dyn->DebugString() << std::endl;
delete message_dyn; // Must be destroyed before the factory that created it
// ============================================
// Section 6.5: Immutability of built descriptors
// Demonstrates that once a descriptor is built, it cannot be modified
// ============================================
cout << "\n=== Section 6.5: Immutability of built descriptors ===\n";
cout << "----------------------------------------------------\n";
// Try to add another field to message_proto
google::protobuf::FieldDescriptorProto *field2 = message_proto->add_field();
field2->set_name("another_field");
field2->set_number(2);
field2->set_type(google::protobuf::FieldDescriptorProto::TYPE_INT32);
cout << "Added 'another_field' to FileDescriptorProto in memory.\n";
// The existing message_desc is immutable and won't see the change.
const google::protobuf::FieldDescriptor *field_desc2 =
message_desc->FindFieldByName("another_field");
cout << "Is 'another_field' found in the original descriptor? "
<< (field_desc2 ? "YES" : "NO") << "\n";
cout << "Original descriptor field count: " << message_desc->field_count()
<< "\n\n";
// To use the new field, you would have to build a new FileDescriptor.
// Building with the same name will fail because it's already in the pool.
const google::protobuf::FileDescriptor *new_file_desc_fail =
pool.BuildFile(file_proto);
cout << "Trying to build with same name again: "
<< (new_file_desc_fail ? "succeeded (unexpected!)"
: "failed as expected")
<< "\n";
// We have to change the name to build a new version.
file_proto.set_name("my_dynamic_v2.proto");
// We must also change the message name to avoid a symbol collision in the
// pool.
message_proto->set_name("MyDynamicMessageV2");
const google::protobuf::FileDescriptor *new_file_desc_ok =
pool.BuildFile(file_proto);
if (new_file_desc_ok) {
cout << "Building with a new file and message name succeeded.\n";
const google::protobuf::Descriptor *new_message_desc =
new_file_desc_ok->FindMessageTypeByName("MyDynamicMessageV2");
if (new_message_desc) {
cout << "Found new message: '" << new_message_desc->full_name() << "'\n";
cout << "New descriptor field count: " << new_message_desc->field_count()
<< "\n";
const google::protobuf::FieldDescriptor *new_field_desc =
new_message_desc->FindFieldByName("another_field");
cout << "Is 'another_field' found in the new descriptor? "
<< (new_field_desc ? "YES" : "NO") << "\n";
} else {
cout << "Failed to find 'MyDynamicMessageV2' in new file "
"descriptor.\n";
}
} else {
cout << "Building with a new name failed unexpectedly.\n";
}
// ============================================
// Section 7: Proto File String Parsing
// Demonstrates how to create message types from a .proto file content string
// Shows how to use the compiler infrastructure to parse proto definitions
// ============================================
cout << "\n=== Section 7: Proto File String Parsing ===\n";
cout << "------------------------------------------\n";
std::string proto_content = R"(
syntax = "proto3";
package dynamic;
message DynamicPerson {
string name = 1;
int32 age = 2;
repeated string hobbies = 3;
bool is_active = 4;
}
)";
google::protobuf::DescriptorPool descriptor_pool(
google::protobuf::DescriptorPool::generated_pool());
MemorySourceTree source_tree("person.proto", proto_content);
google::protobuf::compiler::SourceTreeDescriptorDatabase source_tree_db(
&source_tree);
google::protobuf::FileDescriptorProto file_desc_proto;
if (!source_tree_db.FindFileByName("person.proto", &file_desc_proto)) {
std::cerr << "Failed to parse proto content!" << std::endl;
return 1;
}
const google::protobuf::FileDescriptor *file_desc_dyn =
descriptor_pool.BuildFile(file_desc_proto);
if (!file_desc_dyn) {
std::cerr << "Failed to build file descriptor!" << std::endl;
return 1;
}
const google::protobuf::Descriptor *descriptor_dyn =
file_desc_dyn->FindMessageTypeByName("DynamicPerson");
if (!descriptor_dyn) {
std::cerr << "Failed to find message descriptor!" << std::endl;
return 1;
}
google::protobuf::DynamicMessageFactory factory_dyn(&descriptor_pool);
const google::protobuf::Message *prototype =
factory_dyn.GetPrototype(descriptor_dyn);
if (!prototype) {
std::cerr << "Failed to get message prototype!" << std::endl;
return 1;
}
std::unique_ptr<google::protobuf::Message> message_dyn_(prototype->New());
if (!message_dyn_) {
std::cerr << "Failed to create dynamic message!" << std::endl;
return 1;
}
// Without this, there will be seg fault
google::protobuf::UnknownFieldSet *unknown_fields =
message_dyn_->GetReflection()->MutableUnknownFields(message_dyn_.get());
if (unknown_fields) {
unknown_fields->AddVarint(999999, 0);
}
const google::protobuf::Reflection *reflection_dyn =
message_dyn_->GetReflection();
if (!reflection_dyn) {
std::cerr << "Failed to get reflection interface!" << std::endl;
return 1;
}
const google::protobuf::FieldDescriptor *name_field =
descriptor_dyn->FindFieldByName("name");
const google::protobuf::FieldDescriptor *age_field =
descriptor_dyn->FindFieldByName("age");
const google::protobuf::FieldDescriptor *hobbies_field =
descriptor_dyn->FindFieldByName("hobbies");
const google::protobuf::FieldDescriptor *is_active_field =
descriptor_dyn->FindFieldByName("is_active");
if (!name_field || !age_field || !hobbies_field || !is_active_field) {
std::cerr << "Failed to find required fields!" << std::endl;
return 1;
}
reflection_dyn->SetString(message_dyn_.get(), name_field, "John Doe");
reflection_dyn->SetInt32(message_dyn_.get(), age_field, 30);
reflection_dyn->SetBool(message_dyn_.get(), is_active_field, true);
reflection_dyn->AddString(message_dyn_.get(), hobbies_field, "Reading");
reflection_dyn->AddString(message_dyn_.get(), hobbies_field, "Hiking");
reflection_dyn->AddString(message_dyn_.get(), hobbies_field, "Programming");
std::cout << "\nDebug Information:\n";
std::cout << "Message Type: " << message_dyn_->GetTypeName() << "\n";
std::cout << "Descriptor Name: " << descriptor_dyn->full_name() << "\n";
std::cout << "Number of Fields: " << descriptor_dyn->field_count() << "\n";
std::cout << "Unknown Fields Size: "
<< message_dyn_->GetReflection()
->GetUnknownFields(*message_dyn_)
.field_count()
<< "\n";
std::cout << "\nTrying DebugString():\n";
std::cout << message_dyn_->DebugString() << "\n\n";
// ============================================
// Section 8: Static vs Dynamic Message Comparison
// Creates a static DynamicPerson message and compares its serialization
// with the dynamic message to verify they are identical
// ============================================
cout << "\n=== Section 8: Static vs Dynamic Message Comparison ===\n";
cout << "------------------------------------------\n";
test::DynamicPerson static_person;
static_person.set_name("John Doe");
static_person.set_age(30);
static_person.set_is_active(true);
static_person.add_hobbies("Reading");
static_person.add_hobbies("Hiking");
static_person.add_hobbies("Programming");
// Clear unknown fields before serialization for fair comparison
message_dyn_->GetReflection()
->MutableUnknownFields(message_dyn_.get())
->Clear();
std::string static_serialized, dynamic_serialized;
static_person.SerializeToString(&static_serialized);
message_dyn_->SerializeToString(&dynamic_serialized);
std::cout << "Static vs Dynamic Message Comparison:\n";
std::cout << "Serialized data identical? "
<< (static_serialized == dynamic_serialized ? "YES" : "NO") << "\n";
if (static_serialized != dynamic_serialized) {
std::cout << "Static serialized size: " << static_serialized.size() << "\n";
std::cout << "Dynamic serialized size: " << dynamic_serialized.size()
<< "\n";
// Print hex representation of both serialized messages for debugging
std::cout << "Static serialized (hex): ";
for (unsigned char c : static_serialized) {
printf("%02x ", c);
}
std::cout << "\nDynamic serialized (hex): ";
for (unsigned char c : dynamic_serialized) {
printf("%02x ", c);
}
cout << "\n";
}
return 0;
}
Why GPB is not a true dynamic type system
GPB supports dynamic type creation and dynamic data creation, so why is it not considered a true dynamic type system (at least by me)? I think the reasons are the following:
- Runtime type creation is just deferred compile-time type registration. When the user generates types with protoc, those types are registered during program startup, before main. That registration process is the same as the so-called runtime type creation: both read a type definition, compose a type descriptor, and register it in a DescriptorPool. More importantly, once a type has been registered, it cannot be modified, because GPB would have no way to update all existing instances of that type. Consider what would happen if a type changed, say a new field were added: the existing instances could not be updated to hold that new field in their data.
- The DynamicMessage type does bring GPB one step closer to a true dynamic type system. It works very much like the PyObject class in Python. However, it still depends on a Descriptor to construct a message, and again a Descriptor cannot be changed once registered. In Python, class attributes and instance attributes are stored in dynamically growing dictionaries, and entries can be added and deleted at runtime. This cannot be done with GPB's descriptors. For GPB to be truly dynamic, it would have to make the Descriptor type mutable at runtime, since descriptors are what actually define a type. In GPB, a descriptor's definition (the DescriptorProto) can be created and edited at runtime, but the Descriptor built from it cannot be changed once registered.
GPB supports runtime descriptor construction and dynamic message handling (via DynamicMessage), but the immutability of registered descriptors and the inability to propagate type changes to existing instances make it a static type system with limited runtime flexibility, not a true dynamic type system. This design prioritizes performance, safety, and serialization reliability over full runtime dynamism.
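The consequence of that immutability can be sketched with another toy model in Python. Descriptor and DynamicMessage here are hypothetical stand-ins, not the protobuf classes; a frozen dataclass plays the role of a built descriptor. The sketch mirrors what Section 6.5 of the demo shows: the built descriptor cannot be edited, a new field requires a new descriptor under a new name, and instances of the old type never see the change.

```python
# Toy model only: hypothetical names, not the protobuf API.
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Descriptor:
    full_name: str
    field_names: tuple


class DynamicMessage:
    def __init__(self, descriptor):
        self.descriptor = descriptor
        self.values = {name: None for name in descriptor.field_names}

    def set(self, name, value):
        if name not in self.values:
            raise KeyError(f"{self.descriptor.full_name} has no field {name!r}")
        self.values[name] = value


v1 = Descriptor("mypackage.MyDynamicMessage", ("my_field",))
old = DynamicMessage(v1)

# 1. The built descriptor itself cannot be edited.
try:
    v1.field_names = ("my_field", "another_field")
    mutated = True
except FrozenInstanceError:
    mutated = False

# 2. A new field needs a new descriptor under a new name.
v2 = Descriptor("mypackage.MyDynamicMessageV2", v1.field_names + ("another_field",))
new = DynamicMessage(v2)
new.set("another_field", 7)

# 3. Instances built from the old descriptor never learn about the new field.
try:
    old.set("another_field", 7)
    propagated = True
except KeyError:
    propagated = False

print(mutated, propagated)
```

In this model the type is a value that is fixed once built, so "changing a type" can only mean creating a second, unrelated type.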
Python achieves the propagation of type changes to existing instances through its dynamic object model, leveraging two key features:
- Mutable class dictionaries
- Attribute lookup delegation
In Python, an instance (obj) stores only its own instance-level attributes in its __dict__ (a dictionary). For class-level attributes (including methods), the instance delegates to its class, which holds them in its own __dict__. If you modify a class's __dict__ (e.g., add or change attributes), all existing instances immediately reflect the change, because attribute lookup always queries the class dynamically. When you access obj.attr, Python checks obj.__dict__ for an instance attribute attr (leaving aside data descriptors such as properties, which take precedence). If it is not found there, the lookup delegates to the class's __dict__. If the class does not have it either, Python checks the base classes in MRO order. Class-level changes are visible instantly because the lookup happens at access time, not at instance creation.
class MyClass:
    pass

def new_method(self):
    print(self.x * 2)

obj1 = MyClass()
MyClass.double_x = new_method  # added after obj1 was created
obj1.x = 5
obj2 = MyClass()
# obj1 finds double_x on the class and x on itself, so this succeeds
obj1.double_x()
# obj2 finds double_x on the class but has no x, so this raises AttributeError
obj2.double_x()
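To see where each attribute actually lives, the lookup chain can be inspected directly. This is a small sketch of my own; Base, greeting, and the lambda are made up for illustration and are not part of the example above.

```python
class Base:
    greeting = "hello from Base"


class MyClass(Base):
    pass


obj = MyClass()
obj.x = 5

# Instance attributes live in the instance's own __dict__.
print(obj.__dict__)                      # {'x': 5}
# Nothing named double_x exists anywhere yet.
print(hasattr(obj, "double_x"))          # False

# Mutate the class after the instance already exists.
MyClass.double_x = lambda self: self.x * 2

# The instance dict is unchanged; the lookup falls through to the class.
print("double_x" in obj.__dict__)        # False
print("double_x" in MyClass.__dict__)    # True
print(obj.double_x())                    # 10

# Not on the instance or on MyClass, so the MRO walk reaches Base.
print(obj.greeting)                      # hello from Base
print([c.__name__ for c in MyClass.__mro__])  # ['MyClass', 'Base', 'object']

# Deleting the class attribute is visible to existing instances too.
del MyClass.double_x
print(hasattr(obj, "double_x"))          # False
```

Nothing is copied into the instance when the class changes; the instance simply asks the class again on every access, which is exactly what a built GPB Descriptor cannot offer.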
With dynamic types, the programmer programs against the interpreter and is actually writing text; with static types, the programmer programs against the compiler and is writing code.