Why Protocol Buffers Outperform JSON and XML for Mobile Data Transfer

Protocol Buffers, Google’s cross‑platform serialization format, produces smaller payloads than JSON or XML and transmits them faster by using a compact binary encoding built on Varint and ZigZag. This article explains that encoding and walks through Protobuf’s reflection mechanism with C++ and Objective‑C code.

Suishouji Tech Team

Background

The Handnote app needed smaller payloads and faster transfers between client and server, so we adopted Google’s Protocol Buffers in place of JSON or XML.

Introduction

Protocol Buffers (https://developers.google.com/protocol-buffers/) is an open‑source, cross‑platform, multi‑language serialization format. Compared with XML and JSON, it is smaller, faster, and simpler. Its syntax currently has two versions: proto2 and proto3.

For a sample JSON object:

{
  "id": 1,
  "name": "jojo",
  "email": "123@qq.com"
}

Serialized as compact UTF‑8 JSON (no whitespace), this object occupies 43 bytes:

7b226964223a312c226e616d65223a226a6f6a6f222c22656d61696c223a223132334071712e636f6d227d

Encoding the same data with Protobuf (field numbers name = 1, id = 2, email = 3) produces only 20 bytes:

0a046a6f6a6f10011a0a3132334071712e636f6d

Encoding Details

Protobuf’s efficiency stems from its encoding. Integers are written as Varints, so a small non‑negative int32 value (0 to 127) takes a single byte instead of a fixed four.

Varint

In Varint, the most‑significant bit of each byte indicates whether more bytes follow (1) or this is the final byte (0). For example, the number 300 is encoded as 10101100 00000010.

Varint illustration
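Note: Varint stores the least‑significant 7‑bit group first. To decode, drop the continuation bit from each byte and reverse the order of the groups: 0101100 and 0000010 become 0000010 0101100, which is 100101100 in binary, or 300.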
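The Varint rule above can be sketched in a few lines of C++. This is a minimal illustration rather than the library's implementation, and the function name encodeVarint is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Encode an unsigned 64-bit value as a base-128 varint.
// Each output byte carries 7 payload bits, least-significant group first.
// The top bit is set on every byte except the last one.
std::vector<std::uint8_t> encodeVarint(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        // More groups follow: emit the low 7 bits with the continuation bit.
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    // Final group: continuation bit left clear.
    out.push_back(static_cast<std::uint8_t>(value));
    return out;
}
```

Calling encodeVarint(300) yields the two bytes 0xAC 0x02, which are the bit patterns 10101100 00000010 from the example, while any value from 0 to 127 yields a single byte.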

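Varint alone is inefficient for negative numbers. In two’s complement a negative value has its high bits set, so none of them can be dropped: Protobuf sign‑extends a negative int32 to 64 bits, and even a value as small as -1 occupies ten bytes. For the sint32 and sint64 types, Protobuf solves this with ZigZag encoding.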

ZigZag

ZigZag illustration

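ZigZag maps signed integers to unsigned ones with h(n) = (n << 1) ^ (n >> 31) for sint32 (or h(n) = (n << 1) ^ (n >> 63) for sint64), where >> is an arithmetic shift. The mapping interleaves negative and positive values (0 → 0, -1 → 1, 1 → 2, -2 → 3, and so on), so a number with a small absolute value becomes a small unsigned integer that Varint can encode in few bytes.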
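As a rough sketch of the 32‑bit mapping and its inverse in C++ (the names zigzagEncode32 and zigzagDecode32 are hypothetical, and the sign mask is built explicitly instead of relying on the behavior of >> on negative values):

```cpp
#include <cstdint>

// Map a signed 32-bit integer to an unsigned one so that values with a
// small absolute value become small unsigned values.
// Equivalent to (n << 1) ^ (n >> 31) with an arithmetic right shift.
std::uint32_t zigzagEncode32(std::int32_t n) {
    // All ones for negative n, all zeros otherwise (what n >> 31 produces
    // under an arithmetic shift).
    const std::uint32_t signMask = n < 0 ? 0xFFFFFFFFu : 0u;
    return (static_cast<std::uint32_t>(n) << 1) ^ signMask;
}

// Inverse mapping: the lowest bit holds the sign, the rest the magnitude.
// The final cast assumes two's complement conversion, which C++20
// guarantees and earlier mainstream compilers implement.
std::int32_t zigzagDecode32(std::uint32_t z) {
    const std::uint32_t signMask = 0u - (z & 1u);  // all ones if z is odd
    return static_cast<std::int32_t>((z >> 1) ^ signMask);
}
```

With this mapping -1 becomes 1 and fits in a single Varint byte, and decoding the result returns the original value.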

T‑V and T‑L‑V

Protobuf messages consist of a series of Tag‑Value pairs. The Tag combines the field number and wire type; the Value is the encoded data. For length‑delimited types such as string, bytes, and embedded messages, a length field is inserted between the two, forming a T‑L‑V structure.

message Person {
  int32 id = 1;
  string name = 2;
}

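In this example, the field id has field number 1, and int32 uses wire type 0 (Varint). Its tag is (field_number << 3) | wire_type = (1 << 3) | 0 = 00001000 in binary (0x08). The field name has field number 2 and wire type 2 (length‑delimited), giving the tag 00010010 (0x12).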
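To make the T‑V and T‑L‑V layouts concrete, here is a minimal hand‑written encoder for the two‑field Person message above. It is a sketch under assumptions, not generated Protobuf code: all function names are hypothetical, and unlike a real proto3 encoder it always writes both fields, even when they hold default values.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// The two wire types used by this example.
enum WireType : std::uint32_t { kVarint = 0, kLengthDelimited = 2 };

// Tag = (field_number << 3) | wire_type.
std::uint32_t makeTag(std::uint32_t fieldNumber, WireType wireType) {
    return (fieldNumber << 3) | wireType;
}

// Append value as a base-128 varint.
void appendVarint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// T-V: tag followed by the varint value (negative values sign-extended).
void appendInt32Field(std::vector<std::uint8_t>& out,
                      std::uint32_t fieldNumber, std::int32_t value) {
    appendVarint(out, makeTag(fieldNumber, kVarint));
    appendVarint(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(value)));
}

// T-L-V: tag, byte length, then the raw bytes of the string.
void appendStringField(std::vector<std::uint8_t>& out,
                       std::uint32_t fieldNumber, const std::string& value) {
    appendVarint(out, makeTag(fieldNumber, kLengthDelimited));
    appendVarint(out, value.size());
    out.insert(out.end(), value.begin(), value.end());
}

// Person { int32 id = 1; string name = 2; }
std::vector<std::uint8_t> encodePerson(std::int32_t id, const std::string& name) {
    std::vector<std::uint8_t> out;
    appendInt32Field(out, 1, id);
    appendStringField(out, 2, name);
    return out;
}
```

For id = 1 and name = "jojo", encodePerson returns the eight bytes 08 01 12 04 6a 6f 6a 6f: a T‑V pair for id followed by a T‑L‑V triple for name.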

Reflection Mechanism

Protobuf provides a strong reflection API. Each concrete message type corresponds to a Descriptor object. Using a DescriptorPool you can find a descriptor by type name, then a MessageFactory creates an instance of that message.

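#include <string>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

using google::protobuf::Descriptor;
using google::protobuf::DescriptorPool;
using google::protobuf::Message;
using google::protobuf::MessageFactory;

// Returns a new default instance of the named type, or nullptr if the name is unknown.
Message* createMessage(const std::string& typeName) {
  Message* message = nullptr;
  const Descriptor* descriptor = DescriptorPool::generated_pool()->FindMessageTypeByName(typeName);
  if (descriptor) {
    const Message* prototype = MessageFactory::generated_factory()->GetPrototype(descriptor);
    if (prototype) {
      message = prototype->New();  // the caller owns the returned object
    }
  }
  return message;
}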
Note: The DescriptorPool contains all protobuf message types linked at compile time; MessageFactory can instantiate any of them.

Protobuf‑Objective‑C

In Objective‑C, a message such as:

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}

can be decoded from binary data with:

Person *newP = [[Person alloc] initWithData:data error:nil];

The initializer ultimately calls mergeFromData:extensionRegistry:, which creates a GPBCodedInputStream from the data and merges the fields.

- (void)mergeFromData:(NSData *)data extensionRegistry:(GPBExtensionRegistry *)extensionRegistry {
  GPBCodedInputStream *input = [[GPBCodedInputStream alloc] initWithData:data];
  [self mergeFromCodedInputStream:input extensionRegistry:extensionRegistry];
  [input checkLastTagWas:0];
  [input release];
}

The merge routine iterates over each FieldDescriptor, dispatching to type‑specific decode functions such as MergeSingleFieldFromCodedInputStream for scalar fields.

- (void)mergeFromCodedInputStream:(GPBCodedInputStream *)input extensionRegistry:(GPBExtensionRegistry *)extensionRegistry {
  GPBDescriptor *descriptor = [self descriptor];
  GPBFileSyntax syntax = descriptor.file.syntax;
  NSArray *fields = descriptor->fields_;
  for (NSUInteger i = 0; i < fields.count; ++i) {
    GPBFieldDescriptor *fieldDescriptor = fields[i];
    GPBFieldType fieldType = fieldDescriptor.fieldType;
    if (fieldType == GPBFieldTypeSingle) {
      MergeSingleFieldFromCodedInputStream(self, fieldDescriptor, syntax, input, extensionRegistry);
    } else if (fieldType == GPBFieldTypeRepeated) {
      // repeated field handling
    } else {
      // other types handling
    }
  }
}

For an int32 field, the value is read via GPBCodedInputStreamReadInt32, which internally reads a Varint, and then assigned with GPBSetInt32IvarWithFieldInternal.

int32_t GPBCodedInputStreamReadInt32(GPBCodedInputStreamState *state) {
  int32_t value = ReadRawVarint32(state);
  return value;
}
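ReadRawVarint32 itself is not shown here. As a hedged C++ sketch of what reading a Varint and walking a tag stream can look like, the following parses a buffer for the three‑field Person schema above (name = 1, id = 2, email = 3). The names readVarint, parsePerson, and the Person struct are hypothetical and this is not the Objective‑C runtime's implementation; only the Varint and length‑delimited wire types are handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

struct Person {
    std::string name;
    std::int32_t id = 0;
    std::string email;
};

// Read one varint starting at pos and advance pos past it.
std::uint64_t readVarint(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= buf.size()) throw std::runtime_error("truncated varint");
        const std::uint8_t byte = buf[pos++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;  // continuation bit clear
    }
    throw std::runtime_error("varint too long");
}

// Walk the buffer tag by tag and fill in the fields that are recognized.
Person parsePerson(const std::vector<std::uint8_t>& buf) {
    Person person;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        const std::uint64_t tag = readVarint(buf, pos);
        const std::uint64_t fieldNumber = tag >> 3;
        const std::uint64_t wireType = tag & 0x7;
        if (wireType == 0) {  // T-V: varint value
            const std::uint64_t value = readVarint(buf, pos);
            if (fieldNumber == 2) person.id = static_cast<std::int32_t>(value);
        } else if (wireType == 2) {  // T-L-V: length, then raw bytes
            const std::uint64_t length = readVarint(buf, pos);
            if (length > buf.size() - pos) throw std::runtime_error("truncated field");
            const std::string text(reinterpret_cast<const char*>(buf.data() + pos),
                                   static_cast<std::size_t>(length));
            pos += static_cast<std::size_t>(length);
            if (fieldNumber == 1) person.name = text;
            else if (fieldNumber == 3) person.email = text;
        } else {
            throw std::runtime_error("unsupported wire type");
        }
    }
    return person;
}
```

Feeding parsePerson the 20 bytes 0a 04 6a 6f 6a 6f 10 01 1a 0a 31 32 33 40 71 71 2e 63 6f 6d gives name "jojo", id 1, and email "123@qq.com". Fields with unrecognized numbers are read and skipped, which mirrors how Protobuf stays forward compatible.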

Finally, the descriptor for a generated message class is cached in a static variable and provides metadata for all fields.

+ (GPBDescriptor *)descriptor {
  static GPBDescriptor *descriptor = nil;
  if (!descriptor) {
    static GPBMessageFieldDescription fields[] = {
      {.name = "name", .number = Person_FieldNumber_Name, .dataType = GPBDataTypeString, .flags = GPBFieldOptional},
      // ... other fields ...
    };
    descriptor = // construct descriptor from fields
  }
  return descriptor;
}

Related Links

Google Protocol Buffers Docs (https://developers.google.com/protocol-buffers/)

Automatic Reflection Message Type Transfer Scheme (http://blog.csdn.net/solstice/article/details/6300108)

Integer Compression Encoding ZigZag (http://www.cnblogs.com/en-heng/p/5570609.html)

Google Protocol Buffer Usage and Principles (https://www.ibm.com/developerworks/cn/linux/l-cn-gpb/)
