C++ Serialization

The Cap’n Proto C++ runtime implementation provides an easy-to-use interface for manipulating messages backed by fast pointer arithmetic. This page discusses the serialization layer of the runtime; see C++ RPC for information about the RPC layer.

Example Usage

For the Cap’n Proto definition:

struct Person {
  id @0 :UInt32;
  name @1 :Text;
  email @2 :Text;
  phones @3 :List(PhoneNumber);

  struct PhoneNumber {
    number @0 :Text;
    type @1 :Type;

    enum Type {
      mobile @0;
      home @1;
      work @2;
    }
  }

  employment :union {
    unemployed @4 :Void;
    employer @5 :Text;
    school @6 :Text;
    selfEmployed @7 :Void;
    # We assume that a person is only one of these.
  }
}

struct AddressBook {
  people @0 :List(Person);
}

You might write code like:

#include "addressbook.capnp.h"
#include <capnp/message.h>
#include <capnp/serialize-packed.h>
#include <iostream>

void writeAddressBook(int fd) {
  ::capnp::MallocMessageBuilder message;

  AddressBook::Builder addressBook = message.initRoot<AddressBook>();
  ::capnp::List<Person>::Builder people = addressBook.initPeople(2);

  Person::Builder alice = people[0];
  alice.setId(123);
  alice.setName("Alice");
  alice.setEmail("alice@example.com");
  // Type shown for explanation purposes; normally you'd use auto.
  ::capnp::List<Person::PhoneNumber>::Builder alicePhones =
      alice.initPhones(1);
  alicePhones[0].setNumber("555-1212");
  alicePhones[0].setType(Person::PhoneNumber::Type::MOBILE);
  alice.getEmployment().setSchool("MIT");

  Person::Builder bob = people[1];
  bob.setId(456);
  bob.setName("Bob");
  bob.setEmail("bob@example.com");
  auto bobPhones = bob.initPhones(2);
  bobPhones[0].setNumber("555-4567");
  bobPhones[0].setType(Person::PhoneNumber::Type::HOME);
  bobPhones[1].setNumber("555-7654");
  bobPhones[1].setType(Person::PhoneNumber::Type::WORK);
  bob.getEmployment().setUnemployed();

  writePackedMessageToFd(fd, message);
}

void printAddressBook(int fd) {
  ::capnp::PackedFdMessageReader message(fd);

  AddressBook::Reader addressBook = message.getRoot<AddressBook>();

  for (Person::Reader person : addressBook.getPeople()) {
    std::cout << person.getName().cStr() << ": "
              << person.getEmail().cStr() << std::endl;
    for (Person::PhoneNumber::Reader phone: person.getPhones()) {
      const char* typeName = "UNKNOWN";
      switch (phone.getType()) {
        case Person::PhoneNumber::Type::MOBILE: typeName = "mobile"; break;
        case Person::PhoneNumber::Type::HOME: typeName = "home"; break;
        case Person::PhoneNumber::Type::WORK: typeName = "work"; break;
      }
      std::cout << "  " << typeName << " phone: "
                << phone.getNumber().cStr() << std::endl;
    }
    Person::Employment::Reader employment = person.getEmployment();
    switch (employment.which()) {
      case Person::Employment::UNEMPLOYED:
        std::cout << "  unemployed" << std::endl;
        break;
      case Person::Employment::EMPLOYER:
        std::cout << "  employer: "
                  << employment.getEmployer().cStr() << std::endl;
        break;
      case Person::Employment::SCHOOL:
        std::cout << "  student at: "
                  << employment.getSchool().cStr() << std::endl;
        break;
      case Person::Employment::SELF_EMPLOYED:
        std::cout << "  self-employed" << std::endl;
        break;
    }
  }
}

C++ Feature Usage: C++11, Exceptions

This implementation makes use of C++11 features. If you are using GCC, you will need at least version 4.7 to compile Cap’n Proto. If you are using Clang, you will need at least version 3.2. These compilers require the flag -std=c++11 to enable C++11 features, so any code of yours that #includes Cap’n Proto headers must also be compiled with this flag. Other compilers have not been tested at this time.

This implementation prefers to handle errors using exceptions. Exceptions are only used in circumstances that should never occur in normal operation. For example, exceptions are thrown on assertion failures (indicating bugs in the code), network failures, and invalid input. Exceptions thrown by Cap’n Proto are never part of the interface and never need to be caught in correct usage. The purpose of throwing exceptions is to allow higher-level code a chance to recover from unexpected circumstances without disrupting other work happening in the same process. For example, a server that handles requests from multiple clients should, on exception, return an error to the client that caused the exception and close that connection, but should continue handling other connections normally.

When Cap’n Proto code might throw an exception from a destructor, it first checks std::uncaught_exception() to ensure that this is safe. If another exception is already active, the new exception is assumed to be a side-effect of the main exception, and is either silently swallowed or reported on a side channel.
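The destructor rule above can be sketched in standard C++ without any Cap’n Proto types. This is only an illustration under stated assumptions, not the library's code: Finalizer, suppressedErrors(), and runScenario() are hypothetical names, and the sketch uses C++17's std::uncaught_exceptions() (the counting successor of std::uncaught_exception()) so that it builds with current compilers.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical side channel for errors that could not be thrown.
inline std::vector<std::string>& suppressedErrors() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical resource whose cleanup step can fail.
class Finalizer {
public:
  explicit Finalizer(bool cleanupFails)
      : cleanupFails_(cleanupFails),
        exceptionsAtEntry_(std::uncaught_exceptions()) {}

  // Destructors are noexcept by default, so throwing must be opted into.
  ~Finalizer() noexcept(false) {
    // A scoped object never sees fewer in-flight exceptions than at entry.
    assert(std::uncaught_exceptions() >= exceptionsAtEntry_);
    if (!cleanupFails_) return;
    if (std::uncaught_exceptions() > exceptionsAtEntry_) {
      // Another exception is unwinding the stack; throwing now would
      // terminate the process, so report on the side channel instead.
      suppressedErrors().push_back("cleanup failed during unwinding");
    } else {
      throw std::runtime_error("cleanup failed");
    }
  }

private:
  bool cleanupFails_;
  int exceptionsAtEntry_;
};

// Returns the message of whichever exception the caller ends up seeing.
std::string runScenario(bool bodyThrows) {
  try {
    Finalizer finalizer(true);
    if (bodyThrows) throw std::logic_error("main failure");
  } catch (const std::exception& e) {
    return e.what();
  }
  return "no exception";
}
```

With this sketch, runScenario(false) reports the cleanup failure itself, while runScenario(true) reports only the main failure and records the secondary one on the side channel.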

In recognition of the fact that some teams prefer not to use exceptions, and that even enabling exceptions in the compiler introduces overhead, Cap’n Proto allows you to disable them entirely by registering your own exception callback. The callback will be called in place of throwing an exception. The callback may abort the process, and is required to do so in certain circumstances (when a fatal bug is detected). If the callback returns normally, Cap’n Proto will attempt to continue by inventing “safe” values. This will lead to garbage output, but at least the program will not crash. Your exception callback should set some sort of a flag indicating that an error occurred, and somewhere up the stack you should check for that flag and cancel the operation. See the header kj/exception.h for details on how to register an exception callback.
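The "set a flag, then check it up the stack" pattern described above might look roughly like the following. Everything here is a hypothetical stand-in written in plain C++: ErrorCallback, ErrorFlag, readByte(), and sumBytes() are invented for illustration, and the real registration mechanism is the one declared in kj/exception.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback type, called in place of throwing an exception.
using ErrorCallback = std::function<void(const std::string&)>;

// Flag set by the callback and checked further up the stack.
struct ErrorFlag {
  bool failed = false;
  std::string firstMessage;
};

// Hypothetical low-level read. On bad input it reports through the callback
// and returns an invented "safe" value instead of throwing.
std::uint8_t readByte(const std::vector<std::uint8_t>& buffer,
                      std::size_t offset, const ErrorCallback& onError) {
  assert(onError && "an error callback must be provided");
  if (offset >= buffer.size()) {
    onError("read past end of buffer");
    return 0;  // safe placeholder; the overall result is now garbage
  }
  return buffer[offset];
}

// Higher-level operation: runs to completion, then checks the flag and
// discards the garbage result if anything went wrong along the way.
bool sumBytes(const std::vector<std::uint8_t>& buffer, std::size_t count,
              unsigned& total) {
  ErrorFlag flag;
  ErrorCallback onError = [&flag](const std::string& message) {
    if (!flag.failed) flag.firstMessage = message;
    flag.failed = true;
  };

  unsigned sum = 0;
  for (std::size_t i = 0; i < count; ++i) {
    sum += readByte(buffer, i, onError);
  }

  if (flag.failed) return false;  // cancel: do not publish the result
  total = sum;
  return true;
}
```

The point of the design is that the low-level code never has to unwind: it keeps going with placeholder values, and only the code that owns the operation decides to abandon it.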

KJ Library

Cap’n Proto is built on top of a basic utility library called KJ. The two were actually developed together – KJ is simply the stuff which is not specific to Cap’n Proto serialization, and may be useful to others independently of Cap’n Proto. For now, the two are distributed together. The name “KJ” has no particular meaning; it was chosen to be short and easy to type.

As of v0.3, KJ is distributed with Cap’n Proto but built as a separate library. You may need to link against both libraries explicitly: -lcapnp -lkj

Generating Code

To generate C++ code from your .capnp interface definition, run:

capnp compile -oc++ myproto.capnp

This will create myproto.capnp.h and myproto.capnp.c++ in the same directory as myproto.capnp.

To use this code in your app, you must link against both libcapnp and libkj. If you use pkg-config, Cap’n Proto provides the capnp module to simplify discovery of compiler and linker flags.

If you use RPC (i.e., your schema defines interfaces), then you will additionally need to link against libcapnp-rpc and libkj-async, or use the capnp-rpc pkg-config module.

Setting a Namespace

You probably want your generated types to live in a C++ namespace. You will need to import /capnp/c++.capnp and use the namespace annotation it defines:

using Cxx = import "/capnp/c++.capnp";
$Cxx.namespace("foo::bar::baz");

Note that capnp/c++.capnp is installed in $PREFIX/include (/usr/local/include by default) when you install the C++ runtime. The capnp tool automatically searches /usr/include and /usr/local/include for imports that start with a /, so it should “just work”. If you installed somewhere else, you may need to add it to the search path with the -I flag to capnp compile, which works much like the compiler flag of the same name.

Types

Primitive Types

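Primitive types map to the obvious C++ types:

Bool -> bool
IntNN -> intNN_t
UIntNN -> uintNN_t
Float32 -> float
Float64 -> double
Void -> ::capnp::Void (an empty struct; its only value is ::capnp::VOID)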

Structs

For each struct Foo in your interface, a C++ type named Foo is generated. This type itself is really just a namespace; it contains two important inner classes: Reader and Builder.

Reader represents a read-only instance of Foo while Builder represents a writable instance (usually, one that you are building). Both classes behave like pointers, in that you can pass them by value and they do not own the underlying data that they operate on. In other words, Foo::Builder is like a pointer to a Foo while Foo::Reader is like a const pointer to a Foo.
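To make the pointer analogy concrete, here is a hand-written model of the idea. It is not what the code generator emits: FooStorage, FooReader, FooBuilder, and fillIn() are hypothetical stand-ins, with FooStorage playing the role of the message memory that real readers and builders point into.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the message memory that a real Reader/Builder points into.
struct FooStorage {
  std::int32_t bar = 0;
};

// Hypothetical model of Foo::Reader: a non-owning, read-only view.
class FooReader {
public:
  explicit FooReader(const FooStorage* storage) : storage_(storage) {
    assert(storage != nullptr);
  }
  std::int32_t getBar() const { return storage_->bar; }

private:
  const FooStorage* storage_;
};

// Hypothetical model of Foo::Builder: a non-owning, writable view.
class FooBuilder {
public:
  explicit FooBuilder(FooStorage* storage) : storage_(storage) {
    assert(storage != nullptr);
  }
  std::int32_t getBar() const { return storage_->bar; }
  void setBar(std::int32_t value) { storage_->bar = value; }
  // A builder can always be viewed through a read-only reader.
  FooReader asReader() const { return FooReader(storage_); }

private:
  FooStorage* storage_;
};

// Takes the builder by value, yet the caller still observes the write,
// because only the "pointer" was copied, not the data.
void fillIn(FooBuilder foo) { foo.setBar(42); }
```

Passing FooBuilder by value is cheap and intentional here, in the same way that passing a raw pointer by value is; neither class frees anything when it goes out of scope.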

For every field bar defined in Foo, Foo::Reader has a method getBar(). For primitive types, get just returns the type, but for structs, lists, and blobs, it returns a Reader for the type.

// Example Reader methods:

// myPrimitiveField @0 :Int32;
int32_t getMyPrimitiveField();

// myTextField @1 :Text;
::capnp::Text::Reader getMyTextField();
// (Note that Text::Reader may be implicitly cast to const char* and
// std::string.)

// myStructField @2 :MyStruct;
MyStruct::Reader getMyStructField();

// myListField @3 :List(Float64);
::capnp::List<double>::Reader getMyListField();

Foo::Builder, meanwhile, has several methods for each field bar:

// Example Builder methods:

// myPrimitiveField @0 :Int32;
int32_t getMyPrimitiveField();
void setMyPrimitiveField(int32_t value);

// myTextField @1 :Text;
::capnp::Text::Builder getMyTextField();
void setMyTextField(::capnp::Text::Reader value);
::capnp::Text::Builder initMyTextField(size_t size);
// (Note that Text::Reader is implicitly constructable from const char*
// and std::string, and Text::Builder can be implicitly cast to
// these types.)

// myStructField @2 :MyStruct;
MyStruct::Builder getMyStructField();
void setMyStructField(MyStruct::Reader value);
MyStruct::Builder initMyStructField();

// myListField @3 :List(Float64);
::capnp::List<double>::Builder getMyListField();
void setMyListField(::capnp::List<double>::Reader value);
::capnp::List<double>::Builder initMyListField(size_t size);

Groups

Groups look a lot like a combination of a nested type and a field of that type, except that you cannot set, adopt, or disown a group – you can only get and init it.

Unions

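A named union (as opposed to an unnamed one) works just like a group, except with some additions:

For each member foo, the union reader and builder have a method isFoo() that returns true if foo is the active member of the union.
The union reader and builder also have a method which() that returns an enum value indicating which member is active.
Calling the set or init accessor for a member on a union builder makes that member the active member.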

Unnamed unions differ from named unions only in that the accessor methods from the union’s members are added directly to the containing type’s reader and builder, rather than generating a nested type.

The example at the top of this page shows a union in use.

Lists

Lists are represented by the type capnp::List<T>, where T is any of the primitive types, any Cap’n Proto user-defined type, capnp::Text, capnp::Data, or capnp::List<U> (to form a list of lists).

The type List<T> itself is not instantiatable, but has two inner classes: Reader and Builder. As with structs, these types behave like pointers to read-only and read-write data, respectively.

Both Reader and Builder implement size(), operator[], begin(), and end(), as good C++ containers should. Note, though, that operator[] is read-only – you cannot use it to assign the element, because that would require returning a reference, which is impossible because the underlying data may not be in your CPU’s native format (e.g., wrong byte order). Instead, to assign an element of a list, you must use builder.set(index, value).
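The reason operator[] cannot hand out a reference is easier to see in a small model. The sketch below assumes a list of UInt32 stored as little-endian bytes; UInt32ListBuilder and rawBytes() are hypothetical names, and the real capnp::List<T>::Builder is implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a list of UInt32 kept in little-endian wire order.
class UInt32ListBuilder {
public:
  explicit UInt32ListBuilder(std::size_t count) : bytes_(count * 4, 0) {}

  std::size_t size() const { return bytes_.size() / 4; }

  // Read-only indexing: the element is decoded and returned by value.
  // There is no native uint32_t in memory that a reference could bind to.
  std::uint32_t operator[](std::size_t index) const {
    assert(index < size());
    const std::uint8_t* p = &bytes_[index * 4];
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
  }

  // Assignment goes through set(), which encodes the value byte by byte.
  void set(std::size_t index, std::uint32_t value) {
    assert(index < size());
    std::uint8_t* p = &bytes_[index * 4];
    p[0] = static_cast<std::uint8_t>(value);
    p[1] = static_cast<std::uint8_t>(value >> 8);
    p[2] = static_cast<std::uint8_t>(value >> 16);
    p[3] = static_cast<std::uint8_t>(value >> 24);
  }

  // Exposes the encoded bytes so the storage order can be inspected.
  const std::vector<std::uint8_t>& rawBytes() const { return bytes_; }

private:
  std::vector<std::uint8_t> bytes_;
};
```

Because every access encodes or decodes explicitly, the same code produces identical bytes on little-endian and big-endian hosts.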

For List<Foo> where Foo is a non-primitive type, the type returned by operator[] and iterator::operator*() is Foo::Reader (for List<Foo>::Reader) or Foo::Builder (for List<Foo>::Builder). The builder’s set method takes a Foo::Reader as its second parameter.

For lists of lists or lists of blobs, the builder also has a method init(index, size) which sets the element at the given index to a newly-allocated value with the given size and returns a builder for it. Struct lists do not have an init method because all elements are initialized to empty values when the list is created.

Enums

Cap’n Proto enums become C++11 “enum classes”. That means they behave like any other enum, but the enum’s values are scoped within the type. E.g. for an enum Foo with value bar, you must refer to the value as Foo::BAR.

To match prevailing C++ style, an enum’s value names are converted to UPPERCASE_WITH_UNDERSCORES (whereas in the schema language you’d write them in camelCase).

Keep in mind when writing switch blocks that an enum read off the wire may have a numeric value that is not listed in its definition. This may be the case if the sender is using a newer version of the protocol, or if the message is corrupt or malicious. In C++11, enums are allowed to have any value that is within the range of their base type, which for Cap’n Proto enums is uint16_t.
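One defensive way to write such a switch is shown below. PhoneType is a hand-written stand-in for a generated enum (with the uint16_t base type mentioned above), and describePhoneType() is a hypothetical helper; the fallback after the switch is what handles values the schema did not list.

```cpp
#include <cstdint>
#include <string>

// Stand-in for a generated enum; the base type is uint16_t.
enum class PhoneType : std::uint16_t { MOBILE = 0, HOME = 1, WORK = 2 };

// Maps a value read off the wire to a label. Values that are not listed
// in the enum definition fall through to the final return.
std::string describePhoneType(PhoneType type) {
  switch (type) {
    case PhoneType::MOBILE: return "mobile";
    case PhoneType::HOME:   return "home";
    case PhoneType::WORK:   return "work";
  }
  // Reached for unlisted values, e.g. from a newer sender or a corrupt
  // message. Report the raw number rather than guessing.
  return "unknown(" +
         std::to_string(static_cast<std::uint16_t>(type)) + ")";
}
```

Leaving out a default label inside the switch lets the compiler keep warning when a newly added enumerant is not handled, while the code after the switch still covers unlisted values at runtime.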

Blobs (Text and Data)

Blobs are manipulated using the classes capnp::Text and capnp::Data. These classes are, again, just containers for inner classes Reader and Builder. These classes are iterable and implement size() and operator[] methods. Builder::operator[] even returns a reference (unlike with List<T>). Text::Reader additionally has a method cStr() which returns a NUL-terminated const char*.

As a special convenience, if you are using GCC 4.8+ or Clang, Text::Reader (and its underlying type, kj::StringPtr) can be implicitly converted to and from std::string format. This is accomplished without actually #includeing <string>, since some clients do not want to rely on this rather-bulky header. In fact, any class which defines a .c_str() method will be implicitly convertible in this way. Unfortunately, this trick doesn’t work on GCC 4.7.
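The general technique behind this kind of conversion can be sketched with a constructor template that only participates in overload resolution when the argument has a c_str() method. This is an assumption-laden illustration, not KJ's actual implementation: TextView, FixedName, and lengthOf() are hypothetical, and only the "from" direction is shown.

```cpp
#include <cstring>
#include <utility>

// Hypothetical string view. Note that <string> is never included.
class TextView {
public:
  TextView(const char* text) : ptr_(text) {}

  // Accepts any type with a c_str() member. If T has no such member, the
  // default template argument is ill-formed and this overload is dropped.
  template <typename T,
            typename = decltype(std::declval<const T&>().c_str())>
  TextView(const T& text) : ptr_(text.c_str()) {}

  const char* cStr() const { return ptr_; }
  std::size_t size() const { return std::strlen(ptr_); }

private:
  const char* ptr_;  // not owned; the source must outlive the view
};

// A string-like type that is unrelated to std::string.
struct FixedName {
  const char* c_str() const { return "Alice"; }
};

// Callers may pass a literal, a FixedName, or (if they include <string>
// themselves) a std::string.
std::size_t lengthOf(TextView text) { return text.size(); }
```

The view does not own its characters, so it must not outlive the object whose c_str() it captured.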

Interfaces

Interfaces (RPC) have their own page.

Generics

Generic types become templates in C++. The outer type (the one whose name matches the schema declaration’s name) is templatized; the inner Reader and Builder types are not, because they inherit the parameters from the outer type. Similarly, template parameters should refer to outer types, not Reader or Builder types.

For example, given:

struct Map(Key, Value) {
  entries @0 :List(Entry);
  struct Entry {
    key @0 :Key;
    value @1 :Value;
  }
}

struct People {
  byName @0 :Map(Text, Person);
  # Maps names to Person instances.
}

You might write code like:

void processPeople(People::Reader people) {
  Map<Text, Person>::Reader reader = people.getByName();
  capnp::List<Map<Text, Person>::Entry>::Reader entries =
      reader.getEntries();
  for (auto entry: entries) {
    processPerson(entry);
  }
}

Note that all template parameters will be specified with a default value of AnyPointer. Therefore, the type Map<> is equivalent to Map<capnp::AnyPointer, capnp::AnyPointer>.
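The effect of those defaults can be checked with a small stand-alone model. The types below are empty placeholders declared only for this sketch (they are not the real capnp::AnyPointer, Text, or generated classes); the shape of the template is what matters.

```cpp
#include <type_traits>

// Empty placeholders standing in for the real types.
struct AnyPointer {};
struct Text {};
struct Person {};

// Model of a generated generic type: the outer type is the template, and
// the nested Reader and Builder pick up Key and Value from it.
template <typename Key = AnyPointer, typename Value = AnyPointer>
struct Map {
  struct Reader {};
  struct Builder {};
};

// Omitted parameters fall back to AnyPointer.
static_assert(std::is_same<Map<>, Map<AnyPointer, AnyPointer>>::value,
              "Map<> should name the fully defaulted instantiation");
static_assert(std::is_same<Map<Text>, Map<Text, AnyPointer>>::value,
              "a trailing parameter may be omitted on its own");

// Different parameters give unrelated Reader types.
static_assert(
    !std::is_same<Map<Text, Person>::Reader, Map<>::Reader>::value,
    "readers of different instantiations are distinct types");
```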

Constants

Constants are exposed with their names converted to UPPERCASE_WITH_UNDERSCORES naming style (whereas in the schema language you’d write them in camelCase). Primitive constants are just constexpr values. Pointer-type constants (e.g. structs, lists, and blobs) are represented using a proxy object that can be converted to the relevant Reader type, either implicitly or using the unary * or -> operators.

Messages and I/O

To create a new message, you must start by creating a capnp::MessageBuilder (capnp/message.h). This is an abstract type which you can implement yourself, but most users will want to use capnp::MallocMessageBuilder. Once your message is constructed, write it to a file descriptor with capnp::writeMessageToFd(fd, builder) (capnp/serialize.h) or capnp::writePackedMessageToFd(fd, builder) (capnp/serialize-packed.h).

To read a message, you must create a capnp::MessageReader, which is another abstract type. Implementations are specific to the data source. You can use capnp::StreamFdMessageReader (capnp/serialize.h) or capnp::PackedFdMessageReader (capnp/serialize-packed.h) to read from file descriptors; both take the file descriptor as a constructor argument.

Note that if your stream contains additional data after the message, PackedFdMessageReader may accidentally read some of that data, since it does buffered I/O. To make this work correctly, you will need to set up a multi-use buffered stream. Buffered I/O may also be a good idea with StreamFdMessageReader and also when writing, for performance reasons. See capnp/io.h for details.

There is an example of all this at the beginning of this page.

Using mmap

Cap’n Proto can be used together with mmap() (or Win32’s MapViewOfFile()) for extremely fast reads, especially when you only need to use a subset of the data in the file. Currently, Cap’n Proto is not well-suited for writing via mmap(), only reading, but this is only because we have not yet invented a mutable segment framing format – the underlying design should eventually work for both.

To take advantage of mmap() at read time, write your file in regular serialized (but NOT packed) format – that is, use writeMessageToFd(), not writePackedMessageToFd(). Now, mmap() in the entire file, and then pass the mapped memory to the constructor of capnp::FlatArrayMessageReader (defined in capnp/serialize.h). That’s it. You can use the reader just like a normal StreamFdMessageReader. The operating system will automatically page in data from disk as you read it.
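The mapping step itself might be wrapped as follows on a POSIX system. MappedFile and writeWholeFile() are hypothetical helpers written for this sketch, and the hand-off to Cap’n Proto is left as a comment so that the code builds without the library; in a real program you would view the mapping as an array of capnp::word and pass it to capnp::FlatArrayMessageReader.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// RAII wrapper around a read-only mapping of an entire file (POSIX only).
class MappedFile {
public:
  explicit MappedFile(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);

    struct stat info;
    if (::fstat(fd, &info) != 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed: " + path);
    }
    size_ = static_cast<std::size_t>(info.st_size);

    if (size_ > 0) {  // mmap() rejects zero-length mappings
      void* mapping =
          ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
      if (mapping == MAP_FAILED) {
        ::close(fd);
        throw std::runtime_error("mmap failed: " + path);
      }
      data_ = static_cast<const unsigned char*>(mapping);
    }
    ::close(fd);  // the mapping remains valid after the descriptor closes
  }

  ~MappedFile() {
    if (data_ != nullptr) {
      ::munmap(const_cast<unsigned char*>(data_), size_);
    }
  }

  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  const unsigned char* data() const { return data_; }
  std::size_t size() const { return size_; }

  // Cap'n Proto messages are made of 8-byte words. A reader would be
  // constructed over wordCount() words starting at data().
  std::size_t wordCount() const { return size_ / 8; }

private:
  const unsigned char* data_ = nullptr;
  std::size_t size_ = 0;
};

// Demo helper: creates a file with the given contents.
void writeWholeFile(const std::string& path, const std::string& bytes) {
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) throw std::runtime_error("create failed: " + path);
  ssize_t written = ::write(fd, bytes.data(), bytes.size());
  ::close(fd);
  if (written != static_cast<ssize_t>(bytes.size())) {
    throw std::runtime_error("short write: " + path);
  }
}
```

The MappedFile object must outlive any reader built on top of it, since the reader would point directly into the mapped pages.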

mmap() works best when reading from flash media, or when the file is already hot in cache. It works less well with slow rotating disks. Here, disk seeks make random access relatively expensive. Also, if I/O throughput is your bottleneck, then the fact that mmaped data cannot be packed or compressed may hurt you. However, it all depends on what fraction of the file you’re actually reading – if you only pull one field out of one deeply-nested struct in a huge tree, it may still be a win. The only way to know for sure is to do benchmarks! (But be careful to make sure your benchmark is actually interacting with disk and not cache.)

Dynamic Reflection

Sometimes you want to write generic code that operates on arbitrary types, iterating over the fields or looking them up by name. For example, you might want to write code that encodes arbitrary Cap’n Proto types in JSON format. This requires something like “reflection”, but C++ does not offer reflection. Also, you might even want to operate on types that aren’t compiled into the binary at all, but only discovered at runtime.

The C++ API supports inspecting schemas at runtime via the interface defined in capnp/schema.h, and dynamically reading and writing instances of arbitrary types via capnp/dynamic.h. Here’s the example from the beginning of this file rewritten in terms of the dynamic API:

#include "addressbook.capnp.h"
#include <capnp/message.h>
#include <capnp/serialize-packed.h>
#include <iostream>
#include <capnp/schema.h>
#include <capnp/dynamic.h>

using ::capnp::DynamicValue;
using ::capnp::DynamicStruct;
using ::capnp::DynamicEnum;
using ::capnp::DynamicList;
using ::capnp::List;
using ::capnp::Schema;
using ::capnp::StructSchema;
using ::capnp::EnumSchema;

using ::capnp::Void;
using ::capnp::Text;
using ::capnp::MallocMessageBuilder;
using ::capnp::PackedFdMessageReader;

void dynamicWriteAddressBook(int fd, StructSchema schema) {
  // Write a message using the dynamic API to set each
  // field by text name.  This isn't something you'd
  // normally want to do; it's just for illustration.

  MallocMessageBuilder message;

  // Types shown for explanation purposes; normally you'd
  // use auto.
  DynamicStruct::Builder addressBook =
      message.initRoot<DynamicStruct>(schema);

  DynamicList::Builder people =
      addressBook.init("people", 2).as<DynamicList>();

  DynamicStruct::Builder alice =
      people[0].as<DynamicStruct>();
  alice.set("id", 123);
  alice.set("name", "Alice");
  alice.set("email", "alice@example.com");
  auto alicePhones = alice.init("phones", 1).as<DynamicList>();
  auto phone0 = alicePhones[0].as<DynamicStruct>();
  phone0.set("number", "555-1212");
  phone0.set("type", "mobile");
  alice.get("employment").as<DynamicStruct>()
       .set("school", "MIT");

  auto bob = people[1].as<DynamicStruct>();
  bob.set("id", 456);
  bob.set("name", "Bob");
  bob.set("email", "bob@example.com");

  // Some magic:  We can convert a dynamic sub-value back to
  // the native type with as<T>()!
  List<Person::PhoneNumber>::Builder bobPhones =
      bob.init("phones", 2).as<List<Person::PhoneNumber>>();
  bobPhones[0].setNumber("555-4567");
  bobPhones[0].setType(Person::PhoneNumber::Type::HOME);
  bobPhones[1].setNumber("555-7654");
  bobPhones[1].setType(Person::PhoneNumber::Type::WORK);
  bob.get("employment").as<DynamicStruct>()
     .set("unemployed", ::capnp::VOID);

  writePackedMessageToFd(fd, message);
}

void dynamicPrintValue(DynamicValue::Reader value) {
  // Print an arbitrary message via the dynamic API by
  // iterating over the schema.  Look at the handling
  // of STRUCT in particular.

  switch (value.getType()) {
    case DynamicValue::VOID:
      std::cout << "";
      break;
    case DynamicValue::BOOL:
      std::cout << (value.as<bool>() ? "true" : "false");
      break;
    case DynamicValue::INT:
      std::cout << value.as<int64_t>();
      break;
    case DynamicValue::UINT:
      std::cout << value.as<uint64_t>();
      break;
    case DynamicValue::FLOAT:
      std::cout << value.as<double>();
      break;
    case DynamicValue::TEXT:
      std::cout << '\"' << value.as<Text>().cStr() << '\"';
      break;
    case DynamicValue::LIST: {
      std::cout << "[";
      bool first = true;
      for (auto element: value.as<DynamicList>()) {
        if (first) {
          first = false;
        } else {
          std::cout << ", ";
        }
        dynamicPrintValue(element);
      }
      std::cout << "]";
      break;
    }
    case DynamicValue::ENUM: {
      auto enumValue = value.as<DynamicEnum>();
      KJ_IF_MAYBE(enumerant, enumValue.getEnumerant()) {
        std::cout <<
            enumerant->getProto().getName().cStr();
      } else {
        // Unknown enum value; output raw number.
        std::cout << enumValue.getRaw();
      }
      break;
    }
    case DynamicValue::STRUCT: {
      std::cout << "(";
      auto structValue = value.as<DynamicStruct>();
      bool first = true;
      for (auto field: structValue.getSchema().getFields()) {
        if (!structValue.has(field)) continue;
        if (first) {
          first = false;
        } else {
          std::cout << ", ";
        }
        std::cout << field.getProto().getName().cStr()
                  << " = ";
        dynamicPrintValue(structValue.get(field));
      }
      std::cout << ")";
      break;
    }
    default:
      // There are other types, we aren't handling them.
      std::cout << "?";
      break;
  }
}

void dynamicPrintMessage(int fd, StructSchema schema) {
  PackedFdMessageReader message(fd);
  dynamicPrintValue(message.getRoot<DynamicStruct>(schema));
  std::cout << std::endl;
}

Notes about the dynamic API:

Orphans

An “orphan” is a Cap’n Proto object that is disconnected from the message structure. That is, it is not the root of a message, and there is no other Cap’n Proto object holding a pointer to it. Thus, it has no parents. Orphans are an advanced feature that can help avoid copies and make it easier to use Cap’n Proto objects as part of your application’s internal state. Typical applications probably won’t use orphans.

The class capnp::Orphan<T> (defined in <capnp/orphan.h>) represents a pointer to an orphaned object of type T. T can be any struct type, List<T>, Text, or Data. E.g. capnp::Orphan<Person> would be an orphaned Person structure. Orphan<T> is a move-only class, similar to std::unique_ptr<T>. This prevents two different objects from adopting the same orphan, which would result in an invalid message.

An orphan can be “adopted” by another object to link it into the message structure. Conversely, an object can “disown” one of its pointers, causing the pointed-to object to become an orphan. Every pointer-typed field foo provides builder methods adoptFoo() and disownFoo() for these purposes. Again, these methods use C++11 move semantics. To use them, you will need to be familiar with std::move() (or the equivalent but shorter-named kj::mv()).
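The ownership hand-off can be modeled with std::unique_ptr, which the text above already uses as the comparison point. This sketch is only an analogy: PersonData, OrphanModel, PersonField, and transfer() are hypothetical, and a real capnp::Orphan<T> points into message memory rather than owning a heap allocation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Stand-in for the data of a Person struct.
struct PersonData {
  std::string name;
};

// Hypothetical model of an orphan: a move-only owner of detached data.
using OrphanModel = std::unique_ptr<PersonData>;

// Hypothetical model of a pointer-typed field on a builder.
class PersonField {
public:
  bool has() const { return value_ != nullptr; }

  const std::string& name() const {
    assert(has());
    return value_->name;
  }

  // Links the orphan into this field. The rvalue reference forces callers
  // to give up their handle, so two fields cannot adopt the same orphan.
  void adopt(OrphanModel&& orphan) { value_ = std::move(orphan); }

  // Unlinks the value and hands it back as an orphan; the field is left
  // empty afterwards.
  OrphanModel disown() { return std::move(value_); }

private:
  OrphanModel value_;
};

// Moves a value from one field to another without copying it.
void transfer(PersonField& from, PersonField& to) {
  to.adopt(from.disown());
}
```

Adopting from a named variable requires an explicit std::move() (or kj::mv() in KJ-based code), which makes the transfer of ownership visible at the call site.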

Even though an orphan is unlinked from the message tree, it still resides inside memory allocated for a particular message (i.e. a particular MessageBuilder). An orphan can only be adopted by objects that live in the same message. To move objects between messages, you must perform a copy. If the message is serialized while an Orphan<T> living within it still exists, the orphan’s content will be part of the serialized message, but the only way the receiver could find it is by investigating the raw message; the Cap’n Proto API provides no way to detect or read it.

To construct an orphan from scratch (without having some other object disown it), you need an Orphanage, which is essentially an orphan factory associated with some message. You can get one by calling the MessageBuilder’s getOrphanage() method, or by calling the static method Orphanage::getForMessageContaining(builder) and passing it any struct or list builder.

Note that when an Orphan<T> goes out-of-scope without being adopted, the underlying memory that it occupied is overwritten with zeros. If you use packed serialization, these zeros will take very little bandwidth on the wire, but will still waste memory on the sending and receiving ends. Generally, you should avoid allocating message objects that won’t be used, or if you cannot avoid it, arrange to copy the entire message over to a new MessageBuilder before serializing, since only the reachable objects will be copied.

Reference

The runtime library contains lots of useful features not described on this page. For now, the best reference is the header files. See:

capnp/list.h
capnp/blob.h
capnp/message.h
capnp/serialize.h
capnp/serialize-packed.h
capnp/schema.h
capnp/schema-loader.h
capnp/dynamic.h

Tips and Best Practices

Here are some tips for using the C++ Cap’n Proto runtime most effectively:

Build Tips

Security Tips

Cap’n Proto has not yet undergone security review. It most likely has some vulnerabilities. You should not attempt to decode Cap’n Proto messages from sources you don’t trust at this time.

However, assuming the Cap’n Proto implementation hardens up eventually, then the following security tips will apply.

Lessons Learned from Protocol Buffers

The author of Cap’n Proto’s C++ implementation also wrote (in the past) version 2 of Google’s Protocol Buffers. As a result, Cap’n Proto’s implementation benefits from a number of lessons learned the hard way: