Type Erasure

This is understanding notes for this presentation: C++ Type Erasure Demystified - Fedor G Pikus - C++Now 2024. Ppt at here.

Core logic behind type erasure
Implementation methods for type erasure
Abstraction of type
std::function
std::variant

Core logic behind type erasure

Encapsulate type infomation in implementation, remove type information in interface
Different implementation for different types might be manually coded or generated by compiler(C++)
- Template instantiation
- Virtual inheritence
Dispatch of implementation happens at construction phase. It might happens at compile time or runtime:
- For template instantiated implementations, it happens at compile time
- For virtual inheritence, it happens at runtime Dispatch is all about redirection of function pointers
The user of the interface is not aware of the passed type information, hence the type is erased.

What’s the benifit of type erasure? Even though there is a type erased interface, like qsort, but when the qsort is actually used, we still need to pass correct implementation of the comparision function to qsort, we just defer the choice of implementation to later time. The qsort function acts as a variation point, it abstract the sorting logic functionality by using only the type erased compare function. When qsort is called, it binds the sorting logic to the actual type compare implementation. Without qsort that use the type-erased compare interface to implement common behavior of sorting, the type-erased compare interface will have no practical meaning:

The qsort algorithm stands for the common behavior, which utilize the type-erased interface
The implementation of this common behavior abstract out any specific type information
When the implementation of this common behavior is used, user decides which type it is actually operates on(the common behavior implementation still know nothing about type whatsoever)

Without this abstraction, we have to implement qsort for each individual type and the sorting algorithm has to be coded every time for each type. Now with type erasure, we only need to code the actual sorting algorithm once for all possible types. Of cource the compare function has to be implemented for each type, since each type has their own comparing logic.

Implementation methods for type erasure

Virtual inheritance: redirection hanppens when using base class pointer to call implementation in derived class(whose type is erased)
Static templated functions: C++ compiler will generate code implementation for each template instantiation. It’s the same as C, but code generated instead of manually written. Note that generated static functions are class members, which means that the amount of instantiations equals the number of generated class member functions.
Vtable: similar as 2, this time use a vtable to point to generated static class member functions.

Note: Method 3 is how std::function is implemented.

#include <cstdlib>

// Following code are in *.cpp file, without Some exposed
// But user will call this code through type erased interface: void
// print_data(const void *p). This is the core of type erasure: encapsulate type
// infomation in implementation, remove type information in interface.
struct Some {
  int data;
};
bool less(const void *l, const void *r) {
  return static_cast<const Some *>(l)->data <
         static_cast<const Some *>(r)->data;
}

// The declaration in *.h, exposed to external user
// ! Here is where the type erasure happens
bool less(const void *, const void *);

// User of type erasure: qsort does not need to know which type it will sort,
// type is erased from qsort: type erasure is an abstraction for multiple
// implementations that provide the same behavior, the relevent behavior is what
// matters, not the type. In this example, `compare` parameter requires that two
// elements can be compared, qsort does not care about how it is compared, as
// long as it return a bool value. This gives chances to implement the compare
// logic in source code, instead of in interface code.

// Since type erasure is about abstraction of behavior, it alwarys involve
// redirection of function pointers, no matter which way it is implemented. This
// redirection of function pointer is called dispatch, which is another core
// brick in implementing type erasure. Dispatch might happen both at compile
// time or runtime.

// Compared with C, C++ ONLY add implementation methods for type erasure. In C,
// we have to manually write different implementations, like the `less` and
// `more` function in below. In C++, we have the compiler to generate different
// implementation code for us. They all involves template and are done at
// construction phase. The three ways are:
// 1. virtual inheritance: redirection hanppens when using base class pointer to
// call implementation in derived class(whose type is erased)
// 2. static templated functions: C++ compiler will generate
// code implementation for each template instantiation. It's the same as C, but
// code generated instead of manually written. Note that generated static
// functions are class members, which means that the amount of instantiations
// equals the number of generated class member functions.
// 3. vtable: Similar as 2, this time use a vtable to point to instead of
// generating static class member functions.

// One more important fact is that all the C++ ways are of value semantics. The
// implementation is stored as value(function pointers can be seen as value of
// function variables).

// Now we can summarize type erasure as follow:

// Core logic of type erasure:
// 1. Encapsulate type infomation in implementation, remove type information in
// interface.
// 2. The interface provide same behavior, regardless of specific type

// Implementation steps of type erasure involve two distinct phase: how
// implementation code is generated at compile time and how those implementation
// is dispatched to at runtime:

// 1. Statically write type erased code implementation, either manually, or by
// compiler(C++). The type must be inside cpp, not in header(interface)
// 2. Dynamically dispatch function call to the right implementation at runtime.
// How the dispatch is done varies. It can simply be hard coded(like in C qsort
// example in following code). Or it can be done using virtual inheritance in
// C++. Either way, the dispatch, or redirection is determined during
// construction phase. As soon as the construction is complete, the dispatch
// manner is determined.

void qsort(void *base, size_t nmeb, size_t size,
           bool (*compare)(const void *, const void *));

// Following give another type that also use qsort
struct Tome {
  int data;
};
bool more(const void *l, const void *r) {
  return static_cast<const Tome *>(l)->data <
         static_cast<const Tome *>(r)->data;
}
bool more(const void *, const void *);

int main() {
  Some a[10];
  Tome b[10];
  // qsort is universal, thanks to the redirection of `compare` parameter
  qsort(a, 10, 4, less);
  qsort(b, 10, 4, more);
}

Abstraction of type

Type erasure, C++ template, C++ concept, virtual inheritence, what is the common characteristic among them? They both allow us to write code logic for a group of types, instead of just one type. The code logic is the common behavior for those types. The binding of specific types are deferred to later times: for type erasure and virtual inheritence, this binding is defered to runtime; for C++ template and C++ concept, this binding is deferred to compile time:

We can write binary libraries which can be used on difference types using type erasure and virtual inheritence
We can write templated source code libraries which can be used on difference types using C++ template and C++ concept

C++ concept is more restricted C++ template. C++ virtual inheritence is special kind of type erasure, with vtable as runtime dispatch method. All being said, at the core, they are all function pointer binding methods, at compile time, or at runtime.

std::function

After the signature is specified through template paramter, std::function variable can be used to store difference kinds of types, as long as they both have the same signature. This is done through type erasure. std::function has value semantics and can be copied and moved. After the template signature is determined the code for all methods is fixed for the compiler. After a std::function instance is constructed, the implementation binding is fixed.

Note that std::function variables, like virtual base class pointers can be re-assigned to other std::function instance with the same signature at runtime, just like virtual base class pointers changed to point to other derived class instances. This is because, internally, std::function erased type for specific implementation type after an instance is constructed. Take the qsort for example, if i have a class instance that stores qsort and less, another instance can store qsort and more, since less and more have the same signature and are type erased. The external API of std::function works for any instances(which have different function pointer bindings, which happens at compile time, and have difference implementation code).

std::variant

std::variant itself is not type erasure, it’s a union and will record flags to indicate which type it currently stores
std::visit involves two phase:
- Dispatch right function according to specific type stored in std::variant at runtime
- Each function is stored type erased, like std::function