Generating Switch Statements in C++

If you prefer to skip the English, you can find a copy of the source for this post here

Recently I was refactoring some C++ and found myself needing to dispatch one of several calls via a template index. While this is simple when the index is static, it is not so simple when the index is discovered at runtime, because the only way to recover a static copy is to inspect each possible value. My first intuition was to build a function table: it's quick to implement and reflects a sort of runtime variant of a template over integers. However, I wanted to give the optimizer a chance to inline each specialized call, because the majority were a mere single expression sitting in a performance-sensitive section. Since an indirect call has (rather obviously) no hope of being inlined, I was left with no choice but to make each call and its selection explicit. This is possible with a recursive chain of if-else statements, which has the bonus of an appealing simplicity, but I didn't expect the optimizer to do much with that approach, so I settled on a switch instead. (I did check: though I was hopeful, gcc emitted a cascade of conditional jumps for the if-else chain, and lambdas fared significantly worse.)

The first thing to ask is what a “switch call” will look like.
If we were designing a simple interpreter, we might switch on the operation as in the following

int run_operation(const int operation, const bool printResult,
                  const int x, const int y) {
    int result;
    switch (operation) {
    case ADD: result = x + y; break;
    case SUB: result = x - y; break;
    case MUL: result = x * y; break;
    case DIV: result = x / y; break;
    case MOD: result = x % y; break;
    default: assert(false);
    }
    if (printResult) printf("%i\n", result);
    return result;
}

This is pretty suggestive of what a switch_ metafunction should look like. It will of course accept a template (my real motivation) which provides the mapping of case numbers to case functions; it should accept a value to switch on; and it should forward any other arguments onto the case bodies. Rewriting run_operation with such a function should look something like this

return switch_<operation_table>::run<int>(operation, printResult, x, y);

Here the template argument int is the return type, the first argument to run is the value to switch on, and the remaining arguments are to be passed into the case’s body.
What will the template operation_table look like? We’d decided it should be a map of case numbers to functions so similar to a real switch we’ll use a (static) sequence of pairs, each with an integral value to match on and a body to be executed. This is enough information for a first stab at the signature of switch_ and its static function run

template <typename Xs>
struct switch_ {
    template <typename R /* , zero or more parameter types */>
    static R run(int n /* , zero or more arguments */);
};
An example of Xs (the sequence of pairs) can be built from a case-body template such as

template <int N>
struct operation_table {
    static int value(const bool printResult, const int x, const int y);
};

which lets us phrase the interpreter's dispatch as

return switch_< list<
  pair<int_<0>, operation_table<ADD> >,
  pair<int_<1>, operation_table<SUB> >,
  pair<int_<2>, operation_table<MUL> >,
  pair<int_<3>, operation_table<DIV> >,
  pair<int_<4>, operation_table<MOD> >
> >::run<int>(operation, printResult, x, y);

Since you may at this point be asking what we’ve gained, let’s start by providing a convenience function offering some concision

template <int Lower, int Upper, template <int> class Table>
struct switch_table;

This at least simplifies things

switch_table<0, 4, operation_table>::run<int>(operation, printResult, x, y)

Having defined what we'd like a generated switch to look like, how do we actually go about implementing it?
A good starting place is the generation of cases. Notice however that we can't produce the cases for the switch using templates, so we're forced to fall back on the preprocessor; and while the preprocessor can send information to our templates, the templates obviously can't send information back to the preprocessor. This is a problem because, as everything is currently phrased, the preprocessor doesn't know the number of cases to emit; the template does. To clarify: in the case of switch_table this is conveyed by the first two template arguments, Lower and Upper; in the case of switch_, by the length of Xs.
To circumvent this problem we'll create a family of metafunctions parametrized by the number of cases they include, template <int NumberOfCases, typename Xs> switch_impl, allowing the preprocessor to pass into the template the number of cases it intends to emit. The user-exposed metafunction switch_ will then call switch_impl<length of Xs, Xs>.

#ifndef _metalevel_switch_h_
#define _metalevel_switch_h_

#include <boost/mpl/at.hpp>
#include <boost/mpl/if.hpp>
#include <boost/mpl/list.hpp>
#include <boost/mpl/pair.hpp>
#include <boost/mpl/push_front.hpp>
#include <boost/mpl/size.hpp>

#include <boost/preprocessor/iterate.hpp>
#include <boost/preprocessor/repetition/enum_params.hpp>
#include <boost/preprocessor/repetition/enum_binary_params.hpp>
#include <boost/preprocessor/repetition/repeat.hpp>

#ifdef __GNUC__
#  define always_inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define always_inline __forceinline
#else
#  define always_inline inline
#endif



namespace metalevel {

namespace aux {
    template <int NumberOfCases, typename Xs>
    struct switch_impl;
#   define BOOST_PP_ITERATION_LIMITS (0, METALEVEL_MAX_SWITCH_SIZE)
#   define BOOST_PP_FILENAME_1 <switch_impl.hpp>
#   include BOOST_PP_ITERATE()
} // namespace aux

template <typename Xs>
struct switch_
  : aux::switch_impl<boost::mpl::size<Xs>::type::value, Xs> {};
} // namespace metalevel

#endif // _metalevel_switch_h_


So all we've really accomplished is delegating most of the work to switch_impl.hpp using Boost.Preprocessor's file iteration (if you're unfamiliar, the preprocessor docs are good). The above iteratively includes switch_impl.hpp METALEVEL_MAX_SWITCH_SIZE times, each time defining the pass number as a sort of argument to the header.
Before proceeding let’s quickly reiterate what’s going on here: switch_impl now handles switches with every possible number of cases (up to METALEVEL_MAX_SWITCH_SIZE). Then switch_ merely selects the correct number based on the size of its input list. For example switch_impl<2,Xs> implements run with case 0 and case 1 while switch_impl<7,Xs> implements run with case 0, case 1, case 2, case 3, case 4, case 5, and case 6.
So what does switch_impl.hpp look like?



#define NUMBER_OF_CASES BOOST_PP_ITERATION()

template <typename Xs>
struct switch_impl<NUMBER_OF_CASES, Xs> {
#   define TYPEDEF_X_SUB_I(_, i, __) \
      typedef typename boost::mpl::at_c<Xs, i>::type x_##i;

    BOOST_PP_REPEAT(NUMBER_OF_CASES, TYPEDEF_X_SUB_I, ~)

#   define METALEVEL_CASE_TABLE_ENTRY(_, i, ts) \
      case boost::mpl::first<x_##i>::type::value: \
        return boost::mpl::second<x_##i>::type::value ts ;

#   define DEFINE_CHOICE_RUN(_, argc, __) \
      template <typename R BOOST_PP_COMMA_IF(argc) \
                BOOST_PP_ENUM_PARAMS(argc, typename T)> \
      static always_inline R \
      run(int n BOOST_PP_COMMA_IF(argc) \
          BOOST_PP_ENUM_BINARY_PARAMS(argc, T, t)) { \
          switch(n) { \
              BOOST_PP_REPEAT(NUMBER_OF_CASES, \
                              METALEVEL_CASE_TABLE_ENTRY, \
                              (BOOST_PP_ENUM_PARAMS(argc, t))) \
          } \
      }

    // The bound on run's arity (the macro name here is reconstructed).
    BOOST_PP_REPEAT(METALEVEL_MAX_ARGUMENTS, DEFINE_CHOICE_RUN, ~)

#   undef TYPEDEF_X_SUB_I
#   undef METALEVEL_CASE_TABLE_ENTRY
#   undef DEFINE_CHOICE_RUN
};

#undef NUMBER_OF_CASES




While a bit obscured by the preprocessor iteration, nothing too interesting is actually going on here. Writing out a simple example really illuminates how this works, so let's consider a copy of switch_impl when NumberOfCases is 3.

template <typename Xs>
struct switch_impl<3, Xs> {
    typedef typename boost::mpl::at_c<Xs, 0>::type x_0;
    typedef typename boost::mpl::at_c<Xs, 1>::type x_1;
    typedef typename boost::mpl::at_c<Xs, 2>::type x_2;

    template <typename R, typename T0, typename T1, typename T2>
    static always_inline R run(int n, T0 t0, T1 t1, T2 t2) {
        switch (n) {
        case boost::mpl::first<x_0>::type::value:
            return boost::mpl::second<x_0>::type::value(t0, t1, t2);
        case boost::mpl::first<x_1>::type::value:
            return boost::mpl::second<x_1>::type::value(t0, t1, t2);
        case boost::mpl::first<x_2>::type::value:
            return boost::mpl::second<x_2>::type::value(t0, t1, t2);
        }
    }
    // ... plus a copy of run for every other argument count
};
We first call upon the preprocessor to define several types named x_i, where x_i is the ith type in the list Xs; if this is unclear, they're really just terms of the form Xs[i]. Continuing, we use the preprocessor to endow run with variadicness: rather than overloading run by hand, we create a macro that defines a copy of it taking argc arguments, then repeatedly invoke the macro, varying argc. Each copy of run generates a switch with NUMBER_OF_CASES cases, each of which is in turn generated by METALEVEL_CASE_TABLE_ENTRY. Recalling that the list Xs is a sequence of pairs, this macro simply takes the first item of the ith pair to be the case number, and the second item (which is a function) to be the case body.

At this point we're basically ready to test things out; all that's left to define is the convenience metafunction switch_table. The idea here is to enumerate each item, building up a list of pairs to pass into switch_. This takes two steps because we need an extra parameter telling us when to stop recursing; here we call that parameter Counter

namespace metalevel {
namespace aux {
    template <int Counter, int N, template <int> class Table>
    struct tabulate
      : boost::mpl::push_front<
          tabulate<Counter-1, N-1, Table>,
          boost::mpl::pair<boost::mpl::int_<N>, Table<N> >
        >::type {};

    template <int N, template <int> class Table>
    struct tabulate<0, N, Table>
      : boost::mpl::list<
          boost::mpl::pair<boost::mpl::int_<N>, Table<N> >
        > {};
} // namespace aux

template <int Lower, int Upper, template <int> class Table>
struct switch_table
  : switch_<aux::tabulate<Upper - Lower, Upper, Table> > {};
} // namespace metalevel

Admittedly a bit verbose but overall nothing terribly strange.

Of course we should check that all this trouble actually gets us the assembly we'd hoped for. The test case we'll use is just an extension of the interpreter fragment we started with into a full-fledged program. For comparison we'll also include an array of function pointers.

#include <stdio.h>
#include <switch.hpp>

enum { ADD = 0, SUB = 1, MUL = 2, DIV = 3, MOD = 4 };

template <int Operation>
struct interpreter {
    static inline int run(const int x, const int y);
};

template <> struct interpreter<ADD> {
    static const char *name() { return "ADD"; }
    static inline int run(const int x, const int y)
    { return x + y; }
};

template <> struct interpreter<SUB> {
    static const char *name() { return "SUB"; }
    static inline int run(const int x, const int y)
    { return x - y; }
};

template <> struct interpreter<MUL> {
    static const char *name() { return "MUL"; }
    static inline int run(const int x, const int y)
    { return x * y; }
};

template <> struct interpreter<DIV> {
    static const char *name() { return "DIV"; }
    static inline int run(const int x, const int y)
    { return x / y; }
};

template <> struct interpreter<MOD> {
    static const char *name() { return "MOD"; }
    static inline int run(const int x, const int y)
    { return x % y; }
};

template <int Operation>
struct operation_table {
    static always_inline int value(const bool printResult,
                                   const int x, const int y) {
        const int r = interpreter<Operation>::run(x, y);
        if (printResult) printf("operation_table<%s>::run(%i, %i) = %i\n",
                                 interpreter<Operation>::name(), x, y, r);
        return r;
    }
};
int main() {
    int(*operations[])(const bool,const int,const int) = {
        operation_table<ADD>::value, operation_table<SUB>::value,
        operation_table<MUL>::value, operation_table<DIV>::value,
        operation_table<MOD>::value
    };

    bool printResult = true;
    int op,x,y;

    printf("op,x,y: ");
    scanf("%i,%i,%i", &op, &x, &y);

    operations[op](printResult,x,y);
    metalevel::switch_table<0,4, operation_table>::run<int>(op,printResult,x,y);

    return 0;
}
Since printResult is constant we expect the optimizer to propagate it through the inlined calls, and since op, x, and y are dynamic we can be sure switch_table won't get crushed into a single call to printf.

Let's first look at the result of a call to operations[op](printResult,x,y),

  movslq  52(%rsp), %rax  # load op
  movl    $1, %edi        # set flag to true
  callq   *(%rsp,%rax,8)  # jump to operations[op]

Proceeding with an example function body, consider the case when op = MUL

  .string "operation_table<%s>::run(%i, %i) = %i\n"
  .string "MUL"
  pushq   %rbx
  movl    %esi, %ebx   # load x
  imull   %edx, %ebx   # take x*y
  testb   %dil, %dil   # check if we should printResult
  je      .L7          # jump over print if printResult is zero
  movl    %edx, %ecx   # load y for printing
  movl    %ebx, %r8d   # load x*y for printing
  movl    %esi, %edx   # load x for printing
  movl    $.LC1, %edi  # load format string
  movl    $.LC3, %esi  # load operation name for printing
  xorl    %eax, %eax   # no vector registers used in vararg call
  call    printf
  movl    %ebx, %eax   # return x*y
  popq    %rbx

This matches our expectation that gcc has no real opportunity for contextual optimization.

Moving on to the switch implementation, what does a call to run give us?

  cmpl    $4, 52(%rsp)     # check that op is within case bounds
  ...                      # load x and y for both switch_ and operations[]
  ja      .L13             # jump to default case if op's outside bounds
  mov     52(%rsp), %eax   # load op for jump table
  jmp     *.L19(,%rax,8)   # jump to case body of op
  .quad   .L14             # address of inlined add
  .quad   .L15             # address of inlined sub
  .quad   .L16             # address of inlined mul
  .quad   .L17             # address of inlined div
  .quad   .L18             # address of inlined mod

This is fairly similar to the array implementation, so let's compare the inlined function body when op = MUL

  .string "operation_table<%s>::run(%i, %i) = %i\n"
  .string "MUL"
  movl    %ecx, %r8d       # load y
  movl    %esi, %edx       # load x for printing
  imull   %esi, %r8d       # load x*y for printing
  movl    $.LC3, %esi      # load "MUL" string
  jmp     .L20             # jump to shared body printing the result
  movl    $.LC1, %edi      # load format string
  xorl    %eax, %eax       # no vector registers used in vararg call
  call    printf
  ...                      # run operations[op](printResult,x,y)

Unsurprisingly gcc has done a good job of condensing things and has removed the check of printResult entirely.

Now, why would you ever prefer this over a handwritten switch, typical OO polymorphism, or anything else?
In answer to the first: I think this allows a more table-driven approach than your standard switch, which seems to be an increasingly popular way of solving problems. Additionally, if you hadn't already split your cases into modular components, this will force you to. Of course these sorts of questions are always a matter of taste.
Addressing the remaining alternatives together: not that often. However I do think the utility, while niche, is indispensable in generative programming. As a simple example, if you have a statically defined map that you'd like to use with a runtime value, a switch may offer good optimization opportunities. Other issues which we addressed above relate to inlining and static extensibility. Ultimately, of course, if our only goal is performance we should save ourselves embarrassment and just measure things.

A-99 Lists 5 (*) Reverse a list

Reversing a list in a language like Haskell is fairly simple with plenty of solutions. The recursive algorithm we’ll use accumulates the result in a second list as the first is being traversed.

reverse :: [a] -> [a]
reverse xs = reverse' xs []
  where reverse' (x : xs) rxs = reverse' xs (x : rxs)
        reverse' []       rxs = rxs

However, using our Vector type in Agda requires that we name the sizes of the two input lists as well as of the output, each of which will clearly be changing every iteration. When looking for the invariant notice that

  1. the length of the output should be a function of the two inputs and
  2. that we’re really required to allow the input lists to range over all the natural numbers.

You've probably spotted it already: the sum of the sizes of the two inputs will always be the size of the original list xs.

Unfortunately a simple transcription of the above Haskell function with the correct lengths still won’t type check for a rather infuriating reason surrounding how simplification of addition works.

reverse : ∀{n}{A : Set} → Vec A n → Vec A n
reverse xs = reverse' xs [] -- problem 1: n + 0 != n
  where reverse' : ∀{m n}{A : Set} → Vec A n → Vec A m → Vec A (n + m)
        reverse' (x ∷ xs) rxs = reverse' xs (x ∷ rxs) -- problem 2: n + suc m != suc (n + m)
        reverse' [] rxs = rxs

Obviously both problems should have solutions given that in fact the sum of n and 0 is n, and that addition is commutative. Unfortunately if you dig out the definition of addition (on my system /usr/share/Agda-2.2.10/lib-0.5/src/Data/Nat.agda) you’ll find the following.

_+_ : ℕ → ℕ → ℕ
zero  + n = n
suc m + n = suc (m + n)

So Agda can see that 0 + n = n and that suc m + n = suc (m + n), but n + 0 = n and m + suc n = suc (m + n) are unobvious.

Fortunately, understanding the problem allows us to solve it fairly easily. There is a function subst that allows us to substitute equal terms for one another in our types; thus if we can show each of the above equalities we can invoke subst to massage our function into Agda's liking.

First addressing problem 2 in a straightforward inductive style

n+m+1≡n+1+m : (n m : ℕ) → n + suc m ≡ suc n + m
n+m+1≡n+1+m (suc n) m = cong suc (n+m+1≡n+1+m n m)
n+m+1≡n+1+m 0 m = refl

The two cases are justified as follows.

  1. Show suc n + suc m is suc (suc n) + m.
    By induction (that's the recursive call) we know that n + suc m is suc n + m, and hence that (see the definition of cong) suc (n + suc m) is suc (suc n + m), from which Agda deduces our goal.
  2. Show that 0 + suc m is suc 0 + m.
    Agda knows that 0 + suc m is suc m and that suc 0 + m is suc m, so our goal is to show that suc m equals itself, which follows from reflexivity.

Having shown the required lemma we should take a closer look at subst to understand its application. Opening up Relation/Binary/PropositionalEquality/Core.agda we see that

subst : {A : Set} → (P : A → Set) → ∀{x y} → x ≡ y → P x → P y
subst P refl p = p

So like magic, if we can supply proofs that x is y and that P x holds, subst will give us that the same term has type P y. How did that happen? Uncovering the definition of propositional equality gives us the answer (see Relation/Binary/Core.agda).

data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

This strange definition actually states that the only inhabitant of the type _≡_ is the proof that a term equals itself. Thus when we hand subst a proof that x ≡ y along with a proof of P x, pattern matching on refl forces y to be x, and hence the same term p is also a proof of P y.

We’re finally at the point of solving reverse’.

reverse' : ∀{n m}{A : Set} → Vector A n → Vector A m → Vector A (n + m)
reverse' {suc n}{m}{A} (x ∷ xs) rxs = subst (λ k → Vector A k) (n+m+1≡n+1+m n m) (reverse' xs (x ∷ rxs))
reverse' [] rxs = rxs

Recalling that the type of reverse’ xs (x ∷ rxs) is Vector A (n + suc m), we would like to show that this is Vector A (suc n + m). Using subst we take P k = Vector A k along with the proof n+m+1≡n+1+m n m showing n + suc m is suc n + m. Reiterating with a table makes this less confusing

P      Vector A
x      n + suc m
y      suc n + m
x ≡ y  n+m+1≡n+1+m n m : n + suc m ≡ suc n + m
P x    Vector A (n + suc m)
P y    Vector A (suc n + m)
p      reverse' xs (x ∷ rxs)

Thus the p we get out is a proof of P y = Vector A (suc n + m) as required.

We’re finally left to show that n + 0 is n. Given that we’ll need the commutativity of addition for problem 6 anyway we may as well take this approach. To show that n + m ≡ m + n we proceed by induction on n (and in the base case n=0 by induction on m).

+-comm : ∀ n m → n + m ≡ m + n
+-comm (suc n) m = subst (λ e → suc (n + m) ≡ e) (sym (n+m+1≡n+1+m m n)) (cong suc (+-comm n m))
+-comm 0 (suc m) = cong suc (+-comm 0 m)
+-comm 0 0 = refl

That first case is a bit of a mouthful, so we'll break the substitution down with a table

P      _≡_ (suc (n + m))
x      suc m + n
y      m + suc n
x ≡ y  sym (n+m+1≡n+1+m m n) : suc m + n ≡ m + suc n
p      cong suc (+-comm n m) : suc n + m ≡ suc m + n

The final solution to reverse should now be simple.

module P5 where
open import Data.Nat using (ℕ; _+_; suc; zero)
open import Relation.Binary.PropositionalEquality using (_≡_; refl; sym; subst; cong)

data Vector (A : Set) : (n : ℕ) → Set where
  [] : Vector A 0
  _∷_ : ∀{n} → A → Vector A n → Vector A (suc n)
infixr 5 _∷_

n+m+1≡n+1+m : (n m : ℕ) → n + suc m ≡ suc n + m
n+m+1≡n+1+m (suc n) m = cong suc (n+m+1≡n+1+m n m)
n+m+1≡n+1+m 0 m = refl

+-comm : ∀ n m → n + m ≡ m + n
+-comm (suc n) m = subst (λ e → suc (n + m) ≡ e) (sym (n+m+1≡n+1+m m n)) (cong suc (+-comm n m))
+-comm 0 (suc m) = cong suc (+-comm 0 m)
+-comm 0 0 = refl

reverse : ∀{n}{A : Set} → Vector A n → Vector A n
reverse {n}{A} xs = subst (λ k → Vector A k) (+-comm n 0) (reverse' xs [])
  where reverse' : ∀{n m}{A : Set} → Vector A n → Vector A m → Vector A (n + m)
        reverse' {suc n}{m}{A} (x ∷ xs) rxs = subst (λ k → Vector A k) (n+m+1≡n+1+m n m) (reverse' xs (x ∷ rxs))
        reverse' [] rxs = rxs

A-99 Lists 4 (*) Find the number of elements of a list

Given that we track the number of elements in our list this problem is trivial.

module P4 where
open import Data.Nat using (ℕ; suc; zero)

data Vector (A : Set) : (n : ℕ) → Set where
  [] : Vector A 0
  _∷_ : ∀{n} → A → Vector A n → Vector A (suc n)
infixr 5 _∷_

length : ∀{n}{A : Set} → Vector A n → ℕ
length {n} _ = n

A-99 Lists 3 (*) Find the K’th element of a list

Given that inspecting an arbitrary index in a list can result in a runtime failure (what if we request index 12 of a length 2 list) we have two options

  1. return a datatype with an error code such as Maybe or -⊎-, or
  2. restrict the index by the size of the list.

We choose the second option because it’s not traditionally simple or even possible in many languages.

The type used to express this is Fin, the family of finite sets. As an example the type Fin 4 is the collection of numbers less than 4.

data Fin : ℕ → Set where
  zero : {n : ℕ} → Fin (suc n)
  suc  : {n : ℕ} → (i : Fin n) → Fin (suc n)

The first constructor says that for any natural number n, zero is less than n+1 (we add one because zero is not less than zero).

The second constructor asserts that if we have a value less than some n, its successor is less than n+1.

Equipped with the above type we can express the problem exactly as one would expect: the function element-at maps a list of size n and an index less than n to an element of the list.

module P3 where
open import Data.Nat using (ℕ; suc; zero)
open import Data.Fin using (Fin; suc; zero)

data Vector (A : Set) : (n : ℕ) → Set where
  [] : Vector A 0
  _∷_ : ∀{n} → A → Vector A n → Vector A (suc n)
infixr 5 _∷_

element-at : ∀{A}{n} → Vector A n → Fin n → A
element-at [] ()
element-at (x ∷ xs) zero = x
element-at (x ∷ xs) (suc n) = element-at xs n

Topoi 3.8: Products


  1. <pr_a, pr_b> = 1_(a×b)
  2. if <f, g> = <k, h> then f = k and g = h
  3. <f ∘ h, g ∘ h> = <f, g> ∘ h
  4. if category 𝒞 has a terminal object 1 and products, then for any 𝒞-object a, a ≅ a × 1 and indeed <1_a, !_a> is iso

The arrow 1_(a×b) by definition satisfies

  1. pr_a ∘ 1_(a×b) = pr_a and
  2. pr_b ∘ 1_(a×b) = pr_b

and thus, by the uniqueness of the product arrow, <pr_a, pr_b> = 1_(a×b).

Clearly pr_a ∘ <f, g> = pr_a ∘ <k, h>, hence f = k; moreover since pr_b ∘ <f, g> = pr_b ∘ <k, h>, g = h.

At this point we really only have two options for proving the isomorphism: we could (1) find an iso arrow or (2) show that a is a product of a × 1. Opting for the second, we must show the following diagram commutes for all c, f and g.

[diagram: a is a product of a × 1]

Notice however that we only have one choice of g, namely !_c, and regardless of <f, g> we have that !_a ∘ <f, g> = !_c by the definition of the terminal arrow. Clearly then taking <f, g> = f gives

  1. 1_a ∘ f = f and
  2. !_a ∘ f = !_c

as required. Given that all products are isomorphic we have that a is isomorphic to a × 1.

A-99 Lists 2 (*) Find the last but one element of a list

Very similar to the first lists problem, but here we need to protect against lists of length under 2. Also note the use of infixr, marking the list constructor _∷_ as a right-associative infix operator with precedence 5 (higher precedence binding more tightly).

module P2 where
open import Data.Nat using (ℕ; suc; zero)

data Vector (A : Set) : (n : ℕ) → Set where
  [] : Vector A 0
  _∷_ : ∀{n} → A → Vector A n → Vector A (suc n)
infixr 5 _∷_

last-but-one : ∀{A}{n} → Vector A (suc (suc n)) → A
last-but-one (x ∷ _ ∷ []) = x
last-but-one (_ ∷ x1 ∷ x2 ∷ xs) = last-but-one (x1 ∷ x2 ∷ xs)

A-99 Lists 1 (*) Find the last element of a list

I'd chosen to solve the list problems using a type with known length (often referred to as a vector). The implementation is straightforward enough; the one interesting addition is the type of the first argument to last, Vector A (suc n), which protects the call against empty lists.

module P1 where
open import Data.Nat using (ℕ; suc; zero)

data Vector (A : Set) : (n : ℕ) → Set where
  [] : Vector A 0
  _∷_ : ∀{n} → A → Vector A n → Vector A (suc n)

last : ∀{A}{n} → Vector A (suc n) → A
last (x ∷ []) = x
last (_ ∷ (x ∷ xs)) = last (x ∷ xs)