This page is intended to be a rapid overview of how OO features like inheritance and polymorphism can be implemented in C++.

1 Running example

We’ll use the following toy example throughout this writeup:

#include <stdio.h>

class base {
public:
    double length; /// in meters
    double ftLength() { return length / 0.3048; } /// in feet

    base(double len) { this->length = len; } // constructor

    virtual double size() { return length; }
};

class derived : public base {
public:
    double height; /// in meters
    double ftHeight() { return height / 0.3048; } /// in feet

    derived(double len, double hei) : base(len) { this->height = hei; } // constructor

    virtual double size() { return length * height; }
    virtual int dimensions() { return 2; }
};

int main() {
    base *b = new base(3);
    base *d = new derived(4, 5);
    printf("lengths:  %g and %g\n", b->length, d->length);
    printf("imperial: %g and %g\n", b->ftLength(), d->ftLength());
    printf("sizes:    %g and %g\n", b->size(), d->size());
}

This is arguably a bad example from an OO standpoint (we made the fields public, and size doesn’t even pass dimensional analysis) and a C++ standpoint (I wrote the constructors without initializer lists and use printf instead of cout), but it will serve us well from a how C++ works standpoint.

Just for reference, the code prints

lengths:  3 and 4
imperial: 9.84252 and 13.1234
sizes:    3 and 20

2 Storing attributes

Note that we have declared both b and d to be base *, even though d actually points to a derived. This is handled by the simple expedient of laying out each derived with the attributes of base in the front:

Memory layout of objects
Offset Base Derived b d
0–7 vtable vtable vtable vtable
8–15 length length 3.0 4.0
16–23 (nothing) width 5.0

Thus, when we assemble our code and d->length turns into something like n(%rdx) where %rdx contains value of d, we arrive at the length attribute as if d was a base *, with no extra work needed.

The special vtable entries above will are explained in more detail under Virtual functions.

3 Non-virtual functions

Non-virtual functions like ftLength are turned into a single global function, just like any other function but with a first parameter in the assembly of base *this inserted automatically by the compiler. Thus the assembly for ftLength looks like

movsd   8(%rdi), %xmm0
divsd   const3048(%rip), %xmm0
ret

In other words, %rdi (the first argument) is the base *this, length is at offset 8 and moved into one of the floating-point registers (%xmm0), the bytes of the constant 0.3048 are stored instruction-pointer-relative label const3048 and used to divide, which leaves the result in the correct register to return. Because length is at the same offset for both base and derived, this works for both base *this and derived *this.

4 Virtual functions

Non-virtual functions are stored in the global address space, and thus cannot be overridden in subclasses. C++ allows overriding by using virtual functions.

Conceptually, in C++ every class’s first member is a pointer to a struct called a vtable with only function-pointer attributes. There’s just one copy of each such struct in memory, stored in the read-only globals. The one for base looks like

base’s vtable
Offset Member name Points to
0–7 size base::size

and the one for derived looks like

derived’s vtable
Offset Member name Points to
0–7 size derived::size
8–15 dimensions derived::dimensions

Thus there are three functions in the assembly (base::size, derived::size, and derived::dimensions) and each object has one extra attribute:

Memory layout of objects
Offset b d
0–7 pointer to base’s vtable pointer to derived’s vtable
8–15 3.0 4.0
16–23 5.0

The compiler then translates d->size() into the equivalent of what in C we might write as d->vtable->size(d)

I’m not sure the above is the best explanation. I wrote a different one too, a portion of which I’ve included here in case you prefer it.

In C’ we’d write the virtual functions and vtables as something like

/// the three functions
double base_size(base *this) { return this->length; }
double derived_size(derived *this) { return this->length * this->width; }
int derived_dimensions(derived *this) { return 2; }

/// base
struct base;
struct derived;

/// the three functions
double base_size(struct base *this);
double derived_size(struct derived *this);
int derived_dimensions(struct derived *this);

/// base
struct  {
    double (*size)(struct base *);
} base_vtable = { base_size };

typedef struct base {
    void **vtable;
    double length;
} base;

base *base_constructor(double len) {
    base *ans = malloc(sizeof(base));
    ans->vtable = (void *)&base_vtable;
    ans->length = len;
    return ans;
}

/// derived
struct {
    double (*size)(struct derived *);
    int (*dimensions)(struct derived *);
} derived_vtable = { derived_size, derived_dimensions };

typedef struct derived {
    void **vtable;
    double length;
    double width;
} derived;

derived *derived_constructor(double len, double wid) {
    derived *ans = malloc(sizeof(derived));
    ans->vtable = (void *)&derived_vtable;
    ans->length = len;
    ans->width = wid;
    return ans;
}

double base_size(struct base *this) { return this->length; }
double derived_size(struct derived *this) { return this->length * this->width; }
int derived_dimensions(struct derived *this) { return 2; }


/// how we call virtual functions
double invoke_size(base *b) {
    //       cast as function   1st vtable entry   argument
    return ((double (*)(base *)) (b->vtable[0]))      (b);
}

The above basically works, but I found trying to write why (and handle the ugliness of function pointer syntax) painful, so I re-wrote it with the main text of this page.