Sunday 27 February 2011

A way to avoid virtual function calls in C++

A couple of days ago, a colleague sent this link to a paper called "Executable Bloat - How it happens and how we can fight it" which was an interesting read. The point about the cost of virtual function calls and the fact that you should try to avoid them reminded me of a bit of C++ code I wrote a few years ago that tried to achieve exactly that.

The idea is to transform virtual updates into non virtual updates using templates. You go from 1 virtual update per object to either no virtual call at all or 1 virtual call per class, depending on how much performance you want to trade for flexibility.

In lots of game engines, calling virtual functions once per frame on objects is a common pattern. Such objects are generally handled in a manager that will process the each frame. I have seen this pattern in lots of codebases, no matter what style of objects hierarchies was used (either OOP-like with deep and complex class trees or more Data-Oriented entity/components systems).

A typical, although a bit naive, example of such a pattern could a renderer that would draw 3D objects. In it, you would find a base interface for everything that can be rendered (IRenderable), some concrete classes that implement this interface (Mesh and ParticleSystem) and a manager, the renderer itself, that would store the objects that needs to be drawn. Here is some pseudo-code:

//////////////////////////////////
// Rendering system
//////////////////////////////////

class IRenderable
{
public:
    virtual void Render() = 0;
};

class Mesh : public IRenderable
{
public:
    virtual void Render()
    { /* Do rendery things */ }
};

class ParticleSystem : public IRenderable
{
public:
    virtual void Render()
    { /* Do rendery things */ }
};


class RenderManager
{
private:
    std::vector< IRenderable * >    m_Objects;

public:
    void RenderAll()
    {
        std::vector< IRenderable * > iter = m_Objects.begin();
        std::vector< IRenderable * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();  // 1 virtual call per object
        }
    }
};


//////////////////////////////////
// Main
//////////////////////////////////

RenderManager gRenderManager;

void main()
{
    while(1)
    {
        gRenderManager.RenderAll();
    }
}

With such a design, you get 1 virtual call per object. Depending on the the platform, the content of the list and some other factors, these virtual calls can become quite a significant performance hit because of cache misses.

In order to get rid of the virtual calls, we will declare 1 render manager per type of Renderable, using templates. This transforms the dispatching of the call to Render() from virtual to static. WIN!

Here's the new code:

//////////////////////////////////
// Rendering system
//////////////////////////////////

class Mesh                // No base interface anymore
{
    void Render()        // Not virtual anymore
    { /* Do rendery things */ }
};

class ParticleSystem    // No base interface anymore
{
    void Render()        // Not virtual anymore
    { /* Do rendery things */ }
};

template< class _T >
class RenderManager
{
private:
    std::vector< _T * >       m_Objects;

public:
    void RenderAll()
    {
        std::vector< _T * > iter = m_Objects.begin();
        std::vector< _T * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();   // this call is NOT virtual
        }
    }
};

//////////////////////////////////
// Main
//////////////////////////////////

RenderManager< Mesh > gRenderManager_Mesh;
RenderManager< ParticleSystem > gRenderManager_ParticleSystem;

void main()
{
    while(1)
    {
        gRenderManager_Mesh.RenderAll();
        gRenderManager_ParticleSystem.RenderAll();
    }
}

The change from virtual to static dispatch comes with a small price though. You now have to call RenderAll() on all the different managers, which can get tedious and bug prone. To improve that, we can reintroduce polymorphism, but this time we'll do it at the manager's level. So in effect we transformed 1 virtual call per instance into 1 virtual call per class, which is likely to be a few orders of magnitude smaller.

/////////////////////////////////
// Rendering system
//////////////////////////////////

class Mesh
{
    void Render()
    { /* Do rendery things */ }
};

class ParticleSystem
{
    void Render()
    { /* Do rendery things */ }
};


class IRenderManager    // Base class for all managers
{
public:
    virtual void RenderAll() = 0;
};


template< class _T >
class RenderManager : public IRenderManager        // all template mamangers inherit from the interface
{
private:
    std::vector< _T * >       m_Objects;

public:
    void RenderAll()
    {
        std::vector< _T * > iter = m_Objects.begin();
        std::vector< _T * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();   // this call is NOT virtual
        }
    }
};


//////////////////////////////////
// Main
//////////////////////////////////

RenderManager< Mesh > gRenderManager_Mesh;
RenderManager< ParticleSystem > gRenderManager_ParticleSystem;

std::vector< IRenderManager * > gRenderManagers;        // will store all the managers

void main()
{
    // at init time, add the managers into an array so we can iterate over them
    // ordering is explicit, which is a nice thing (think opaque/transparent objects for instance)
    gRenderManagers.push_back(&gRenderManager_Mesh);
    gRenderManagers.push_back(&gRenderManager_ParticleSystem);

    while(1)
    {
        std::vector< IRenderManager * > iter = gRenderManagers.begin();
        std::vector< IRenderManager * > iend = gRenderManagers.end();

        for( ; iter != iend; ++iter )
        {
            *iter->RenderAll();
        }
    }
}

In a real world codebase, a set of macros would surely be used for declaring and registering a new manager for each type, making the system easier to maintain and extend.

I think the approach described here works quite well for entity/components systems wit Data-Oriented Design (DOD) in mind. In such systems, your managers would also deal with allocating/deallocating the components in a memory friendly manner.

I hope that you will find all this useful in your projects!
See you next time, or maybe at GDC next week!

-m

#AltDevBlogADay

I just wanted to point out to whoever reads this blog that he should probably be checking http://altdevblogaday.org/. Lots of quality content there!!