Friday 10 June 2011

4K Galaxy Demo

As another little experiment, I started working on a 4K demo, using the brillaint framework released by iQ on his website: http://iquilezles.org/www/material/isystem1k4k/isystem1k4k.htm

I'm not too sure about what I want to do with this but the main idea was to try and simulate a spiral galaxy made of 500K+ particles and see how nice I can manage to make it look. Plus maybe some attempt at a chiptune style music track to go along.

Regarding the galaxy simulation, my original idea was to create a velocity field and simply advect the particles to make them go round on elliptic orbits and simulate what is calles a density wave, which is apparently what creates the spiral arms. A picture's worth a 1000 words:
(from http://drgoulu.com/)

The corresponding implementation was pretty numerically instable and led to stars either flying off or collapsing into 1 single point after a few seconds of simulation. And I should probably post some of the weird/beautiful results that randomly appeared as my code was buggy here and there ... I would be incapable of creating such visuals from scratch even if I wanted to :)

The instability was coming from the advection step, so I changed my mind about the whole technique and ended up calculating the star's orbit directly in the shader, which is numerically stable.

Here's what I've got so far (sorry for the lowrez video):


And a side benefit, trying to write code that is that minimal is a great learning experience that I definatly recommend.

EDIT: Fixed video link.
-m

Sunday 5 June 2011

C++ Stat Library

As it's been a little while since the last update, I thought I could share a piece of code I've written recently, namely my stat gathering library.

The motivation for developing this library was to have a set of simple tools I can use to instrument my code and analyse its performance. Also, such a system can also be used for collecting gameplay statistics so I kept that in mind even if this is not the primary goal I was aiming for. And when I say gameplay statistics, I am thinking at something designers could use to tune the game, as opposed to statistics in the sense of "leaderboards" for which this system might be overkill and somewhat ill-suited.

What follows is a description of how the system works. If you prefer, you can jump straight to the source code, available at the bottom of this post. It's all contained in a single .h file for convenience, uses the STL although it would be trivial to replace it with other containers, and works on Win32 so far. The only platform specific bit is the StatGetTime() function you would need to reimplement with whatever real time clock API your platform provides.



Still here? Good :) So, the system can record 3 types of stats:
  • Accumulators: this is the simplest type, code can add/subtract any arbitrary value to it whenever it wants and the systems simply keeps track of the global count. It's not the most useful one for performance analysis but it can be pretty handy for gameplay stats where you way want to bump a stat every time a specific event happens in your game. The only thing you can retrieve from it is the Global Count.
  • Counters: Same as accumulators, except that it gets reset once per frame. This is going to be helpful for tracking how many times something happens per frame, like how many objects are rendered, how many animation bones got updated or how much data was sent over the network. Also, each frame, the total is fed into a history buffer that will allow us to get the recent average (over the last n frames), which can be more interesting than the global average as thing might largely evolve over time. In a combat game for instance, performance is really different when nothing is happening than when 10 enemies are pathing toward you... and you probably want the average CPU usage to mirror that rather than getting the average usage since the game started. On counters you can retrieve:
      • This frame's count
      • Recent average count
      • Global average count
      • Global minimum count
      • Global maximum count
  • Timers: They will record the amount of time elapsed from the moment they get started to when they get stopped. This can happen several time per frame and we keep track of how many time they go through a start/stop process. Every frame, both the count and cumulated time are reset. Timers are perfect for monitoring code execution as they will be able to tell how many time an instrumented function was called per frame, and how much time was spent in it. They also have their history buffers so you get recent both and global averages. On timers you can retrieve:
      • This frame's time
      • Recent average time
      • Global average time
      • Global minimum time
      • Global maximum time
      • This frame's count
      • Recent average count
      • Global average count
      • Global minimum count
      • Global maximum count

Here is a UML diagram of the different classes of this system, with a quick summary of their role:

(Click on the diagram to view a full-size version)

  • StatBase: Base class for Accumulators, Counters and Timers. This allows to batch manage them in a StatManager
  • StatHistory: template class that will store the last N samples and compute the min/max/global and recent average
  • StatAccumulator: The accumulator described above
  • StatCounter: The counter described above. Has a StatHistory for the buffering its count
  • StatTimer: The timer described above. Has 2 StatHistory objects, one for the count, one for the elapsed time
  • StatTimerScope: Wrapper around a StatTimer that will start its associated timer at construction and stop it at destruction. Useful for timing functions.
  • StatManager: Keeps track of a bunch of stats and updates them every frame. Also responsible of creating/destroying the stats. Note that a StatManager is not a singleton so you can have multiple ones. Typically you could have one for performance analysis and one for gameplay stats.
  • StatProcessor: Functor that can be applied on a collection of stats to process them. Like rendering them on screen for instance.
  • StatTextDump: A StatProcessor that will list all the stats and dump them on the standard output.

For convenience, you can use the following macros to instrument your code:
  • STAT_COUNT( manager, stat_name, value )
  • STAT_ACCUMULATE( manager, stat_name, value )
  • STAT_START( manager, stat_name )
  • STAT_STOP( manager, stat_name )
  • STAT_SCOPED_TIMER( manager, stat_name )

A typical usage would be something like:


void main()
{
 ...
 
 StatManager stats.Init();

 {
  STAT_SCOPED_TIMER( stats, "Init Data" );
  ...
 }

 for( ... )
 {
  STAT_ACCUMULATE( stats, "Loops count", 1 );
  RadixSort::Sort( ... );
  stats.Update(); 
 }
 
 StatTextDump statDump;
 statDump.PrintHeader();
 statDump.ProcessStats( stats.GetStats() );
 
 stats.Term();
}

And this would produce the following output:


                     |    Cpt  | RctAvgCpt  |  m Cpt  |  M Cpt  | GlblAvgCpt |    RctAvgT    |      m T      |      M T      |    GlblAvgT   |
---------------------+---------+------------+---------+---------+------------+---------------+---------------+---------------+---------------|
CalcHistograms       |      -  |     4.000  |      4  |      4  |     4.000  |    40.215 ms  |    39.129 ms  |    62.043 ms  |    42.222 ms  |
CalcOffsets          |      -  |     4.000  |      4  |      4  |     4.000  |     0.030 ms  |     0.030 ms  |     0.074 ms  |     0.031 ms  |
Init Data            |      -  |     0.000  |      0  |      1  |     0.020  |     0.000 ms  |     0.000 ms  |    65.029 ms  |     1.301 ms  |
Loops count          |     50  |         -  |      -  |      -  |         -  |         - ms  |         - ms  |         - ms  |         - ms  |
Sort                 |      -  |     1.000  |      1  |      1  |     1.000  |   118.502 ms  |   107.572 ms  |   281.035 ms  |   138.777 ms  |
SplitInput           |      -  |     1.000  |      1  |      1  |     1.000  |     0.002 ms  |     0.002 ms  |     0.003 ms  |     0.002 ms  |

It is important to note that the current implementation is not thread-safe and would probably break in various ways. Imagine that you are monitoring a threadproc like this one:

static DWORD WINAPI SortChunkThreadProc( LPVOID _pParams )
{
 STAT_SCOPED_TIMER( stats, "SortThreadProc" );
  
 SortChunk( ... );

 return TRUE;
}

If more than 1 thread at a time is running this function, the static StatTimer object created by the macro is going to be accessed from different threads without any particular precaution and will at best report garbage and most likely crash.


This is something I intend to fix the next iteration of this little library and is quite an interesting problem per se. In the meantime, and hoping that you'll find all that helpful, have fun !!

Attachments:
-m

Tuesday 15 March 2011

Some progress on Nonogramm

I resumed working on Nonogramm a couple of days ago and it's been going really well. First task was to migrate to XNA4 as you cannot ship XNA3 projects on XBLIG anymore.

Next, I resumed implementing the different UI pages needed for the game - most of the project is UI work anyway. The following video shows the current status where you can access a list of puzzles (randomly generated so far) and play a puzzle.


By the way, I started posting updates on Twitter as well, just search for #Nonogramm !

This is all for now!

Sunday 27 February 2011

A way to avoid virtual function calls in C++

A couple of days ago, a colleague sent this link to a paper called "Executable Bloat - How it happens and how we can fight it" which was an interesting read. The point about the cost of virtual function calls and the fact that you should try to avoid them reminded me of a bit of C++ code I wrote a few years ago that tried to achieve exactly that.

The idea is to transform virtual updates into non virtual updates using templates. You go from 1 virtual update per object to either no virtual call at all or 1 virtual call per class, depending on how much performance you want to trade for flexibility.

In lots of game engines, calling virtual functions once per frame on objects is a common pattern. Such objects are generally handled in a manager that will process the each frame. I have seen this pattern in lots of codebases, no matter what style of objects hierarchies was used (either OOP-like with deep and complex class trees or more Data-Oriented entity/components systems).

A typical, although a bit naive, example of such a pattern could a renderer that would draw 3D objects. In it, you would find a base interface for everything that can be rendered (IRenderable), some concrete classes that implement this interface (Mesh and ParticleSystem) and a manager, the renderer itself, that would store the objects that needs to be drawn. Here is some pseudo-code:

//////////////////////////////////
// Rendering system
//////////////////////////////////

class IRenderable
{
public:
    virtual void Render() = 0;
};

class Mesh : public IRenderable
{
public:
    virtual void Render()
    { /* Do rendery things */ }
};

class ParticleSystem : public IRenderable
{
public:
    virtual void Render()
    { /* Do rendery things */ }
};


class RenderManager
{
private:
    std::vector< IRenderable * >    m_Objects;

public:
    void RenderAll()
    {
        std::vector< IRenderable * > iter = m_Objects.begin();
        std::vector< IRenderable * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();  // 1 virtual call per object
        }
    }
};


//////////////////////////////////
// Main
//////////////////////////////////

RenderManager gRenderManager;

void main()
{
    while(1)
    {
        gRenderManager.RenderAll();
    }
}

With such a design, you get 1 virtual call per object. Depending on the the platform, the content of the list and some other factors, these virtual calls can become quite a significant performance hit because of cache misses.

In order to get rid of the virtual calls, we will declare 1 render manager per type of Renderable, using templates. This transforms the dispatching of the call to Render() from virtual to static. WIN!

Here's the new code:

//////////////////////////////////
// Rendering system
//////////////////////////////////

class Mesh                // No base interface anymore
{
    void Render()        // Not virtual anymore
    { /* Do rendery things */ }
};

class ParticleSystem    // No base interface anymore
{
    void Render()        // Not virtual anymore
    { /* Do rendery things */ }
};

template< class _T >
class RenderManager
{
private:
    std::vector< _T * >       m_Objects;

public:
    void RenderAll()
    {
        std::vector< _T * > iter = m_Objects.begin();
        std::vector< _T * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();   // this call is NOT virtual
        }
    }
};

//////////////////////////////////
// Main
//////////////////////////////////

RenderManager< Mesh > gRenderManager_Mesh;
RenderManager< ParticleSystem > gRenderManager_ParticleSystem;

void main()
{
    while(1)
    {
        gRenderManager_Mesh.RenderAll();
        gRenderManager_ParticleSystem.RenderAll();
    }
}

The change from virtual to static dispatch comes with a small price though. You now have to call RenderAll() on all the different managers, which can get tedious and bug prone. To improve that, we can reintroduce polymorphism, but this time we'll do it at the manager's level. So in effect we transformed 1 virtual call per instance into 1 virtual call per class, which is likely to be a few orders of magnitude smaller.

/////////////////////////////////
// Rendering system
//////////////////////////////////

class Mesh
{
    void Render()
    { /* Do rendery things */ }
};

class ParticleSystem
{
    void Render()
    { /* Do rendery things */ }
};


class IRenderManager    // Base class for all managers
{
public:
    virtual void RenderAll() = 0;
};


template< class _T >
class RenderManager : public IRenderManager        // all template mamangers inherit from the interface
{
private:
    std::vector< _T * >       m_Objects;

public:
    void RenderAll()
    {
        std::vector< _T * > iter = m_Objects.begin();
        std::vector< _T * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();   // this call is NOT virtual
        }
    }
};


//////////////////////////////////
// Main
//////////////////////////////////

RenderManager< Mesh > gRenderManager_Mesh;
RenderManager< ParticleSystem > gRenderManager_ParticleSystem;

std::vector< IRenderManager * > gRenderManagers;        // will store all the managers

void main()
{
    // at init time, add the managers into an array so we can iterate over them
    // ordering is explicit, which is a nice thing (think opaque/transparent objects for instance)
    gRenderManagers.push_back(&gRenderManager_Mesh);
    gRenderManagers.push_back(&gRenderManager_ParticleSystem);

    while(1)
    {
        std::vector< IRenderManager * > iter = gRenderManagers.begin();
        std::vector< IRenderManager * > iend = gRenderManagers.end();

        for( ; iter != iend; ++iter )
        {
            *iter->RenderAll();
        }
    }
}

In a real world codebase, a set of macros would surely be used for declaring and registering a new manager for each type, making the system easier to maintain and extend.

I think the approach described here works quite well for entity/components systems wit Data-Oriented Design (DOD) in mind. In such systems, your managers would also deal with allocating/deallocating the components in a memory friendly manner.

I hope that you will find all this useful in your projects!
See you next time, or maybe at GDC next week!

-m

#AltDevBlogADay

I just wanted to point out to whoever reads this blog that he should probably be checking http://altdevblogaday.org/. Lots of quality content there!!

Saturday 15 January 2011

2011 !

Not much going on these days programming-wise as I'm spendig lots of time playing Gypsy Jazz guitar with some of my friends.
Just thought that I'd pass my wishes though :)


-m