Friday, 10 June 2011

4K Galaxy Demo

As another little experiment, I started working on a 4K demo, using the brillaint framework released by iQ on his website: http://iquilezles.org/www/material/isystem1k4k/isystem1k4k.htm

I'm not too sure about what I want to do with this but the main idea was to try and simulate a spiral galaxy made of 500K+ particles and see how nice I can manage to make it look. Plus maybe some attempt at a chiptune style music track to go along.

Regarding the galaxy simulation, my original idea was to create a velocity field and simply advect the particles to make them go round on elliptic orbits and simulate what is calles a density wave, which is apparently what creates the spiral arms. A picture's worth a 1000 words:
(from http://drgoulu.com/)

The corresponding implementation was pretty numerically instable and led to stars either flying off or collapsing into 1 single point after a few seconds of simulation. And I should probably post some of the weird/beautiful results that randomly appeared as my code was buggy here and there ... I would be incapable of creating such visuals from scratch even if I wanted to :)

The instability was coming from the advection step, so I changed my mind about the whole technique and ended up calculating the star's orbit directly in the shader, which is numerically stable.

Here's what I've got so far (sorry for the lowrez video):


And a side benefit, trying to write code that is that minimal is a great learning experience that I definatly recommend.

EDIT: Fixed video link.
-m

Sunday, 5 June 2011

C++ Stat Library

As it's been a little while since the last update, I thought I could share a piece of code I've written recently, namely my stat gathering library.

The motivation for developing this library was to have a set of simple tools I can use to instrument my code and analyse its performance. Also, such a system can also be used for collecting gameplay statistics so I kept that in mind even if this is not the primary goal I was aiming for. And when I say gameplay statistics, I am thinking at something designers could use to tune the game, as opposed to statistics in the sense of "leaderboards" for which this system might be overkill and somewhat ill-suited.

What follows is a description of how the system works. If you prefer, you can jump straight to the source code, available at the bottom of this post. It's all contained in a single .h file for convenience, uses the STL although it would be trivial to replace it with other containers, and works on Win32 so far. The only platform specific bit is the StatGetTime() function you would need to reimplement with whatever real time clock API your platform provides.



Still here? Good :) So, the system can record 3 types of stats:
  • Accumulators: this is the simplest type, code can add/subtract any arbitrary value to it whenever it wants and the systems simply keeps track of the global count. It's not the most useful one for performance analysis but it can be pretty handy for gameplay stats where you way want to bump a stat every time a specific event happens in your game. The only thing you can retrieve from it is the Global Count.
  • Counters: Same as accumulators, except that it gets reset once per frame. This is going to be helpful for tracking how many times something happens per frame, like how many objects are rendered, how many animation bones got updated or how much data was sent over the network. Also, each frame, the total is fed into a history buffer that will allow us to get the recent average (over the last n frames), which can be more interesting than the global average as thing might largely evolve over time. In a combat game for instance, performance is really different when nothing is happening than when 10 enemies are pathing toward you... and you probably want the average CPU usage to mirror that rather than getting the average usage since the game started. On counters you can retrieve:
      • This frame's count
      • Recent average count
      • Global average count
      • Global minimum count
      • Global maximum count
  • Timers: They will record the amount of time elapsed from the moment they get started to when they get stopped. This can happen several time per frame and we keep track of how many time they go through a start/stop process. Every frame, both the count and cumulated time are reset. Timers are perfect for monitoring code execution as they will be able to tell how many time an instrumented function was called per frame, and how much time was spent in it. They also have their history buffers so you get recent both and global averages. On timers you can retrieve:
      • This frame's time
      • Recent average time
      • Global average time
      • Global minimum time
      • Global maximum time
      • This frame's count
      • Recent average count
      • Global average count
      • Global minimum count
      • Global maximum count

Here is a UML diagram of the different classes of this system, with a quick summary of their role:

(Click on the diagram to view a full-size version)

  • StatBase: Base class for Accumulators, Counters and Timers. This allows to batch manage them in a StatManager
  • StatHistory: template class that will store the last N samples and compute the min/max/global and recent average
  • StatAccumulator: The accumulator described above
  • StatCounter: The counter described above. Has a StatHistory for the buffering its count
  • StatTimer: The timer described above. Has 2 StatHistory objects, one for the count, one for the elapsed time
  • StatTimerScope: Wrapper around a StatTimer that will start its associated timer at construction and stop it at destruction. Useful for timing functions.
  • StatManager: Keeps track of a bunch of stats and updates them every frame. Also responsible of creating/destroying the stats. Note that a StatManager is not a singleton so you can have multiple ones. Typically you could have one for performance analysis and one for gameplay stats.
  • StatProcessor: Functor that can be applied on a collection of stats to process them. Like rendering them on screen for instance.
  • StatTextDump: A StatProcessor that will list all the stats and dump them on the standard output.

For convenience, you can use the following macros to instrument your code:
  • STAT_COUNT( manager, stat_name, value )
  • STAT_ACCUMULATE( manager, stat_name, value )
  • STAT_START( manager, stat_name )
  • STAT_STOP( manager, stat_name )
  • STAT_SCOPED_TIMER( manager, stat_name )

A typical usage would be something like:


void main()
{
 ...
 
 StatManager stats.Init();

 {
  STAT_SCOPED_TIMER( stats, "Init Data" );
  ...
 }

 for( ... )
 {
  STAT_ACCUMULATE( stats, "Loops count", 1 );
  RadixSort::Sort( ... );
  stats.Update(); 
 }
 
 StatTextDump statDump;
 statDump.PrintHeader();
 statDump.ProcessStats( stats.GetStats() );
 
 stats.Term();
}

And this would produce the following output:


                     |    Cpt  | RctAvgCpt  |  m Cpt  |  M Cpt  | GlblAvgCpt |    RctAvgT    |      m T      |      M T      |    GlblAvgT   |
---------------------+---------+------------+---------+---------+------------+---------------+---------------+---------------+---------------|
CalcHistograms       |      -  |     4.000  |      4  |      4  |     4.000  |    40.215 ms  |    39.129 ms  |    62.043 ms  |    42.222 ms  |
CalcOffsets          |      -  |     4.000  |      4  |      4  |     4.000  |     0.030 ms  |     0.030 ms  |     0.074 ms  |     0.031 ms  |
Init Data            |      -  |     0.000  |      0  |      1  |     0.020  |     0.000 ms  |     0.000 ms  |    65.029 ms  |     1.301 ms  |
Loops count          |     50  |         -  |      -  |      -  |         -  |         - ms  |         - ms  |         - ms  |         - ms  |
Sort                 |      -  |     1.000  |      1  |      1  |     1.000  |   118.502 ms  |   107.572 ms  |   281.035 ms  |   138.777 ms  |
SplitInput           |      -  |     1.000  |      1  |      1  |     1.000  |     0.002 ms  |     0.002 ms  |     0.003 ms  |     0.002 ms  |

It is important to note that the current implementation is not thread-safe and would probably break in various ways. Imagine that you are monitoring a threadproc like this one:

static DWORD WINAPI SortChunkThreadProc( LPVOID _pParams )
{
 STAT_SCOPED_TIMER( stats, "SortThreadProc" );
  
 SortChunk( ... );

 return TRUE;
}

If more than 1 thread at a time is running this function, the static StatTimer object created by the macro is going to be accessed from different threads without any particular precaution and will at best report garbage and most likely crash.


This is something I intend to fix the next iteration of this little library and is quite an interesting problem per se. In the meantime, and hoping that you'll find all that helpful, have fun !!

Attachments:
-m

Tuesday, 15 March 2011

Some progress on Nonogramm

I resumed working on Nonogramm a couple of days ago and it's been going really well. First task was to migrate to XNA4 as you cannot ship XNA3 projects on XBLIG anymore.

Next, I resumed implementing the different UI pages needed for the game - most of the project is UI work anyway. The following video shows the current status where you can access a list of puzzles (randomly generated so far) and play a puzzle.


By the way, I started posting updates on Twitter as well, just search for #Nonogramm !

This is all for now!

Sunday, 27 February 2011

A way to avoid virtual function calls in C++

A couple of days ago, a colleague sent this link to a paper called "Executable Bloat - How it happens and how we can fight it" which was an interesting read. The point about the cost of virtual function calls and the fact that you should try to avoid them reminded me of a bit of C++ code I wrote a few years ago that tried to achieve exactly that.

The idea is to transform virtual updates into non virtual updates using templates. You go from 1 virtual update per object to either no virtual call at all or 1 virtual call per class, depending on how much performance you want to trade for flexibility.

In lots of game engines, calling virtual functions once per frame on objects is a common pattern. Such objects are generally handled in a manager that will process the each frame. I have seen this pattern in lots of codebases, no matter what style of objects hierarchies was used (either OOP-like with deep and complex class trees or more Data-Oriented entity/components systems).

A typical, although a bit naive, example of such a pattern could a renderer that would draw 3D objects. In it, you would find a base interface for everything that can be rendered (IRenderable), some concrete classes that implement this interface (Mesh and ParticleSystem) and a manager, the renderer itself, that would store the objects that needs to be drawn. Here is some pseudo-code:

//////////////////////////////////
// Rendering system
//////////////////////////////////

class IRenderable
{
public:
    virtual void Render() = 0;
};

class Mesh : public IRenderable
{
public:
    virtual void Render()
    { /* Do rendery things */ }
};

class ParticleSystem : public IRenderable
{
public:
    virtual void Render()
    { /* Do rendery things */ }
};


class RenderManager
{
private:
    std::vector< IRenderable * >    m_Objects;

public:
    void RenderAll()
    {
        std::vector< IRenderable * > iter = m_Objects.begin();
        std::vector< IRenderable * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();  // 1 virtual call per object
        }
    }
};


//////////////////////////////////
// Main
//////////////////////////////////

RenderManager gRenderManager;

void main()
{
    while(1)
    {
        gRenderManager.RenderAll();
    }
}

With such a design, you get 1 virtual call per object. Depending on the the platform, the content of the list and some other factors, these virtual calls can become quite a significant performance hit because of cache misses.

In order to get rid of the virtual calls, we will declare 1 render manager per type of Renderable, using templates. This transforms the dispatching of the call to Render() from virtual to static. WIN!

Here's the new code:

//////////////////////////////////
// Rendering system
//////////////////////////////////

class Mesh                // No base interface anymore
{
    void Render()        // Not virtual anymore
    { /* Do rendery things */ }
};

class ParticleSystem    // No base interface anymore
{
    void Render()        // Not virtual anymore
    { /* Do rendery things */ }
};

template< class _T >
class RenderManager
{
private:
    std::vector< _T * >       m_Objects;

public:
    void RenderAll()
    {
        std::vector< _T * > iter = m_Objects.begin();
        std::vector< _T * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();   // this call is NOT virtual
        }
    }
};

//////////////////////////////////
// Main
//////////////////////////////////

RenderManager< Mesh > gRenderManager_Mesh;
RenderManager< ParticleSystem > gRenderManager_ParticleSystem;

void main()
{
    while(1)
    {
        gRenderManager_Mesh.RenderAll();
        gRenderManager_ParticleSystem.RenderAll();
    }
}

The change from virtual to static dispatch comes with a small price though. You now have to call RenderAll() on all the different managers, which can get tedious and bug prone. To improve that, we can reintroduce polymorphism, but this time we'll do it at the manager's level. So in effect we transformed 1 virtual call per instance into 1 virtual call per class, which is likely to be a few orders of magnitude smaller.

/////////////////////////////////
// Rendering system
//////////////////////////////////

class Mesh
{
    void Render()
    { /* Do rendery things */ }
};

class ParticleSystem
{
    void Render()
    { /* Do rendery things */ }
};


class IRenderManager    // Base class for all managers
{
public:
    virtual void RenderAll() = 0;
};


template< class _T >
class RenderManager : public IRenderManager        // all template mamangers inherit from the interface
{
private:
    std::vector< _T * >       m_Objects;

public:
    void RenderAll()
    {
        std::vector< _T * > iter = m_Objects.begin();
        std::vector< _T * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();   // this call is NOT virtual
        }
    }
};


//////////////////////////////////
// Main
//////////////////////////////////

RenderManager< Mesh > gRenderManager_Mesh;
RenderManager< ParticleSystem > gRenderManager_ParticleSystem;

std::vector< IRenderManager * > gRenderManagers;        // will store all the managers

void main()
{
    // at init time, add the managers into an array so we can iterate over them
    // ordering is explicit, which is a nice thing (think opaque/transparent objects for instance)
    gRenderManagers.push_back(&gRenderManager_Mesh);
    gRenderManagers.push_back(&gRenderManager_ParticleSystem);

    while(1)
    {
        std::vector< IRenderManager * > iter = gRenderManagers.begin();
        std::vector< IRenderManager * > iend = gRenderManagers.end();

        for( ; iter != iend; ++iter )
        {
            *iter->RenderAll();
        }
    }
}

In a real world codebase, a set of macros would surely be used for declaring and registering a new manager for each type, making the system easier to maintain and extend.

I think the approach described here works quite well for entity/components systems wit Data-Oriented Design (DOD) in mind. In such systems, your managers would also deal with allocating/deallocating the components in a memory friendly manner.

I hope that you will find all this useful in your projects!
See you next time, or maybe at GDC next week!

-m

#AltDevBlogADay

I just wanted to point out to whoever reads this blog that he should probably be checking http://altdevblogaday.org/. Lots of quality content there!!

Saturday, 15 January 2011

2011 !

Not much going on these days programming-wise as I'm spendig lots of time playing Gypsy Jazz guitar with some of my friends.
Just thought that I'd pass my wishes though :)


-m

Tuesday, 23 November 2010

Progress on the Main Game UI

A really brief update on the UI that will be used in the gameplay screen. Nothing very special to report, I am working my way through XNA and I must admit that while some stuff is extra cool, some bits are getting a bit annoying like trying to use the SpriteBatch class with custom effect. This is more complex than it should be and although you can find some way around the issues and it is rather documented on te web, you just get the feeling that it could be easier than that. It is woth mentionning that I am still using XNA 3.1 and maybe these issues are fixed in 4.0. I'll upgrate to the latest version a some point in time but so far I am glad to be able tu keep using VS 2008 as I am really used to it (XNA 4.0 doesn't install on pre VS2010 to my knowledge).
Also, I am getting a bit worried by the performances, mainly because of the gaussian blur shader that I use in a few places. I could repalce that by a Poisson disk blur filter which I may do after the game is implemented. The way I use the SpriteBatch also seems really inefficient so hopefully there will be plenty of room for optimization without struggling that much.

Here is a video of how it looks so far:


-m

Friday, 19 November 2010

More work on the UI

Some visual improvements, plus another, more complex UI page.
This is all coming along very well and I am uite pleased and excited :)


-m

Monday, 15 November 2010

First version of the Game UI

I have been working on the UI system for my XNA game since most of the added value will be a neat presentation layer. I came up with a fairly simple UI system that builds up on my state machine class I talked about last time. The whole thing is rather straightworward to use. Here is a quick video of it in action:



One of the interesting thing about this UI framework is the ability for any widget to specify its neighours, so when the focus moves, the UI page can figure out who should recieve the focus. Here is a snippet from the UIWidget class which is base for all widgets:

   public class UIWidget : StateMachine
   {
       ...
       public UIWidget LeftNeighbour
      {
           get;
           set;
       }
      public UIWidget RightNeighbour
       {
           get;
           set;
       }
       public UIWidget UpNeighbour
       {
           get;
           set;
       }
       public UIWidget DownNeighbour
       {
           get;
           set;
       }
       ...
   }

Widgets are laid out on UIPage objects. A UIPage keeps track of the focussed widget and processes pad inputs. When recieving a pad message, it picks the right neibour from the focussed widget and transfers focus to it. Note that multiple controls can have the same neighbour for a particular direction, so you can create rather complex layouts with a bit of work.

That's all for today!

Friday, 12 November 2010

C# State Machine class

For my little XNA game that I am developping at the moment, the main thing I needed was a simple Finite State Machine class that I could use as the basis for prtty much any game object, especially for all the GUI and GameState logic.

I didn't want to spend too much time on a framework or grab one of the many libraries available here and there and then spend some time learning how to use it. Instead, I had the feeling that using the reflection capabilities of C#, I could easily and quickly come up with a simple piece of code that would fulfill my needs for this project. Functionnality-wise, my inspiration was Steve Rabin's article about a set of C++ macros for writing simple state machines. I used them quite a lot in the past and liked the fact the simple feature set, perfectly suited for small scale projects.

Basically, a State Machine can be in a certain number of differnt states and transition from one to another. Each state has:
  • an Enter() function callled when this state becomes active. This is the perfect place to do per-state setups
  • an Update() function that should be called once per frame
  • an Exit() function, called when leaving a state for another one. This is the perfect place to do some cleanup before leaving the state
State and handlers are defined via a simple naming convention that applies everywhere:
   StateName__Handler(...)


As a quick example to make that obvius, let's have a look at the Default state (which does nothing at all):
   public virtual void Default__Enter(string _OldState){}
   public virtual void Default__Exit(string _NewState){}
   public virtual void Default__Update(GameTime _GameTime){}



On top of that, each state can define message handlers for any message it wants to support. Message are defined by a simple string. For instance, the handler for a message called TakeDamage in the Default state and parametrized with an amount of damage points would imply the following bits:

   public virtual void Default__TakeDamage(int _Damage)
   {
      Health -= _Damage;
   }

And game code would send the message like this:
   Target.ProcessMessage( "TakeDamage", 10 );

As you can see this is all really simple. But here comes the goodness :)
  • You can have handlers that handle the same message differently per state:
   public virtual void Default__TakeDamage(int _Damage)
   {
      Health -= _Damage;

   }

   public virtual void Invincible__TakeDamage(int _Damage)
   {
   }
  • You can derive a state machine class from an existing class and override handlers so they do different things:
   class Soldier : StateMachine
   {
      public virtual void Default__TakeDamage(int _Damage)
      {
         Health -= _Damage;

      }
   }

   class SoldierFragile : Soldier
   {
      public virtual void Default__TakeDamage(int _Damage)
      {
         Health -= _Damage * 5;

      }
   }

The implementation might be rather simple and compact but it'still quite powerfull as you can see.

However, nothing is perfect in this world and I am sure some design choices are arguable ... but they suit my needs so I'm fine with it. I think the major concerns would be:
  • Weak state referencing as it is all done by strings and naming convention. So if you write typos or rename stuff around, this is likely to break. More error handling could be added but ultimately, this is just a string and this is rather fragile.
  • State specific variables will live at instance scope, not state scope. This is because I want to be able to acces the state machine members from anystate (this reduces the nb of accessors and stuff you need to write)
If you are interested in giving it a try or looking how it is done, you can find the source code with a stupid test case here. Note that I personally use it with XNA but this is rather generic and could be used without XNA. In fact, the provided implementation doesn't use XNA although it is straightforward to adapt it. In XNA, the State machine class should derive fron GameComponent.

Feedback welcome !

-m