Showing posts with label engine. Show all posts
Showing posts with label engine. Show all posts

Friday, 10 June 2011

4K Galaxy Demo

As another little experiment, I started working on a 4K demo, using the brillaint framework released by iQ on his website: http://iquilezles.org/www/material/isystem1k4k/isystem1k4k.htm

I'm not too sure about what I want to do with this but the main idea was to try and simulate a spiral galaxy made of 500K+ particles and see how nice I can manage to make it look. Plus maybe some attempt at a chiptune style music track to go along.

Regarding the galaxy simulation, my original idea was to create a velocity field and simply advect the particles to make them go round on elliptic orbits and simulate what is calles a density wave, which is apparently what creates the spiral arms. A picture's worth a 1000 words:
(from http://drgoulu.com/)

The corresponding implementation was pretty numerically instable and led to stars either flying off or collapsing into 1 single point after a few seconds of simulation. And I should probably post some of the weird/beautiful results that randomly appeared as my code was buggy here and there ... I would be incapable of creating such visuals from scratch even if I wanted to :)

The instability was coming from the advection step, so I changed my mind about the whole technique and ended up calculating the star's orbit directly in the shader, which is numerically stable.

Here's what I've got so far (sorry for the lowrez video):


And a side benefit, trying to write code that is that minimal is a great learning experience that I definatly recommend.

EDIT: Fixed video link.
-m

Sunday, 5 June 2011

C++ Stat Library

As it's been a little while since the last update, I thought I could share a piece of code I've written recently, namely my stat gathering library.

The motivation for developing this library was to have a set of simple tools I can use to instrument my code and analyse its performance. Also, such a system can also be used for collecting gameplay statistics so I kept that in mind even if this is not the primary goal I was aiming for. And when I say gameplay statistics, I am thinking at something designers could use to tune the game, as opposed to statistics in the sense of "leaderboards" for which this system might be overkill and somewhat ill-suited.

What follows is a description of how the system works. If you prefer, you can jump straight to the source code, available at the bottom of this post. It's all contained in a single .h file for convenience, uses the STL although it would be trivial to replace it with other containers, and works on Win32 so far. The only platform specific bit is the StatGetTime() function you would need to reimplement with whatever real time clock API your platform provides.



Still here? Good :) So, the system can record 3 types of stats:
  • Accumulators: this is the simplest type, code can add/subtract any arbitrary value to it whenever it wants and the systems simply keeps track of the global count. It's not the most useful one for performance analysis but it can be pretty handy for gameplay stats where you way want to bump a stat every time a specific event happens in your game. The only thing you can retrieve from it is the Global Count.
  • Counters: Same as accumulators, except that it gets reset once per frame. This is going to be helpful for tracking how many times something happens per frame, like how many objects are rendered, how many animation bones got updated or how much data was sent over the network. Also, each frame, the total is fed into a history buffer that will allow us to get the recent average (over the last n frames), which can be more interesting than the global average as thing might largely evolve over time. In a combat game for instance, performance is really different when nothing is happening than when 10 enemies are pathing toward you... and you probably want the average CPU usage to mirror that rather than getting the average usage since the game started. On counters you can retrieve:
      • This frame's count
      • Recent average count
      • Global average count
      • Global minimum count
      • Global maximum count
  • Timers: They will record the amount of time elapsed from the moment they get started to when they get stopped. This can happen several time per frame and we keep track of how many time they go through a start/stop process. Every frame, both the count and cumulated time are reset. Timers are perfect for monitoring code execution as they will be able to tell how many time an instrumented function was called per frame, and how much time was spent in it. They also have their history buffers so you get recent both and global averages. On timers you can retrieve:
      • This frame's time
      • Recent average time
      • Global average time
      • Global minimum time
      • Global maximum time
      • This frame's count
      • Recent average count
      • Global average count
      • Global minimum count
      • Global maximum count

Here is a UML diagram of the different classes of this system, with a quick summary of their role:

(Click on the diagram to view a full-size version)

  • StatBase: Base class for Accumulators, Counters and Timers. This allows to batch manage them in a StatManager
  • StatHistory: template class that will store the last N samples and compute the min/max/global and recent average
  • StatAccumulator: The accumulator described above
  • StatCounter: The counter described above. Has a StatHistory for the buffering its count
  • StatTimer: The timer described above. Has 2 StatHistory objects, one for the count, one for the elapsed time
  • StatTimerScope: Wrapper around a StatTimer that will start its associated timer at construction and stop it at destruction. Useful for timing functions.
  • StatManager: Keeps track of a bunch of stats and updates them every frame. Also responsible of creating/destroying the stats. Note that a StatManager is not a singleton so you can have multiple ones. Typically you could have one for performance analysis and one for gameplay stats.
  • StatProcessor: Functor that can be applied on a collection of stats to process them. Like rendering them on screen for instance.
  • StatTextDump: A StatProcessor that will list all the stats and dump them on the standard output.

For convenience, you can use the following macros to instrument your code:
  • STAT_COUNT( manager, stat_name, value )
  • STAT_ACCUMULATE( manager, stat_name, value )
  • STAT_START( manager, stat_name )
  • STAT_STOP( manager, stat_name )
  • STAT_SCOPED_TIMER( manager, stat_name )

A typical usage would be something like:


void main()
{
 ...
 
 StatManager stats.Init();

 {
  STAT_SCOPED_TIMER( stats, "Init Data" );
  ...
 }

 for( ... )
 {
  STAT_ACCUMULATE( stats, "Loops count", 1 );
  RadixSort::Sort( ... );
  stats.Update(); 
 }
 
 StatTextDump statDump;
 statDump.PrintHeader();
 statDump.ProcessStats( stats.GetStats() );
 
 stats.Term();
}

And this would produce the following output:


                     |    Cpt  | RctAvgCpt  |  m Cpt  |  M Cpt  | GlblAvgCpt |    RctAvgT    |      m T      |      M T      |    GlblAvgT   |
---------------------+---------+------------+---------+---------+------------+---------------+---------------+---------------+---------------|
CalcHistograms       |      -  |     4.000  |      4  |      4  |     4.000  |    40.215 ms  |    39.129 ms  |    62.043 ms  |    42.222 ms  |
CalcOffsets          |      -  |     4.000  |      4  |      4  |     4.000  |     0.030 ms  |     0.030 ms  |     0.074 ms  |     0.031 ms  |
Init Data            |      -  |     0.000  |      0  |      1  |     0.020  |     0.000 ms  |     0.000 ms  |    65.029 ms  |     1.301 ms  |
Loops count          |     50  |         -  |      -  |      -  |         -  |         - ms  |         - ms  |         - ms  |         - ms  |
Sort                 |      -  |     1.000  |      1  |      1  |     1.000  |   118.502 ms  |   107.572 ms  |   281.035 ms  |   138.777 ms  |
SplitInput           |      -  |     1.000  |      1  |      1  |     1.000  |     0.002 ms  |     0.002 ms  |     0.003 ms  |     0.002 ms  |

It is important to note that the current implementation is not thread-safe and would probably break in various ways. Imagine that you are monitoring a threadproc like this one:

static DWORD WINAPI SortChunkThreadProc( LPVOID _pParams )
{
 STAT_SCOPED_TIMER( stats, "SortThreadProc" );
  
 SortChunk( ... );

 return TRUE;
}

If more than 1 thread at a time is running this function, the static StatTimer object created by the macro is going to be accessed from different threads without any particular precaution and will at best report garbage and most likely crash.


This is something I intend to fix the next iteration of this little library and is quite an interesting problem per se. In the meantime, and hoping that you'll find all that helpful, have fun !!

Attachments:
-m

Sunday, 27 February 2011

A way to avoid virtual function calls in C++

A couple of days ago, a colleague sent this link to a paper called "Executable Bloat - How it happens and how we can fight it" which was an interesting read. The point about the cost of virtual function calls and the fact that you should try to avoid them reminded me of a bit of C++ code I wrote a few years ago that tried to achieve exactly that.

The idea is to transform virtual updates into non virtual updates using templates. You go from 1 virtual update per object to either no virtual call at all or 1 virtual call per class, depending on how much performance you want to trade for flexibility.

In lots of game engines, calling virtual functions once per frame on objects is a common pattern. Such objects are generally handled in a manager that will process the each frame. I have seen this pattern in lots of codebases, no matter what style of objects hierarchies was used (either OOP-like with deep and complex class trees or more Data-Oriented entity/components systems).

A typical, although a bit naive, example of such a pattern could a renderer that would draw 3D objects. In it, you would find a base interface for everything that can be rendered (IRenderable), some concrete classes that implement this interface (Mesh and ParticleSystem) and a manager, the renderer itself, that would store the objects that needs to be drawn. Here is some pseudo-code:

//////////////////////////////////
// Rendering system
//////////////////////////////////

class IRenderable
{
public:
    virtual void Render() = 0;
};

class Mesh : public IRenderable
{
public:
    virtual void Render()
    { /* Do rendery things */ }
};

class ParticleSystem : public IRenderable
{
public:
    virtual void Render()
    { /* Do rendery things */ }
};


class RenderManager
{
private:
    std::vector< IRenderable * >    m_Objects;

public:
    void RenderAll()
    {
        std::vector< IRenderable * > iter = m_Objects.begin();
        std::vector< IRenderable * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();  // 1 virtual call per object
        }
    }
};


//////////////////////////////////
// Main
//////////////////////////////////

RenderManager gRenderManager;

void main()
{
    while(1)
    {
        gRenderManager.RenderAll();
    }
}

With such a design, you get 1 virtual call per object. Depending on the the platform, the content of the list and some other factors, these virtual calls can become quite a significant performance hit because of cache misses.

In order to get rid of the virtual calls, we will declare 1 render manager per type of Renderable, using templates. This transforms the dispatching of the call to Render() from virtual to static. WIN!

Here's the new code:

//////////////////////////////////
// Rendering system
//////////////////////////////////

class Mesh                // No base interface anymore
{
    void Render()        // Not virtual anymore
    { /* Do rendery things */ }
};

class ParticleSystem    // No base interface anymore
{
    void Render()        // Not virtual anymore
    { /* Do rendery things */ }
};

template< class _T >
class RenderManager
{
private:
    std::vector< _T * >       m_Objects;

public:
    void RenderAll()
    {
        std::vector< _T * > iter = m_Objects.begin();
        std::vector< _T * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();   // this call is NOT virtual
        }
    }
};

//////////////////////////////////
// Main
//////////////////////////////////

RenderManager< Mesh > gRenderManager_Mesh;
RenderManager< ParticleSystem > gRenderManager_ParticleSystem;

void main()
{
    while(1)
    {
        gRenderManager_Mesh.RenderAll();
        gRenderManager_ParticleSystem.RenderAll();
    }
}

The change from virtual to static dispatch comes with a small price though. You now have to call RenderAll() on all the different managers, which can get tedious and bug prone. To improve that, we can reintroduce polymorphism, but this time we'll do it at the manager's level. So in effect we transformed 1 virtual call per instance into 1 virtual call per class, which is likely to be a few orders of magnitude smaller.

/////////////////////////////////
// Rendering system
//////////////////////////////////

class Mesh
{
    void Render()
    { /* Do rendery things */ }
};

class ParticleSystem
{
    void Render()
    { /* Do rendery things */ }
};


class IRenderManager    // Base class for all managers
{
public:
    virtual void RenderAll() = 0;
};


template< class _T >
class RenderManager : public IRenderManager        // all template mamangers inherit from the interface
{
private:
    std::vector< _T * >       m_Objects;

public:
    void RenderAll()
    {
        std::vector< _T * > iter = m_Objects.begin();
        std::vector< _T * > iend = m_Objects.end();

        for( ; iter != iend; ++iter )
        {
            *iter->Render();   // this call is NOT virtual
        }
    }
};


//////////////////////////////////
// Main
//////////////////////////////////

RenderManager< Mesh > gRenderManager_Mesh;
RenderManager< ParticleSystem > gRenderManager_ParticleSystem;

std::vector< IRenderManager * > gRenderManagers;        // will store all the managers

void main()
{
    // at init time, add the managers into an array so we can iterate over them
    // ordering is explicit, which is a nice thing (think opaque/transparent objects for instance)
    gRenderManagers.push_back(&gRenderManager_Mesh);
    gRenderManagers.push_back(&gRenderManager_ParticleSystem);

    while(1)
    {
        std::vector< IRenderManager * > iter = gRenderManagers.begin();
        std::vector< IRenderManager * > iend = gRenderManagers.end();

        for( ; iter != iend; ++iter )
        {
            *iter->RenderAll();
        }
    }
}

In a real world codebase, a set of macros would surely be used for declaring and registering a new manager for each type, making the system easier to maintain and extend.

I think the approach described here works quite well for entity/components systems wit Data-Oriented Design (DOD) in mind. In such systems, your managers would also deal with allocating/deallocating the components in a memory friendly manner.

I hope that you will find all this useful in your projects!
See you next time, or maybe at GDC next week!

-m

Sunday, 1 August 2010

Memory Leaks

Yesterday, I spent some time trying to fix any memory leak I could find. This is the kind of things that I like to check regularly as it is a massive pain when you do it after it's not been done in a while. More importantly, leaks are often a good indicator that areas of code require more design and code quality attention so that's always worth keeping an eye on that.

There are several ways to track memory leaks and there is already a lot of tools available and web articles on this subject. I will probably have a look at more advanced solutions in the future but for this session I used a really basic yet powerful method. The CRT library that ships with Visual Studio has some debug functionality to help track leaks down. Basically the allocator can track all the allocations that were made and that were not deallocated before exiting the application. This is done by adding a single line at the very beginning of your entry point function (you can put it anywhere really but you really want that call to happen as early as possible so you can start tracking allocations):

_CrtSetDbgFlag( _CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF );

If you happen to have memory leaks, after closing the app, you should see in the Output window of Visual Studio something that looks like that:

Detected memory leaks!
Dumping objects ->
{2221} normal block at 0x08FE63D0, 164 bytes long.
 Data: <                > D4 88 1A 10 FF FF FF FF FF FF FF FF 00 00 00 00
{2191} normal block at 0x090771C8, 164 bytes long.
 Data: <                > D4 88 1A 10 FF FF FF FF FF FF FF FF 00 00 00 00
{2079} normal block at 0x09077590, 164 bytes long.
 Data: <                > D4 88 1A 10 FF FF FF FF FF FF FF FF 00 00 00 00
{2047} normal block at 0x08FD9918, 164 bytes long.
 Data: <                > D4 88 1A 10 FF FF FF FF FF FF FF FF 00 00 00 00
{469} normal block at 0x08FE5F98, 164 bytes long.
 Data: <                > D4 88 1A 10 FF FF FF FF FF FF FF FF 00 00 00 00
Object dump complete.

This is the list of all the allocated memory blocks that were not deallocated. Notice the number in the brackets at the beginning of every block - that's where things get interesting :) This number is an identifier for this particular memory allocation. The good thing is that there is another function you can use at the very beginning of your app to break into the debugger whe
n that specific allocation happens, so you can work out why there is no deallocation and fix the issue :

_CrtSetBreakAlloc( 2221 ); // Breaks in the debugger when alloc id=2221 is made !

This is not as lean as using some tools like BoundsChecker or more advanced debug functionnality you can embed in your software but it comes out of the box - on windows at least. I find that if you have it turned on all the time and regularly check your log output, then you can fix the leaks as soon as they appear and keep an application that is stable memorywise.