The motivation for developing this library was to have a set of simple tools I can use to instrument my code and analyse its performance. Such a system can also be used for collecting gameplay statistics, so I kept that in mind even if it was not the primary goal I was aiming for. And when I say gameplay statistics, I am thinking of something designers could use to tune the game, as opposed to statistics in the sense of "leaderboards", for which this system might be overkill and somewhat ill-suited.
What follows is a description of how the system works. If you prefer, you can jump straight to the source code, available at the bottom of this post. It's all contained in a single .h file for convenience, uses the STL (although it would be trivial to replace it with other containers), and works on Win32 so far. The only platform-specific bit is the StatGetTime() function, which you would need to reimplement with whatever real-time clock API your platform provides.
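If your compiler supports C++11, one possible way to port that function is to build it on std::chrono instead of a platform API. The sketch below is only an illustration: the name StatGetTimePortable, the choice of returning seconds as a double, and the TimeCallSketch helper are assumptions, not the signatures used in the actual header.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical portable stand-in for StatGetTime().
// Assumption: the caller wants seconds as a double.
inline double StatGetTimePortable()
{
    using Clock = std::chrono::steady_clock;
    // Measure relative to the first call so the values stay small.
    static const Clock::time_point start = Clock::now();
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return elapsed.count();
}

// Hypothetical helper: runs a callable and returns the elapsed seconds.
template <typename F>
double TimeCallSketch(F&& callable)
{
    const double begin = StatGetTimePortable();
    callable();
    const double end = StatGetTimePortable();
    assert(end >= begin); // steady_clock never goes backwards
    return end - begin;
}
```

steady_clock is used rather than system_clock because it is monotonic, which is what you want when measuring durations.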
Still here? Good :) So, the system can record 3 types of stats:
- Accumulators: this is the simplest type. Code can add or subtract any arbitrary value whenever it wants and the system simply keeps track of the global count. It's not the most useful one for performance analysis but it can be pretty handy for gameplay stats, where you may want to bump a stat every time a specific event happens in your game. The only thing you can retrieve from it is the global count.
- Counters: same as accumulators, except that they get reset once per frame. This is helpful for tracking how many times something happens per frame, like how many objects are rendered, how many animation bones got updated or how much data was sent over the network. Also, each frame, the total is fed into a history buffer that allows us to get the recent average (over the last n frames), which can be more interesting than the global average as things might change a lot over time. In a combat game for instance, performance is really different when nothing is happening than when 10 enemies are pathing toward you... and you probably want the average CPU usage to mirror that rather than getting the average usage since the game started. On counters you can retrieve:
  - This frame's count
  - Recent average count
  - Global average count
  - Global minimum count
  - Global maximum count
- Timers: they record the amount of time elapsed from the moment they get started to when they get stopped. This can happen several times per frame and we keep track of how many times they go through a start/stop cycle. Every frame, both the count and the accumulated time are reset. Timers are perfect for monitoring code execution as they can tell how many times an instrumented function was called per frame, and how much time was spent in it. They also have their own history buffers so you get both recent and global averages. On timers you can retrieve:
  - This frame's time
  - Recent average time
  - Global average time
  - Global minimum time
  - Global maximum time
  - This frame's count
  - Recent average count
  - Global average count
  - Global minimum count
  - Global maximum count
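To make the idea of a per-frame counter feeding a history buffer more concrete, here is a minimal sketch. It is not the code from the header: HistorySketch, CounterSketch and all their member names are hypothetical, and the fixed-size ring buffer is just one straightforward way to get a recent average next to the global min, max and average.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical history buffer: keeps the last N per-frame samples
// plus running global statistics.
template <typename T, std::size_t N>
class HistorySketch
{
    static_assert(N > 0, "history needs at least one slot");
public:
    // Called once per frame with that frame's total.
    void Push(T sample)
    {
        m_Samples[m_Next] = sample;          // overwrite the oldest slot
        m_Next = (m_Next + 1) % N;
        m_Stored = std::min(m_Stored + 1, N);

        m_Min = (m_Frames == 0) ? sample : std::min(m_Min, sample);
        m_Max = (m_Frames == 0) ? sample : std::max(m_Max, sample);
        m_Sum += static_cast<double>(sample);
        ++m_Frames;
    }

    // Average over the last N frames (or fewer if not enough were recorded).
    double RecentAverage() const
    {
        if (m_Stored == 0) return 0.0;
        double sum = 0.0;
        for (std::size_t i = 0; i < m_Stored; ++i)
            sum += static_cast<double>(m_Samples[i]);
        return sum / static_cast<double>(m_Stored);
    }

    // Average over every frame since the start.
    double GlobalAverage() const
    {
        return m_Frames ? m_Sum / static_cast<double>(m_Frames) : 0.0;
    }

    // Only meaningful after at least one Push().
    T Min() const { assert(m_Frames > 0); return m_Min; }
    T Max() const { assert(m_Frames > 0); return m_Max; }

private:
    std::array<T, N> m_Samples{};
    std::size_t m_Next = 0;
    std::size_t m_Stored = 0;
    std::size_t m_Frames = 0;
    T m_Min{};
    T m_Max{};
    double m_Sum = 0.0;
};

// Hypothetical counter: accumulates during a frame, then hands the
// total to its history and resets.
class CounterSketch
{
public:
    void Add(int value) { m_Current += value; }
    void EndFrame() { m_History.Push(m_Current); m_Current = 0; }
    int ThisFrame() const { return m_Current; }
    const HistorySketch<int, 8>& History() const { return m_History; }
private:
    int m_Current = 0;
    HistorySketch<int, 8> m_History;
};
```

With a history of 3 frames and samples 1, 2, 3, 10, the recent average covers only 2, 3 and 10, while the global average still includes the first sample. A timer could reuse the same buffer twice, once for its count and once for its elapsed time.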
Here is a UML diagram of the different classes of this system, with a quick summary of their role:
- StatBase: Base class for Accumulators, Counters and Timers. This allows a StatManager to manage them as a batch
- StatHistory: Template class that stores the last N samples and computes the min, the max, the global average and the recent average
- StatAccumulator: The accumulator described above
- StatCounter: The counter described above. Has a StatHistory for buffering its count
- StatTimer: The timer described above. Has 2 StatHistory objects, one for the count, one for the elapsed time
- StatTimerScope: Wrapper around a StatTimer that will start its associated timer at construction and stop it at destruction. Useful for timing functions.
- StatManager: Keeps track of a bunch of stats and updates them every frame. Also responsible for creating/destroying the stats. Note that a StatManager is not a singleton so you can have multiple ones. Typically you could have one for performance analysis and one for gameplay stats.
- StatProcessor: Functor that can be applied to a collection of stats to process them, for instance to render them on screen.
- StatTextDump: A StatProcessor that will list all the stats and dump them on the standard output.
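The start-at-construction, stop-at-destruction idea behind StatTimerScope is the usual RAII pattern. Here is a rough sketch of how such a pair of classes could look. TimerSketch and TimerScopeSketch are hypothetical names, and the use of std::chrono is an assumption on my part for portability; the real classes in the header may be organised differently.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for a timer stat: accumulates elapsed time
// and counts how many start/stop cycles it went through.
class TimerSketch
{
public:
    using Clock = std::chrono::steady_clock;

    void Start()
    {
        assert(!m_Running); // this sketch does not support nested starts
        m_Begin = Clock::now();
        m_Running = true;
    }

    void Stop()
    {
        assert(m_Running);
        m_Elapsed += Clock::now() - m_Begin;
        ++m_Count;
        m_Running = false;
    }

    int Count() const { return m_Count; }

    double ElapsedMs() const
    {
        return std::chrono::duration<double, std::milli>(m_Elapsed).count();
    }

private:
    Clock::time_point m_Begin{};
    Clock::duration m_Elapsed{};
    int m_Count = 0;
    bool m_Running = false;
};

// Hypothetical scope wrapper: starts the timer when constructed and
// stops it when destroyed, so every exit path of a function is covered.
class TimerScopeSketch
{
public:
    explicit TimerScopeSketch(TimerSketch& timer) : m_Timer(timer)
    {
        m_Timer.Start();
    }
    ~TimerScopeSketch() { m_Timer.Stop(); }

    TimerScopeSketch(const TimerScopeSketch&) = delete;
    TimerScopeSketch& operator=(const TimerScopeSketch&) = delete;

private:
    TimerSketch& m_Timer;
};
```

Declaring a TimerScopeSketch at the top of a function is enough to time it: early returns and exceptions still run the destructor, so the timer always gets stopped.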
For convenience, you can use the following macros to instrument your code:
- STAT_COUNT( manager, stat_name, value )
- STAT_ACCUMULATE( manager, stat_name, value )
- STAT_START( manager, stat_name )
- STAT_STOP( manager, stat_name )
- STAT_SCOPED_TIMER( manager, stat_name )
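For readers wondering what such a macro might do under the hood, here is one plausible shape, shown for the accumulator case. This is a guess for illustration only: AccumulatorSketch, ManagerSketch, GetAccumulator and SKETCH_ACCUMULATE are all hypothetical names, and the idea of caching the stat in a function-local static is an assumption, not a copy of the real macros.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical accumulator: just a running total.
struct AccumulatorSketch
{
    long long total = 0;
    void Add(long long value) { total += value; }
};

// Hypothetical manager: owns the stats and hands them out by name.
class ManagerSketch
{
public:
    // Creates the stat on first use. std::map never moves its elements,
    // so the returned reference stays valid while the manager is alive.
    AccumulatorSketch& GetAccumulator(const std::string& name)
    {
        assert(!name.empty());
        return m_Accumulators[name];
    }
private:
    std::map<std::string, AccumulatorSketch> m_Accumulators;
};

// Hypothetical macro: looks the stat up by name once per call site,
// caches it in a static reference, then updates it on every pass.
// Assumption: the manager outlives every call site that uses it, and a
// given call site is always used with the same manager.
#define SKETCH_ACCUMULATE(manager, stat_name, value)                        \
    do {                                                                    \
        static AccumulatorSketch& s_Stat =                                  \
            (manager).GetAccumulator(stat_name);                            \
        s_Stat.Add(value);                                                  \
    } while (0)
```

The point of the static is to pay for the name lookup only the first time a call site is reached. The cost of that shortcut is that the cached object is shared by every caller of that line of code.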
A typical usage would be something like:
    int main()
    {
        ...
        StatManager stats;
        stats.Init();

        {
            // Timed until the end of this block
            STAT_SCOPED_TIMER( stats, "Init Data" );
            ...
        }

        for( ... )
        {
            STAT_ACCUMULATE( stats, "Loops count", 1 );
            RadixSort::Sort( ... );
            stats.Update();    // once per frame
        }

        StatTextDump statDump;
        statDump.PrintHeader();
        statDump.ProcessStats( stats.GetStats() );

        stats.Term();
    }
And this would produce the following output:
                         |     Cpt |  RctAvgCpt |   m Cpt |   M Cpt | GlblAvgCpt |       RctAvgT |           m T |           M T |      GlblAvgT |
    ---------------------+---------+------------+---------+---------+------------+---------------+---------------+---------------+---------------|
    CalcHistograms       |       - |      4.000 |       4 |       4 |      4.000 |     40.215 ms |     39.129 ms |     62.043 ms |     42.222 ms |
    CalcOffsets          |       - |      4.000 |       4 |       4 |      4.000 |      0.030 ms |      0.030 ms |      0.074 ms |      0.031 ms |
    Init Data            |       - |      0.000 |       0 |       1 |      0.020 |      0.000 ms |      0.000 ms |     65.029 ms |      1.301 ms |
    Loops count          |      50 |          - |       - |       - |          - |          - ms |          - ms |          - ms |          - ms |
    Sort                 |       - |      1.000 |       1 |       1 |      1.000 |    118.502 ms |    107.572 ms |    281.035 ms |    138.777 ms |
    SplitInput           |       - |      1.000 |       1 |       1 |      1.000 |      0.002 ms |      0.002 ms |      0.003 ms |      0.002 ms |
It is important to note that the current implementation is not thread-safe and would probably break in various ways if stats were updated from several threads. Imagine that you are monitoring a threadproc like this one:
    static DWORD WINAPI SortChunkThreadProc( LPVOID _pParams )
    {
        STAT_SCOPED_TIMER( stats, "SortThreadProc" );
        SortChunk( ... );
        return TRUE;
    }
If more than one thread at a time is running this function, the static StatTimer object created by the macro is going to be accessed from different threads without any particular precaution. At best it will report garbage, and most likely it will crash.
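One possible direction, shown here only as a sketch and not as what the library does, is to keep the start time inside the scope object (which lives on each thread's own stack) and to protect the shared totals with a mutex. SharedTimerSketch and SharedTimerScopeSketch are hypothetical names, and the per-frame reset and history are deliberately left out.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical timer whose shared totals are guarded by a mutex.
class SharedTimerSketch
{
public:
    using Clock = std::chrono::steady_clock;

    // Called by a scope object when it ends. Only this short update
    // runs under the lock, not the timed code itself.
    void AddSample(Clock::duration elapsed)
    {
        assert(elapsed >= Clock::duration::zero());
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Total += elapsed;
        ++m_Count;
    }

    int Count() const
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        return m_Count;
    }

    double TotalMs() const
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        return std::chrono::duration<double, std::milli>(m_Total).count();
    }

private:
    mutable std::mutex m_Mutex;
    Clock::duration m_Total{};
    int m_Count = 0;
};

// Hypothetical scope object: the start time is stored here rather than
// in the shared timer, so overlapping scopes do not trample each other.
class SharedTimerScopeSketch
{
public:
    explicit SharedTimerScopeSketch(SharedTimerSketch& timer)
        : m_Timer(timer), m_Begin(SharedTimerSketch::Clock::now())
    {
    }
    ~SharedTimerScopeSketch()
    {
        m_Timer.AddSample(SharedTimerSketch::Clock::now() - m_Begin);
    }

    SharedTimerScopeSketch(const SharedTimerScopeSketch&) = delete;
    SharedTimerScopeSketch& operator=(const SharedTimerScopeSketch&) = delete;

private:
    SharedTimerSketch& m_Timer;
    SharedTimerSketch::Clock::time_point m_Begin;
};
```

A lock on every stop is not free, so for hot code paths per-thread storage merged once per frame might be a better fit. Note also that the reported time becomes the sum over all threads, which can exceed the wall-clock time of the frame.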
This is something I intend to fix in the next iteration of this little library, and it is quite an interesting problem in itself. In the meantime, and hoping that you'll find all that helpful, have fun!
Attachments:
-m