Comparative fps study in Source Engine Optimization System
I have been writing papers and articles about optimization for the past 4 years, and have been preaching about the utmost importance of a good optimization system in a Source map in all of my writings.
It is about time I dedicate a whole article where I showcase the clear difference between having a solid optimization system and not having one. Since I am also a man of science, the article will have numbers, a lot of numbers, for a complete comparative study that shows the frame rate variation with and without the optimization system in place.
As with every scientific study and experiment, we need a guinea pig; luckily, my map cs_calm has volunteered for this job.
Every solid study needs a systematic approach and a standard operating procedure to make sure the findings are as accurate and impartial as possible.
The methodology that I will use is fairly simple and straightforward: I will systematically record the fps in key locations in the map, then I will turn off the optimization system, one component at a time, and test the new frame rate at the same locations (localized fps). I will also systematically roam the map and record time demos to get the average fps of the entire map.
The fps study will follow the procedure laid out in my paper Optimization Testing in Source Engine. For the optimization system, some components can be turned off (such as hints) while for others, I will need to edit and compile a whole new map to test the effects (such as the skybox). I will need to fully compile 6 new versions of the map, which takes a lot of time and effort, but it will be done…for science.
As I mentioned earlier, the map to be used in this study is my own CSGO map cs_calm. It is fully optimized with regard to layout, skybox, brushwork, func_detail, nodraw, props fading, lightmaps scale, hints, and areaportals. Only occluders are not present in the map (not needed).
We will start first by recording the fps in 2 key locations of the map: CT spawn and T spawn. These are the first locations that the player will have contact with and they are fairly open and highly detailed which makes them perfect to compare the effects of the optimization system.
The test system is an i7 with 6GB of RAM and a 2GB NVidia card all running on Windows 10 64-bit at 1920x1080 resolution.
For CSGO settings, everything is on high with 4X MSAA and 8X Anisotropic filtering and multicore rendering enabled.
The compile process of each map version is done with full vbsp, vvis, vrad and final parameters for vrad (-final -textureshadows -hdr -StaticPropLighting -StaticPropPolys).
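As a rough illustration, the compile steps can be sketched as a small Python helper that builds the three command lines. The tool names and the vrad parameters come from the text above; the helper name `build_compile_commands`, the bare map name `cs_calm` as the argument, and the optional `-game` directory are assumptions for the sake of the sketch. The `-fast` switch is the usual fast mode of vvis and vrad, used here to mirror the fast compile test later in the article.

```python
from typing import List, Optional

# Final vrad parameters exactly as listed in the article.
VRAD_FINAL_FLAGS = ["-final", "-textureshadows", "-hdr",
                    "-StaticPropLighting", "-StaticPropPolys"]


def build_compile_commands(map_name: str, fast: bool = False,
                           game_dir: Optional[str] = None) -> List[List[str]]:
    """Return the vbsp, vvis and vrad command lines for one map version.

    map_name and game_dir are hypothetical placeholders; adjust them to
    the local installation before running anything.
    """
    game = ["-game", game_dir] if game_dir else []
    vvis_flags = ["-fast"] if fast else []
    vrad_flags = ["-fast"] if fast else list(VRAD_FINAL_FLAGS)
    return [
        ["vbsp", *game, map_name],
        ["vvis", *vvis_flags, *game, map_name],
        ["vrad", *vrad_flags, *game, map_name],
    ]


# Print the full final compile used for the study versions.
for command in build_compile_commands("cs_calm"):
    print(" ".join(command))
```

Building the commands as lists keeps each version of the map on identical parameters, so only the optimization component under test changes between compiles.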
The fps for these 2 locations, as can be seen in the screenshot below, is 233 for CT spawn (left) and 274 for T spawn (right). These figures will be subsequently used as benchmarks to compare the fps from the map variations against them.
In this first case, I deleted all the optimized skybox brushes and replaced them with one giant skybox cube that envelops the whole map. This is a common mistake that many designers “commit”. All other optimization components in the map remain unchanged.
Results and fps variation are summed up in the table below. I am always benchmarking against the fps of the original cs_calm (showcased in the previous screenshot).
You can notice that CT spawn dropped 23 fps while T spawn incurred a 27 fps drop. This is a significant drop but one to be expected because of all the useless newly created visleaves and the new open sight lines on the map’s edges.
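To put these drops in perspective, here is a minimal sketch that converts them into remaining fps and a percentage of the benchmark. The fps values are the ones quoted in this article; the function and variable names are mine and purely illustrative.

```python
# Benchmark fps of the original cs_calm and the drops measured
# with the single skybox cube, as quoted in the article.
BASELINE_FPS = {"CT spawn": 233, "T spawn": 274}
SKYBOX_DROP = {"CT spawn": 23, "T spawn": 27}


def fps_drop_summary(baseline: float, drop: float) -> dict:
    """Return the remaining fps and the drop as a percentage of baseline."""
    if baseline <= 0:
        raise ValueError("baseline fps must be positive")
    return {
        "remaining_fps": baseline - drop,
        "drop_percent": 100.0 * drop / baseline,
    }


for location, baseline in BASELINE_FPS.items():
    summary = fps_drop_summary(baseline, SKYBOX_DROP[location])
    print(f"{location}: {summary['remaining_fps']} fps left, "
          f"{summary['drop_percent']:.1f}% lower than the benchmark")
```

Both locations lose roughly a tenth of their frame rate from this one change alone.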
In this case, I replaced all nodraw surfaces in the map with a concrete texture. All the other optimization components are untouched; remember, we are only changing one component at a time to test the corresponding fps change and variation.
The drop here, 23 and 26 fps respectively for CT and T spawns, is somewhat similar to the case of the skybox cube. This is all because of extra overhead on the engine originating from all the faces that were textured with a concrete texture, a real texture (see what I did here?).
Props fade distance
In this case, I removed the fade distance from all the props in the map (static, physics, and dynamic).
The drop here will be relative to the number of props in the scene and the openness of the map. The more props you have and the more open the map is, the more severe the fps drop will be when you remove the fade distance. You can see here that the drop, 11 and 17 fps, was smaller than in the 2 cases before. This is due to the layout of the map itself and to the other optimization components, mainly hints, which are keeping the fps in check.
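For readers unfamiliar with prop fading, the idea can be sketched with a simplified model: a prop is fully drawn up to a start distance, fades out linearly, and is no longer drawn past an end distance. The parameter names below echo the `fademindist` and `fademaxdist` prop settings in Hammer, but the function itself is my own illustration and not the engine's actual code; treating a non-positive end distance as "no fade" is an assumption of this sketch.

```python
def prop_alpha(distance: float, fade_min: float, fade_max: float) -> float:
    """Simplified opacity of a prop at a given distance from the camera.

    Returns 1.0 when fully visible and 0.0 when the prop is skipped.
    A non-positive fade_max is treated as "no fade distance set"
    (an assumption of this sketch), so the prop is always drawn.
    """
    if fade_max <= 0:
        return 1.0
    if distance <= fade_min:
        return 1.0
    if distance >= fade_max:
        return 0.0
    # Linear fade between the start and end distances.
    return (fade_max - distance) / (fade_max - fade_min)


# Hypothetical prop that starts fading at 1000 units and is gone by 1500.
for d in (500, 1250, 2000):
    print(d, prop_alpha(d, fade_min=1000, fade_max=1500))

# The same prop with the fade distance removed is drawn at any range.
print(2000, prop_alpha(2000, fade_min=-1, fade_max=0))
```

With the fade removed, every prop in view keeps costing render time no matter how far away it is, which is why open areas with many props suffer the most.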
In this case, I disabled the hints visgroup in Hammer and compiled a new version of the map.
The drop here is substantial with 21 fps in both locations. When you disable hints, more visleaves will be visible in the PVS and the engine will incur additional rendering overhead which will directly affect the fps.
As with the case before, I disabled the areaportals visgroup in Hammer and compiled a new version of the map.
Similar to the hints case, the fps drop is significant, 18 fps, and is also due to over-rendering between visleaves and the elimination of the view frustum culling effect of areaportals once they were removed.
This case is supposed to be like the others where I select all the func_detail in the map and revert them to regular world brushes, compile a new map version, and test the fps accordingly. However, cs_calm is highly detailed and relies heavily on func_detail with intricate shapes; I knew beforehand that compiling this new version on full VIS and final RAD with all these details switched back to regular world brushes was not going to happen…at least not easily or quickly.
I decided to discard this case and not include it in the above study with other optimization components. Nevertheless, for the sole sake of science and testing, I decided to give this compile a try…out of curiosity.
Just as I suspected, after 3 long hours, the portalflow in vvis was still stuck at 60-70% and it was not moving, nor did it have any obvious intention of moving; it was there to stay (probably overnight). I had to put an end to its miserable life.
I still won’t include this version in the study but I got more curious to see how this “world brush details” edition would benchmark against the original map. For this reason, I decided to perform a separate, “independent” small test on this version using fast VIS and fast RAD. Mind you, the results won’t be as accurate as with a full VIS: the advanced vis algorithm won’t run, so the full visibility matrix won’t be calculated, and the lighting will be flat and basic without dynamic shadows, which could also affect fps.
To keep the comparison fair, I will compile both the “world brush details” map and the original one with the same fast compile parameters.
Even on a fast compile, the version with details as regular world brushes dropped by 13 and 10 fps respectively for CT and T spawns. The drop is not as big as in the previous cases, in part due to the inaccurate fast VIS compile, and because the other optimization components are still in place (skybox, hints, areaportals, etc.), keeping the fps under control. Nevertheless, a 10 fps drop could mean the difference between a playable and an unplayable map when the gaming system has low to mid-range specs.
Combination of components
We have seen how much the frame rate would vary when we change one component of the optimization system, and it was a significant variation.
What happens then when we change several or all components at once? I bet this is the part you all have been waiting for.
In this case, I disabled hints and areaportals visgroups, removed all props fade distance, replaced nodraw with a concrete texture, and removed optimized skybox brushwork, substituting it with a big hollow skybox cube. The map was compiled on full final compile as with the other cases included in the study.
Without further ado, here are the results as well as a screenshot showcasing the fps in CT and T spawns.
On the spot, you can observe a huge drop in both locations: 98 fps at CT spawn and 84 fps at T spawn. Remember that the figures are taken with me standing alone in the map; if you factor in bots, human players, and server-client ping and lag, then you can anticipate an additional 100 fps drop. The map will be unplayable in certain locations if the fps drops below 60.
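The arithmetic behind this warning can be sketched as follows. I am reading the 98 and 84 figures as drops from the benchmark (233 and 274 fps); the extra 100 fps loss under load is the rough estimate given above, not a measurement, and the names in the code are illustrative only.

```python
BASELINE_FPS = {"CT spawn": 233, "T spawn": 274}
COMBINED_DROP = {"CT spawn": 98, "T spawn": 84}
EXTRA_LOAD_DROP = 100   # rough estimate for bots, players and lag
PLAYABLE_FPS = 60       # threshold quoted in the article


def frame_time_ms(fps: float) -> float:
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def project(location: str) -> tuple:
    """Return (fps alone, fps under load, still playable?) for a location."""
    alone = BASELINE_FPS[location] - COMBINED_DROP[location]
    loaded = alone - EXTRA_LOAD_DROP
    return alone, loaded, loaded >= PLAYABLE_FPS


for location in BASELINE_FPS:
    alone, loaded, playable = project(location)
    print(f"{location}: {alone} fps alone ({frame_time_ms(alone):.2f} ms/frame), "
          f"{loaded} fps under load, playable: {playable}")
```

Under these assumptions CT spawn lands well below the 60 fps mark once the server fills up, while T spawn stays above it, which matches the point that the map becomes unplayable in certain locations rather than everywhere.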
This is, ladies and gentlemen, what happens when you don’t eat your vegetables, er, optimize your map properly. We are talking here about a near 100 fps drop in certain map locations and this is a surefire way to kill the map; no one will play it, or more precisely, replay it once they test the first time.
Since we all like numbers (at least I do), let us observe some more figures related to compile times and average fps of all the above map versions (except the one of the func_detail compiled on fast settings that I decided not to include in the study).
The numbers were recorded using the time demo technique (refer to my fps testing paper for methodology). You can see that the figures are smoother and vary less than the localized fps in the previous cases. This is mostly due to large parts of the map taking place indoors with twisting corridors that are naturally suited for the Source engine and help it raise the fps intrinsically.
When we modify one component at a time, the difference stays between 5 and 10 fps, but when we combine all the optimization components, the drop is huge (almost 41 fps), which says a lot considering this is an average fps, not a localized number.
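If you record several demos per map version, one reasonable way to combine them is to pool frames and seconds rather than average the per-run fps, so longer runs weigh more. The sketch below assumes you have noted the frame count and duration of each run; the sample values are hypothetical and are not results from this study.

```python
from typing import List, Tuple


def average_fps(runs: List[Tuple[int, float]]) -> float:
    """Pooled average fps over several runs given as (frames, seconds)."""
    total_frames = sum(frames for frames, _ in runs)
    total_seconds = sum(seconds for _, seconds in runs)
    if total_seconds <= 0:
        raise ValueError("total duration must be positive")
    return total_frames / total_seconds


# Hypothetical runs: a short fast one (300 fps) and a longer slow one (100 fps).
sample_runs = [(6000, 20.0), (3000, 30.0)]
print(average_fps(sample_runs))   # pooled average over 50 seconds

# Naive mean of the two per-run rates, shown for comparison.
print(sum(frames / seconds for frames, seconds in sample_runs) / len(sample_runs))
```

The pooled figure is lower than the naive mean here because the slower run lasted longer, which is closer to what a player roaming the map would actually experience.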
As a bonus feature, I will throw in the compile times, listed below, that can give an additional insight on each map version.
You can observe that a well optimized map should have a very short vvis time (unless it’s a really large map with a very complex optimization system); the non-optimized map version has its VIS numbers on the high side.
You can see that the longest VIS times are for the skybox and the combination versions of the map since they involve adding substantial numbers of additional useless visleaves. The nodraw and props versions have no impact on the number of visleaves, while removing hints and areaportals reduces the total number of visleaves (at the cost of decreasing fps). As I mentioned in a previous article, do not confuse VIS time with in-game performance.
On the fast compile test, you can notice that the version where I switched the func_detail into regular world brushes has a 10s VIS time compared to virtually nothing on the optimized map. Remember that on full final compile, the map was stopped as it would have taken a day to finish or even crash before outputting a final visibility matrix. Even on fast compile where VIS only does preliminary visibility checks, the map took 10s and this is due to the huge number of additional visleaves that were created by the regular world brushes that were func_detail before.
This article and study were a good exercise to showcase, in numbers and hard facts, the utmost importance of a good optimization system in a Source engine map.
Having tirelessly preached about the significance of optimization over the years, I believe the facts and figures presented here leave no doubt about this issue. A good optimization setup is not a choice, it’s a must if you wish to have a nicely playable map.
I sure hope this study serves as a motivational boost and a wakeup call for any designer, beginner or expert, who is still on the fence about the importance of a good optimization system in Source.
September 27, 2015