October 2

It begins...

The launch of the v2.6 update to L3DT has begun, starting with the Standard edition (see release announcement). The Professional and L3DT for Torque editions will follow in a week or so, once I’ve finished writing their much lengthier release announcements, and once I’m slightly more satisfied that I haven’t cocked-up anything at the last minute. Stay tuned...

September 18

The worst thing about developing L3DT...
...is - without a doubt - tweaking the design/inflate heightfield algorithm. Every time I touch the code for design/inflate, or even think about touching the code, I instantly lose a week of my life. Here’s why:

```cpp
OK = zCalcHF_InflateMosaic(zHF, SwapMap1, Name1, NULL, TileSize, hFormat, zDM, 2); // to 1/32 res
zCalcMan_AdvanceCalcStage();
if(DoErosion && OK) OK = zCalcHF_ChannelPass(SwapMap1, zDM, 10, 0, 1, 20, 0.005f, 0.2f, true);
zCalcMan_AdvanceCalcStage();
if(DoErosion && OK) OK = zCalcHF_ThermalPass(SwapMap1, zDM, 5, 0, 3, 0.05f, ThermalMaxGrad);
zCalcMan_AdvanceCalcStage();
if(OK) OK = zCalcHF_InflateMosaic(SwapMap1, SwapMap2, Name2, NULL, TileSize, hFormat, zDM, 2); // to 1/8 res
zCalcMan_AdvanceCalcStage();
if(DoErosion && OK) OK = zCalcHF_ChannelPass(SwapMap2, zDM, 10, 0, 1, 10, 0.01f, 0.2f, true);
zCalcMan_AdvanceCalcStage();
if(DoErosion && OK) OK = zCalcHF_ThermalPass(SwapMap2, zDM, 5, 0, 3, 0.05f, ThermalMaxGrad);
zCalcMan_AdvanceCalcStage();
if(OK) OK = zCalcHF_PeakPass(SwapMap2, zDM);
zCalcMan_AdvanceCalcStage();
if(OK) OK = zCalcHF_InflateMosaic(SwapMap2, SwapMap1, Name1, NULL, TileSize, hFormat, zDM, 1); // to 1/4 res
zCalcMan_AdvanceCalcStage();
if(DoErosion && OK) OK = zCalcHF_ChannelPass(SwapMap1, zDM, 5, -1, 1, 10, 0.02f, 0.2f, true);
zCalcMan_AdvanceCalcStage();
if(DoErosion && OK) OK = zCalcHF_ThermalPass(SwapMap1, zDM, 1, -1, 10, 0.05f, ThermalMaxGrad);
zCalcMan_AdvanceCalcStage();
if(OK) OK = zCalcHF_InflateMosaic(SwapMap1, SwapMap2, Name2, NULL, TileSize, hFormat, zDM, 1, 0.75); // to 1/2 res
zCalcMan_AdvanceCalcStage();
if(OK) OK = zCalcHF_VolcanoPass(SwapMap2, zDM);
if(OK) OK = zCalcHF_MountainPass(SwapMap2, zDM);
zCalcMan_AdvanceCalcStage();
if(DoErosion && OK) OK = zCalcHF_ChannelPass(SwapMap2, zDM, 5, -1, 1, 10, 0.02f, 0.2f, true);
zCalcMan_AdvanceCalcStage();
if(DoErosion && OK) OK = zCalcHF_ThermalPass(SwapMap2, zDM, 1, -2, 10, 0.02f, ThermalMaxGrad);
zCalcMan_AdvanceCalcStage();
if(OK) OK = zCalcHF_InflateMosaic(SwapMap2, zHF, HFname, "HF", TileSize, hFormat, zDM, 1, 0.5); // to final res
zCalcMan_AdvanceCalcStage();
if(OK) OK = zCalcHF_PlateauPass(zHF, zDM);
zCalcMan_AdvanceCalcStage();
if(OK) OK = zCalcHF_TerracePass(zHF, zDM);
zCalcMan_AdvanceCalcStage();
if(DoErosion && OK) OK = zCalcHF_ChannelPass(zHF, zDM, 1, -2, 1, 10, 0.1f, 0.2f, false);
zCalcMan_AdvanceCalcStage();
if(OK) OK = zCalcHF_FileOverlayPass(zHF, zDM);
zCalcMan_AdvanceCalcStage();
```

The above snippet of code is a small segment of the DesignInflate128M algorithm. It shows the chain of alternating calls to fractal inflation, channelling erosion, thermal erosion, peak overlays, mountain overlays, volcano overlays, terrace/cliff overlays, plateau overlays, and file overlays. Each version of the design/inflate algorithm (e.g. 16x, 32x, 64x and 128x) has a similar, but slightly different, chain of subroutine calls like this one. For each one, I have to painstakingly tune and test the parameters to the subroutines, particularly those for the channelling erosion, thermal erosion and fractal inflation calls. It is a bear of a job, and normally something best avoided.

Last week - against all reason, logic and experience - I decided that design/inflate needed a little update. Specifically, I thought that the fractal inflation was introducing too much noisy randomness in the 128x and 256x algorithms. A small tweak, I thought. A final spit-and-polish of the parameters before releasing L3DT version 2.6, I thought. What could possibly go wrong? I thought.

On closer inspection, I found that the fractal inflation subroutine (represented above by ‘zCalcHF_InflateMosaic’) had a dubious noise amplitude calculation that resulted in disproportionately large noise levels for long inflation chains (e.g. 128x and 256x). Conversely, it produced too little noise for short inflation chains (e.g. 8x and 16x).
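To show where a noise amplitude enters a fractal inflation step, here is a minimal 1D sketch. It assumes a simple midpoint-displacement scheme with a pluggable amplitude function of horizontal spacing; `InflateOnce`, `InflateChain` and `LinearAmplitude` (including its 0.05 factor) are hypothetical names and values for illustration only, not L3DT's `zCalcHF_InflateMosaic` or its actual amplitude formula.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical amplitude model: noise amplitude (metres) as a linear function
// of horizontal spacing h (metres per vertex). Illustrative constant only.
double LinearAmplitude(double h) { return 0.05 * h; }

// One 2x inflation step on a 1D height profile (midpoint displacement):
// original samples are kept, and each new midpoint is the mean of its two
// neighbours plus uniform noise in [-amp, amp), where amp = amplitude(new spacing).
std::vector<double> InflateOnce(const std::vector<double>& heights, double spacing,
                                const std::function<double(double)>& amplitude,
                                std::mt19937& rng) {
    assert(spacing > 0.0);
    if (heights.size() < 2) return heights;  // nothing to interpolate between
    const double amp = amplitude(spacing / 2.0);
    std::uniform_real_distribution<double> noise(-1.0, 1.0);

    std::vector<double> out;
    out.reserve(heights.size() * 2 - 1);
    for (std::size_t i = 0; i + 1 < heights.size(); ++i) {
        out.push_back(heights[i]);                          // keep the original sample
        const double mid = 0.5 * (heights[i] + heights[i + 1]);
        out.push_back(mid + amp * noise(rng));              // displaced midpoint
    }
    out.push_back(heights.back());
    return out;
}

// Applies `levels` successive 2x steps; in this sketch a 128x chain is 7 levels.
std::vector<double> InflateChain(std::vector<double> heights, double spacing, int levels,
                                 const std::function<double(double)>& amplitude,
                                 std::mt19937& rng) {
    for (int level = 0; level < levels; ++level) {
        heights = InflateOnce(heights, spacing, amplitude, rng);
        spacing /= 2.0;  // each step halves the horizontal spacing
    }
    return heights;
}
```

Because each level evaluates the amplitude at its own (halved) spacing, the one function determines both the noise added per level and how much noise a long chain such as 128x (7 doublings here) accumulates compared with a short one such as 16x (4 doublings).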
The solution I took was to change the noise amplitude calculation from a crazy bounded inverse exponential function of the horizontal scale to a simple linear function of the horizontal scale. Easy. After a few short days of parameter tuning, I had all of the design/inflate algorithms working nicely - maybe even better than ever. They all looked great.

...and then I changed the horizontal scale.

Disaster. Whilst all the design/inflate algorithms now worked perfectly for 10m/vertex, they looked horrible with 1m/vertex, and worse at 100m/vertex. That’s not good. So, I’m back to the drawing board, with little to show for the last four days or so. Perhaps that crazy bounded inverse exponential function wasn’t so bad after all...

Cheerio,
Aaron.

Bootnotes:
August 29

Light map calculation benchmarks

Hi All,

Yesterday I posted some benchmark results comparing the speed of texture generation in the current release (v2.5c) against the forthcoming release (v2.6). Today, I’m going to compare the speed of light map generation.

Benchmark conditions

The conditions for this test were the same as those used yesterday, except that instead of generating the texture map, I was generating the light map.
Please note that these benchmarks are for the lighting calculation only, and do not include the shadow-casting calculation. One calculation at a time.
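The comparisons below are quoted in two ways: how throughput scales with the number of cores, and how much faster one release is than another (e.g. “~700% faster”). Both come from the same ratio of throughputs. Here is a minimal sketch of that arithmetic; the function names are hypothetical, and it assumes “X% faster” means (new/old - 1) × 100, so an 8x ratio reads as 700% faster. The posts don't say which convention was used.

```cpp
#include <cassert>

// Ratio of two throughputs (e.g. pixels per millisecond); > 1 means `candidate` is faster.
double SpeedupRatio(double baseline, double candidate) {
    assert(baseline > 0.0);
    return candidate / baseline;
}

// "X% faster" under the assumed convention X = (ratio - 1) * 100,
// so a 2x ratio is reported as 100% faster and an 8x ratio as 700% faster.
double PercentFaster(double baseline, double candidate) {
    return (SpeedupRatio(baseline, candidate) - 1.0) * 100.0;
}

// Parallel efficiency on n cores: speedup over the 1-core run divided by n.
// 1.0 means perfectly linear scaling with core count.
double ParallelEfficiency(double oneCoreThroughput, double nCoreThroughput, int cores) {
    assert(cores > 0);
    return SpeedupRatio(oneCoreThroughput, nCoreThroughput) / cores;
}
```

For example, hypothetical throughputs of 50 pixels/ms on one core and 200 pixels/ms on four cores give a `ParallelEfficiency` of 1.0, which is what linear scaling with the number of cores looks like in these terms.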
Part 1: L3DT v2.5c mosaic vs. non-mosaic

The graph below shows the performance of the previous release of L3DT Professional, version 2.5c, for both mosaic and non-mosaic light map generation. As appalling as yesterday’s texture generation benchmarks for L3DT 2.5c were, these results for light mapping are worse. As before, you can see that the mosaic map calculation in v2.5c is much slower than the non-mosaic calculation. What makes these results worse than yesterday’s is that the mosaic light mapping algorithm in v2.5c threw errors and aborted when run with three or four active cores, so the three- and four-core mosaic results are missing. These errors in the mosaic light map system were related to the mosaic map cache manager which, as I described yesterday, was very poor at handling concurrent requests from multiple threads. This has been fixed in v2.6.

Part 2: L3DT v2.6 mosaic vs. non-mosaic

Now for the results from the forthcoming release, version 2.6. The graph below shows the performance of v2.6 for both mosaic and non-mosaic light map generation. Here you can see that the light map performance scales up linearly with the number of cores (as it should), and that the mosaic and non-mosaic light map calculations are now about the same speed (as they should be). So, whilst version 2.5c couldn’t handle multi-core generation of mosaic light maps, version 2.6 has no such problems.

Part 3: L3DT v2.5c vs. L3DT 2.6

I apologise in advance if this is a little gratuitous, but as I did yesterday, I’m now going to directly compare the results for the old release (v2.5c) with the forthcoming release (v2.6). First up is the comparison of non-mosaic light map generation.

There wasn’t much wrong with the non-mosaic calculations in v2.5c, so the improvements here are fairly modest. These results show that version 2.6 (light green) is about 10% faster than v2.5c (dark green) for non-mosaic light map generation.
This improvement is partly due to a compiler upgrade (now using MSVC 2008), and partly due to some minor optimisations in the light mapping algorithm. As with yesterday, the big news remains the improvements in mosaic light map generation.

The results show that for single-core mosaic light maps, version 2.6 is ~700% faster than v2.5c, and for dual-core, it’s 1500% faster. No direct comparison can be made for triple- and quad-core calculations, because release 2.5c simply didn’t work with more than two cores. However, if you take the fastest speed achieved with version 2.6 (which was for quad core) and compare it with the fastest working speed from v2.5c (which was for single core), you get a speed increase of two thousand four hundred percent.

Conclusion

If you like to make big maps, upgrading to L3DT v2.6 will save you a lot of time. If your computer happens to be multi-core, the time savings could be tremendous*. The release date for L3DT 2.6 is mid-September.

* Your mileage may vary. The actual time savings depend on how you’re using L3DT, your choice of settings, and how your computer is configured.

August 28

Texture calculation benchmarks

Hi All,

With L3DT release 2.6 only a few weeks away, I thought this might be a good time to run some head-to-head comparisons of the new release (v2.6) against the previous release (v2.5c). In particular, I’m going to show you just how far we have come in optimising the multi-core, mosaic-mapped calculations in L3DT Professional Edition.

Benchmark conditions

Before we get to the numbers, I should describe how the tests were conducted. Basically, I generated a complete map (heightmap, light map, etc.), and then I ran the texture generation algorithm repeatedly with 1, 2, 3, and 4 cores enabled, for both mosaic and non-mosaic texture generation. The results will be presented in terms of the texture pixel throughput, measured in pixels per millisecond. Larger values are better, and imply faster texture generation.
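Throughput in pixels per millisecond is simply the number of pixels processed divided by the elapsed wall-clock time. A minimal timing harness might look like the following, assuming the calculation under test can be wrapped in a callable. The `generate` callback and the `MeasureThroughput` and `PixelsPerMs` names are hypothetical, not part of L3DT.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Throughput in pixels per millisecond; larger values mean faster generation.
double PixelsPerMs(std::size_t pixels, double elapsedMs) {
    assert(elapsedMs > 0.0);
    return static_cast<double>(pixels) / elapsedMs;
}

// Times `runs` calls of `generate`, each assumed to process `pixelsPerRun` pixels,
// and returns the overall throughput (total pixels / total wall-clock time).
// `generate` is a hypothetical stand-in for the calculation being benchmarked.
double MeasureThroughput(const std::function<void()>& generate,
                         std::size_t pixelsPerRun, int runs) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < runs; ++i) generate();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    // Guard against a zero reading when the callback is trivially fast.
    const double ms = elapsed.count() > 0.0 ? elapsed.count() : 1e-6;
    return PixelsPerMs(pixelsPerRun * static_cast<std::size_t>(runs), ms);
}
```

A real benchmark would likely add warm-up runs and repeated trials. This sketch just reports total pixels over total time across `runs` calls, using a monotonic clock so that wall-clock adjustments don't skew the result.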
All of the benchmark tests were conducted on the same map, with the following settings:
For the mosaic tests, the attributes, normal, light map and texture map were all mosaics, with a tile size of 512×512 pixels. For the non-mosaic tests, all these maps were in-RAM, with mosaic mapping disabled. The benchmarking system was set up as follows:
Part 1: L3DT v2.5c mosaic vs. non-mosaic

The graph below shows the performance of the previous release of L3DT Professional, version 2.5c, for both mosaic and non-mosaic texture generation. These results show two very bad things about the mosaic system in v2.5c:
If you described these results for the mosaic map system as “appalling”, I would be inclined to agree. Fortunately, I’m very pleased to say that these two problems have now been solved in version 2.6, as shown by the next section.

Part 2: L3DT v2.6 mosaic vs. non-mosaic

The graph below shows the performance of the forthcoming release of L3DT Professional, version 2.6, for both mosaic and non-mosaic texture generation. Here you can see that the mosaic map results (blue bars) scale up with the number of cores. In fact, the mosaic map calculation is now about 10% faster than the non-mosaic calculation (red bars) for any number of cores (up to four, anyway). This huge improvement in the mosaic system was brought about by a smarter mosaic cache manager with far fewer thread locks, and by optimising the texture calculation to bypass some of the overheads in the mosaic system.

Part 3: L3DT v2.5c vs. L3DT 2.6

Now I’m going to directly compare the results for the old release (v2.5c) with the forthcoming release (v2.6). First up is the comparison of non-mosaic texture generation. Across the board, v2.6 (light green) is about 25% faster than v2.5c (dark green) for non-mosaic texture generation. Not bad.

However, the big news is the improvements in mosaic map texture generation. Version 2.6 (light green) is monumentally faster than v2.5c (dark green) for mosaic texture generation, ranging from an impressive 275% improvement for single-core calculations to an absurd 1100% improvement for quad-core calculations (yes, one thousand one hundred percent faster).

Conclusion

The mosaic map system in L3DT 2.6 is a colossal improvement over that of v2.5c, providing a speed-up of better than a thousand percent for texture generation on quad-core processors. Speaks for itself, really. The release date for L3DT 2.6 is mid-September.

Disclaimer: These benchmark results apply only to the texture generation algorithm.
You should not expect the heightfield, water-mapping or attributes map algorithms to improve by comparable amounts. Similar improvements may occur with the normal map and light map calculations, as these algorithms are structured identically to the texture algorithm. However, this is not guaranteed, as the benchmark tests have not yet been conducted for those algorithms.
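The v2.6 mosaic speed-up is attributed above to a smarter cache manager with far fewer thread locks. L3DT's actual cache design isn't described in these posts, so the following is only a sketch of one common way to reduce lock contention in a tile cache: hold the lock just long enough to look up or register a tile, and do the slow load outside it. `TileCache`, `Tile` and the loader callback are hypothetical.

```cpp
#include <cstdint>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical tile type, e.g. the pixels of one 512x512 mosaic tile.
using Tile = std::vector<float>;
using TilePtr = std::shared_ptr<const Tile>;

// Sketch of a tile cache with a short critical section. The mutex guards only
// the index; the slow load runs outside it. Threads asking for the same tile
// share one load through std::shared_future. No eviction, for brevity.
class TileCache {
public:
    explicit TileCache(std::function<TilePtr(std::uint64_t)> loader)
        : loader_(std::move(loader)) {}

    TilePtr Get(std::uint64_t tileId) {
        std::promise<TilePtr> promise;
        std::shared_future<TilePtr> future;
        bool mustLoad = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);  // held only for lookup/insert
            auto it = tiles_.find(tileId);
            if (it != tiles_.end()) {
                future = it->second;
            } else {
                future = promise.get_future().share();
                tiles_.emplace(tileId, future);
                mustLoad = true;
            }
        }
        if (mustLoad) {
            try {
                promise.set_value(loader_(tileId));  // load without holding the lock
            } catch (...) {
                promise.set_exception(std::current_exception());
            }
        }
        return future.get();  // blocks only while another thread loads this tile
    }

private:
    std::function<TilePtr(std::uint64_t)> loader_;
    std::mutex mutex_;
    std::unordered_map<std::uint64_t, std::shared_future<TilePtr>> tiles_;
};
```

Concurrent requests for the same tile share a single load, while requests for different tiles only contend for the brief index lookup. The sketch never evicts tiles and keeps a failed load's exception permanently; a real cache manager would need to handle both.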