Cuda?
Posted: Wed Jan 12, 2011 1:18 pm
Hi Aaron,
I was reading the other thread that mentioned this, but thought it might be an idea to start my own so it could be more focused.
aaron wrote:
Hi Monks,
Sorry, nothing to report yet, and I probably won't have anything for quite a while. In order of optimisations, I'll do yet more multithreading first, then SSE, then CUDA (maybe). The more reading I do on the subject, the more I've realised that CUDA is only applicable to a very narrow band of problems. Somewhat perversely, of all the calculations in L3DT, only the relatively 'fast' light-map generation algorithm might fit into the CUDA niche of massively parallel, repetitive, non-forking operations on long vectors of floating-point data. The 'slow' algorithms (erosion, water table flooding, AM generation, TX anti-aliasing) are not readily suitable to CUDA, although I should be able to make some headway here with more multithreading and/or SSE.
Cheerio,
Aaron.
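To illustrate the kind of workload described in the quote above, here is a minimal Python sketch of a per-pixel shading pass in which every output value depends only on the read-only input height-map, so each pixel could in principle be handled by its own GPU thread. This is not L3DT's light-map algorithm: the function name `shade_heightmap`, the simple Lambertian (dot-product) model and the edge handling are all assumptions made purely for illustration.

```python
import math

def shade_heightmap(heights, light=(0.0, 0.0, 1.0)):
    """Hypothetical per-pixel Lambertian shading of a 2D height-map.

    Every output pixel is computed from the read-only input alone, which is
    what makes this kind of pass a candidate for massively parallel hardware.
    """
    rows, cols = len(heights), len(heights[0])
    # Normalise the light direction once, outside the per-pixel loop.
    length = math.sqrt(sum(c * c for c in light))
    lx, ly, lz = (c / length for c in light)

    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Central differences, with neighbour indices clamped at the edges.
            dx = (heights[y][min(x + 1, cols - 1)] - heights[y][max(x - 1, 0)]) / 2.0
            dy = (heights[min(y + 1, rows - 1)][x] - heights[max(y - 1, 0)][x]) / 2.0
            # The surface z = h(x, y) has normal (-dx, -dy, 1), normalised here.
            inv = 1.0 / math.sqrt(dx * dx + dy * dy + 1.0)
            dot = (-dx * lx - dy * ly + lz) * inv
            out[y][x] = max(0.0, dot)  # clamp back-facing pixels to black
    return out
```

The two loops carry no state from one pixel to the next, which is the property that matters here. Something like erosion, where each step feeds on the result of the previous one, does not split up so cleanly.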
I was wondering whether there has been any U-turn on a CUDA implementation?
xNormal implemented the OptiX renderer for normal and ambient occlusion (light) maps with some stunning results and a massive increase in speed (even when compared to an i7 920).
Another thing that might be worth a try for faster AA on all maps is to render the maps at double the selected resolution with no AA (0x) and then resize by half, which gives you four samples per output pixel (the equivalent of 4x AA).
This is a much-used technique in game development and is often much faster than just rendering out 4x AA at native res.
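A rough sketch of the idea in plain Python, assuming a greyscale image stored as a list of rows of floats. The names `downsample_2x` and `render_supersampled`, and the `render(width, height)` callback, are made up for this example and have nothing to do with L3DT's actual API.

```python
def downsample_2x(image):
    """Average each 2x2 block of a greyscale image (box filter).

    `image` is a list of rows of floats whose width and height are both even.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, cols, 2)
        ]
        for y in range(0, rows, 2)
    ]


def render_supersampled(render, width, height):
    """Render at double resolution with no AA, then halve the size.

    `render` is a hypothetical callback: render(width, height) -> list of rows.
    """
    hi_res = render(width * 2, height * 2)
    return downsample_2x(hi_res)
```

Each output pixel ends up as the average of four rendered samples. The cost is rendering four times as many pixels, so whether it actually beats native 4x AA depends on how expensive the existing AA pass is.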
Thanks for listening