Rendering and HPC Benchmark Session Using Our Best Servers
by Johan De Gelas on September 30, 2011 12:00 AM ESTTesting the Opteron HPC Remedy
The results of memory node interleaving are pretty spectacular, at least in terms of improving Opteron performance.
Once we disable NUMA, our Opteron server scales properly. Performance is multiplied by 3 when we run the benchmark with 48 threads. So memory interleaving does the trick, but since memory interleaving increases the traffic between the CPU nodes, we decided to test with HT assist (a 1MB snoop filter) on and off.
Notice how this benchmark relies on the CPU interconnects: when we disable HT assist but leave interleaving on, we lose more than 25% performance. HT assist avoids many unnessary broadcasts on the HT interconnects. What is more, we did test the Xeon E7 with memory node interleaving (4-way) but this did not improve or decrease performance in any substantial way.
There's even more good news for the Opteron: the score on Cinebench R11.5 rendering improved from 25 (NUMA) to 26.3. (memory node interleaving). It's hardly spectacular, but that's still a nice and free of charge 5% performance boost, assuming you're running workloads that will benefit.
52 Comments
View All Comments
derrickg - Friday, September 30, 2011 - link
Would love to see them benchmarked using such a powerful machine.JohanAnandtech - Friday, September 30, 2011 - link
Suggestions how to get this done?derrickg - Friday, September 30, 2011 - link
simple benchmarking: http://www.linuxhaxor.net/?p=1346I am sure there are much more advanced ways of taking benchmarks on chess engines, but I have long since dropped out of those circles. Chess engines usually scale very well from 1P and up.
JPQY - Saturday, October 1, 2011 - link
Hi Johan,Here you have my link how people can test with Chess calculatings in a very simple way!
http://www.xtremesystems.org/forums/showthread.php...
If you are interested you can always contact me.
Kind regards,
Jean-Paul.
JohanAnandtech - Monday, October 3, 2011 - link
Thanks Jean-Paul, Derrick, I will check your suggestions. Great to see the community at work :-).fredisdead - Monday, April 23, 2012 - link
http://www.theinquirer.net/inquirer/review/2141735...dear god, at last the truth. Interlagos is 30% faster
hey anand, whats up with YOUR testing.
fredisdead - Monday, April 23, 2012 - link
everybody, the opteron is 30% fasterhttp://www.theinquirer.net/inquirer/review/2141735...
follow thew intel ad bucks ... lol
anglesmith - Friday, September 30, 2011 - link
i was in a similar situation on a 48 core opteron machine.without numa my app was twice slower than a 4 core i7 920. then did a test with same number of threads but with 2 sockets (24 cores), the app became faster than with 48 cores :~
then found the issue is all with numa which is not a big issue if you are using a 2 socket machine.
once i coded the app to be numa aware the app is 6 times faster.
i know there are few apps that are both numa aware and scale to 50 or so cores but ...
tynopik - Friday, September 30, 2011 - link
benhcmarklike it Phenom
JoeKan - Friday, September 30, 2011 - link
I'd llove to see single core workstations used as baseline comparisons. In using a server to render, I'd be wondering which would be more cost effective to render animations. Maybe use an animation sequence as a render performance test.