Rendering and HPC Benchmark Session Using Our Best Servers
by Johan De Gelas on September 30, 2011 12:00 AM EST
Cinebench R11.5
Cinebench, based on MAXON's software CINEMA 4D, is probably one of the most popular benchmarks around, and it is pretty easy to perform this benchmark on your own home machine. However, it gets a little bit more complicated when you try to run it on an 80 thread server: the benchmark only supports 64 threads.
First we tested single threaded performance, to evaluate the performance of each core.
A Core i7-970, which is based on the same "Westmere" architecture, gets about 1.2 at 3.2GHz, so there is little surprise that a slightly lower clocked Xeon 5670 is able to reach a 1.15 score. It is interesting to note, however, that the Westmere core inside the massive Westmere-EX gets a better score than expected. Considering that Cinebench scales almost perfectly with clockspeed, you would expect a score of about 0.9 at 2.4GHz (1.2 × 2.4/3.2). Turbo Boost probably explains the difference: the E7 can boost clockspeed by 17% from 2.4 to 2.8GHz, while the previously mentioned i7-970 gets only an 8% boost at most (from 3.2 to 3.46GHz). And of course, the massive L3-cache may help too.
The Opteron at 2.2GHz performs like its Phenom II desktop counterparts. A 3.2GHz Phenom II gets a score of about 0.92, so we are not surprised with the 0.66 for our 2.2GHz core.
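The clock-scaling estimates above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the score scales linearly with clock speed, as the text suggests, and the function names are made up for illustration.

```python
def expected_score(ref_score, ref_clock_ghz, target_clock_ghz):
    """Scale a reference score linearly with clock speed (assumed model)."""
    return ref_score * target_clock_ghz / ref_clock_ghz


def turbo_gain(base_ghz, turbo_ghz):
    """Relative clock increase from base to maximum turbo, as a fraction."""
    return turbo_ghz / base_ghz - 1.0


# Core i7-970 (1.2 at 3.2GHz) scaled down to the Xeon E7's 2.4GHz base clock
xeon_e7_estimate = expected_score(1.2, 3.2, 2.4)

# Phenom II (0.92 at 3.2GHz) scaled down to the Opteron's 2.2GHz
opteron_estimate = expected_score(0.92, 3.2, 2.2)

# Turbo headroom quoted in the text
e7_turbo = turbo_gain(2.4, 2.8)
i7_turbo = turbo_gain(3.2, 3.46)

print(f"Xeon E7 estimate at base clock: {xeon_e7_estimate:.2f}")
print(f"Opteron estimate:               {opteron_estimate:.2f}")
print(f"E7 turbo gain: {e7_turbo:.1%}, i7-970 turbo gain: {i7_turbo:.1%}")
```

The Xeon estimate comes out at about 0.9 and the Opteron estimate at about 0.63, which is in the same range as the measured 0.66. The turbo gains come out at roughly 17% and 8%.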
When we started benchmarking Cinebench on our Xeon E7 platform, we ran into trouble. Cinebench only supports 64 threads at the most and recognized only 32 of our 40 available cores and 80 threads. The results were pretty bad. To get a decent result out of the Xeon E7, we had to disable Hyper-Threading and we forced Cinebench to start up 40 threads. We included a Core i7-970 (Hyper-Threading on) to give you an idea of how a powerful workstation/desktop compares to these servers. This kind of software is run a lot on fast workstations after all.
Even cheap servers will outperform a typical single socket workstation by almost a factor of two. The quad socket machines can offer up to three or four times as much performance. For those of you who can't get enough: you can find some dual Opteron numbers here. The dual Opteron 6174 scores about 15, and a dual Opteron 2435 2.6 "Istanbul" gets about 9.
Cinebench scales very well, as the 32-core and 40-core results of the Xeon E7-4870 show: increase the core count by 25% and you get a 22.4% performance increase. The Opteron scales slightly worse. Compare the 48-core result with the 32-core one: a 50% increase in core count gets you "only" a 37% increase in performance.
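One way to put a number on how well each platform scales is to divide the performance gain by the core-count gain. The sketch below uses only the percentages quoted above; the function names and the "efficiency" metric itself are illustrative choices, not something taken from the benchmark.

```python
def scaling_efficiency(cores_before, cores_after, perf_gain):
    """Fraction of the added cores that shows up as added performance."""
    core_gain = cores_after / cores_before - 1.0
    return perf_gain / core_gain


def per_core_retention(cores_before, cores_after, perf_gain):
    """Per-core throughput at the higher core count relative to the lower one."""
    return (1.0 + perf_gain) / (cores_after / cores_before)


# Xeon E7-4870: 32 -> 40 cores, +22.4% performance
xeon_eff = scaling_efficiency(32, 40, 0.224)
xeon_keep = per_core_retention(32, 40, 0.224)

# Opteron: 32 -> 48 cores, +37% performance
opteron_eff = scaling_efficiency(32, 48, 0.37)
opteron_keep = per_core_retention(32, 48, 0.37)

print(f"Xeon:    efficiency {xeon_eff:.1%}, per-core retention {xeon_keep:.1%}")
print(f"Opteron: efficiency {opteron_eff:.1%}, per-core retention {opteron_keep:.1%}")
```

By this measure the Xeon turns roughly 90% of its extra cores into extra performance, against roughly 74% for the Opteron. The two steps are not the same size (25% versus 50% more cores), so the comparison is indicative only.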
Below you can see the rendering performance of two top machines rendering with different numbers of cores.
You need about 48 2.2GHz Opteron cores to match 32 Xeon cores. The good news for AMD is that even these 8-core Westmere-EX CPUs are almost twice as expensive. That means that quad AMD Opteron 61xx systems are a viable choice for rendering, at least in CINEMA 4D (assuming it has the same 64-thread limitation as Cinebench). AMD has carved out a niche here, which is one reason why there will be cheaper 4 socket Romley EP systems in the near future.
52 Comments
derrickg - Friday, September 30, 2011 - link
Would love to see them benchmarked using such a powerful machine.
JohanAnandtech - Friday, September 30, 2011 - link
Suggestions how to get this done?
derrickg - Friday, September 30, 2011 - link
simple benchmarking: http://www.linuxhaxor.net/?p=1346
I am sure there are much more advanced ways of taking benchmarks on chess engines, but I have long since dropped out of those circles. Chess engines usually scale very well from 1P and up.
JPQY - Saturday, October 1, 2011 - link
Hi Johan,
Here you have my link how people can test with Chess calculatings in a very simple way!
http://www.xtremesystems.org/forums/showthread.php...
If you are interested you can always contact me.
Kind regards,
Jean-Paul.
JohanAnandtech - Monday, October 3, 2011 - link
Thanks Jean-Paul, Derrick, I will check your suggestions. Great to see the community at work :-).
fredisdead - Monday, April 23, 2012 - link
http://www.theinquirer.net/inquirer/review/2141735...
dear god, at last the truth. Interlagos is 30% faster
hey anand, whats up with YOUR testing.
fredisdead - Monday, April 23, 2012 - link
everybody, the opteron is 30% faster
http://www.theinquirer.net/inquirer/review/2141735...
follow the intel ad bucks ... lol
anglesmith - Friday, September 30, 2011 - link
i was in a similar situation on a 48 core opteron machine.
without numa my app was twice slower than a 4 core i7 920. then did a test with same number of threads but with 2 sockets (24 cores), the app became faster than with 48 cores :~
then found the issue is all with numa which is not a big issue if you are using a 2 socket machine.
once i coded the app to be numa aware the app is 6 times faster.
i know there are few apps that are both numa aware and scale to 50 or so cores but ...
tynopik - Friday, September 30, 2011 - link
benhcmark
like it Phenom
JoeKan - Friday, September 30, 2011 - link
I'd love to see single core workstations used as baseline comparisons. In using a server to render, I'd be wondering which would be more cost effective to render animations. Maybe use an animation sequence as a render performance test.