Quad Core Intel Xeon 53xx Clovertown
by Johan De Gelas on December 27, 2006 5:00 AM EST, posted in IT Computing
Render Servers
To get a better idea of how the different server platforms compare, we also ran some rendering tests. Most of our tests (MySQL, DB2, and SPECjbb2005) are very integer intensive, whereas render tests are floating point intensive. We start with a simple Cinebench 9.5 benchmark (on 32-bit Windows 2003), which is based on Maxon's Cinema 4D rendering engine.
Cinebench 9.5 | |
CPU | 1280x720 |
Quad Opteron 880 2.4 | 1720 |
Dual Quad Xeon E5345 2.33 | 1686 |
Dual DC Xeon 5160 3.0 | 1456 |
Quad Xeon E5345 2.33 | 1272 |
Quad DC Xeon 7130M 3.2 | 1169 |
Dual Opteron 880 2.4 | 1121 |
Dual DC Xeon 5060 3.73 | 1079 |
Dual DC Xeon 7130M 3.2 | 889 |
Four 2.4GHz Opteron cores are a bit slower than four 2.33GHz Xeons, but when we look at the eight core scores the Opteron is a bit faster. Again, it seems that the Opteron system scales better.
Cinebench 9.5 (32 bit) Per core performance |
CPU | Quad core | Octal core | Scaling 4->8 |
Xeon 7130 3.2 GHz | 889 | 1272 | 43% |
Xeon 5345 2.33 GHz | 1169 | 1686 | 44% |
Opteron 880 2.4 GHz | 1121 | 1720 | 53% |
Opteron 890 2.8 GHz | 1297 | 1990 | 53% |
Xeon 5160 3 GHz | 1456 | N/A | N/A |
Xeon Scaling 2.33 -> 3 GHz | 25% | | |
Opteron 880 vs. Quad core Xeon 2.33 GHz | -4% | 2% | 21% |
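The "Scaling 4->8" column can be reproduced from the two score columns. The following is a minimal sketch, assuming scaling is defined as the percentage gain in Cinebench score when the core count doubles; the names `scores` and `scaling_percent` are illustrative and not part of the original article.

```python
# Cinebench 9.5 scores copied from the table above (higher is better).
# Each entry is (quad core score, octal core score).
scores = {
    "Xeon 7130 3.2 GHz": (889, 1272),
    "Xeon 5345 2.33 GHz": (1169, 1686),
    "Opteron 880 2.4 GHz": (1121, 1720),
    "Opteron 890 2.8 GHz": (1297, 1990),
}


def scaling_percent(quad_score, octal_score):
    """Percentage gain in score when going from four to eight cores."""
    return (octal_score / quad_score - 1) * 100


# Assumed definition: scaling = octal / quad - 1, expressed in percent.
scaling = {cpu: scaling_percent(q, o) for cpu, (q, o) in scores.items()}

for cpu, pct in scaling.items():
    print(f"{cpu}: {pct:.0f}%")
```

Perfect scaling would be 100%, so both platforms leave a large part of the extra four cores unused in this test.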
Why do we analyze this in so much detail? Cinebench, like most renderers, couldn't care less about the memory subsystem. We tested our Clovertown system with both two and four memory channels, and the results were exactly the same. Therefore, we are pretty sure the slightly worse scaling of the Xeon E5345 is not a result of limited bandwidth or higher latency. There must be something else that limits scalability, and that something else is most likely cache coherency.
Cinebench 9.5 (32 bit) Per socket performance |
CPU | Dual Socket |
Quad core Xeon 2.33 GHz vs. Xeon 5160 | 16% |
Quad core Xeon 2.33 GHz vs. Opteron 880 | 50% |
Quad core Xeon 2.33 GHz vs. Opteron 890 | 30% |
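The per socket percentages appear to follow from the dual socket Cinebench scores listed earlier. This sketch assumes that each figure is the dual Xeon E5345 score divided by the rival's dual socket score, minus one; `advantage_percent` and the dictionary name are hypothetical helpers introduced only for illustration.

```python
# Dual socket Cinebench 9.5 scores taken from the tables above.
dual_quad_xeon_e5345 = 1686
dual_socket_rivals = {
    "Xeon 5160": 1456,
    "Opteron 880": 1121,
    "Opteron 890": 1297,
}


def advantage_percent(score, baseline):
    """How much higher `score` is than `baseline`, in percent."""
    return (score / baseline - 1) * 100


# Assumed definition: advantage of the dual quad core Xeon over each rival.
per_socket = {
    cpu: advantage_percent(dual_quad_xeon_e5345, rival_score)
    for cpu, rival_score in dual_socket_rivals.items()
}

for cpu, pct in per_socket.items():
    print(f"Quad core Xeon 2.33 GHz vs. {cpu}: {pct:.0f}%")
```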
Cinebench is popular because it is an easy benchmark to run, but 3dsmax is a very popular application. We tested with 3dsmax version 9, which has been improved to work better with multi-core systems. We used the "architecture" scene, which has been our favorite benchmarking scene for years. All tests were done with 3dsmax's default scanline renderer with SSE enabled, and we rendered at HD 720p resolution. We measured the time it takes to render frames 20 to 22.
3DS Max 9 Architecture | |
CPU | 1280x720 |
Quad Opteron 880 2.4 | 273 |
Dual Quad Xeon E5345 2.33 | 308 |
Dual DC Xeon 5160 3.0 | 309 |
Quad Xeon E5345 2.33 | 392 |
Dual DC Xeon 5060 3.73 | 419 |
Quad DC Xeon 7130M 3.2 | 443 |
Dual Opteron 880 2.4 | 454 |
This cannot be a coincidence anymore: a single Xeon E5345 leaves the dual Opteron 880 far behind, but a dual Xeon E5345 trails the quad Opteron. It is not only the application that matters; the dataset has an impact too. Take a look at the table below, where we rendered at both 720p and 480p resolution.
3DS Max 9 Architecture |
CPU | 720x480 | 1280x720 |
Quad Opteron 880 2.4 | 137 | 273 |
Dual Quad Xeon E5345 2.33 | 138 | 308 |
Dual DC Xeon 5160 3.0 | 133 | 309 |
Quad Xeon E5345 2.33 | 167 | 392 |
Dual DC Xeon 5060 3.73 | 188 | 419 |
Quad DC Xeon 7130M 3.2 | 201 | 443 |
Dual Opteron 880 2.4 | 196 | 454 |
Scaling Opteron 880 | 43% | 66% |
Scaling Xeon E5345 | 21% | 27% |
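Because 3dsmax results are render times rather than scores, the scaling rows invert the ratio: the slower four core time is divided by the faster eight core time. The sketch below assumes that definition; the unit of the times is not stated in the table, but it cancels out in the ratio. `render_times` and `scaling_from_times` are illustrative names.

```python
# Render times from the table above (lower is better).
# Each entry is (four core time, eight core time).
render_times = {
    "Opteron 880": {"720x480": (196, 137), "1280x720": (454, 273)},
    "Xeon E5345": {"720x480": (167, 138), "1280x720": (392, 308)},
}


def scaling_from_times(slow_time, fast_time):
    """Percentage speedup implied by a drop from slow_time to fast_time."""
    return (slow_time / fast_time - 1) * 100


# Assumed definition: scaling = four core time / eight core time - 1.
render_scaling = {
    cpu: {res: scaling_from_times(*times) for res, times in by_res.items()}
    for cpu, by_res in render_times.items()
}

for cpu, by_res in render_scaling.items():
    for res, pct in by_res.items():
        print(f"{cpu} at {res}: {pct:.0f}%")
```

On both platforms the gain from doubling the core count is larger at 1280x720 than at 720x480.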
As you can see, the resolution at which you normally render determines how much you benefit from eight cores. Using an octal core machine to render relatively low resolution movies is like driving a potent 8 cylinder engine in a crowded city: all the horsepower goes to waste as you accelerate for a short period and then hit the brakes when approaching a red light. The same is true for rendering: unless you are rendering a complex scene at high resolution, the multi-core engine can never show its full potential. Thanks to better scaling, the quad Opteron platform still has a small advantage.
3DSMax 9 (32 bit) Per socket performance |
CPU | Dual Socket |
Quad core Xeon 2.33 GHz vs. Xeon 5160 | 0% |
Quad core Xeon 2.33 GHz vs. Opteron 880 | 47% |
Quad core Xeon 2.33 GHz vs. Opteron 890 | 27% |
However, when it comes to price/performance, it is not the quad core Xeon or the Opteron that wins, but most likely the Xeon 5160. It is more flexible, as it will outperform the quad core Xeon in any scene that is less complex than architecture and at resolutions lower than 720p. Only if your scenes use radiosity lighting can we see a clear advantage for the quad core Xeon. We noticed that the quad core Xeon was up to 40% faster in such scenes.
15 Comments
zsdersw - Friday, December 29, 2006 - link
Smithfield/Paxville is a MCM chip (two pieces of silicon in one package), as well.
Khato - Wednesday, December 27, 2006 - link
Agreed on it being quite the good review, save for the lack of power consumption numbers/analysis. Form factor and power consumption can be just as important as the performance when the application can be spread across multiple machines, now can't it? At the very least, it would be nice to link to the power consumption numbers for the opteron platform in the first review it showed up in (which puts the dual clovertown at 365W load, while the quad 880 is supposedly 657W load.)
rowcroft - Wednesday, December 27, 2006 - link
Loved the article, great job.
I'm in the process of purchasing two dual quad core servers for VMWare use. Looking at the cost to performance analysis, it would be worth mentioning that many of the high end applications are licensed on a per socket basis. This alone is saving us $20,000 on our VMWare license and making it a compelling solution.
I would love to see more of this type of article as well- very interesting and not something you can easily find elsewhere on the net. (Tom's hardware reviewed the chip running XP Pro!)
duploxxx - Friday, December 29, 2006 - link
If you think that reading this review will help you to decide what to buy as a VMWARE base, you are going the wrong way! Yes, these small tests are in favor of the new MCW architecture as we saw before, since heavy workloads seem hard to test for some sites like anand! Keep in mind that VMWARE is a heavy workload: you combine the cpu and ram to whatever you want, but guess what, the fsb can't be combined like you wish!
Thinking that a 2x quad will outperform the 4p opteron is a big laugh! The fsb will kill your whole ESX instantly from 4+ os on your system with normal load.
The money you save is indeed for sure, the power you lose is another thing!
friendly info from a certified esx 3.0 beta tester :)
Viditor - Wednesday, December 27, 2006 - link
Probably one of your most thorough and well-rounded articles Johan...many thanks!
It was nice to see you working with large (16GB) memory.
If you do get a Socket F system, will you be updating the article?