The AMD FX (Bulldozer) Scheduling Hotfixes Tested
by Anand Lal Shimpi on January 27, 2012 12:47 PM EST
The basic building block of Bulldozer is the dual-core module, pictured below. AMD wanted better performance than simple SMT (à la Hyper-Threading) would allow, but without resorting to the full duplication of resources we get in a traditional dual-core CPU. The result is a duplication of integer execution resources and L1 caches, but a sharing of the front end and FPU. AMD still refers to this module as being dual-core, although it's a departure from the more traditional definition of the word.
In the early days of multi-core x86 processors, dual-core designs were simply two single-core processors stuck on the same package. Today we still see simple duplication of identical cores in a single processor, but moving forward it's likely that we'll see more heterogeneous multi-core systems. AMD's Bulldozer architecture may be unusual, but it challenges the conventional definition of a core in a way that we're probably going to face one way or another in the not too distant future.
A four-module, eight-core Bulldozer
The bigger issue with Bulldozer isn't one of core semantics, but rather how threads get scheduled on those cores. Ideally, threads with shared data sets would get scheduled on the same module, while threads that share no data would be scheduled on separate modules. The former allows more efficient use of a module's L2 cache, while the latter guarantees each thread has access to all of a module's resources when there's no tangible benefit to sharing.
This ideal scenario isn't how threads are scheduled on Bulldozer today. Instead of intelligent core/module scheduling based on the memory addresses touched by a thread, Windows 7 currently just schedules threads on Bulldozer in order. Starting from core 0 and going up to core 7 in an eight-core FX-8150, Windows 7 will schedule two threads on the first module, then move to the next module, etc... If the threads happen to be working on the same data, then Windows 7's scheduling approach makes sense. If the threads scheduled are working on different data sets however, Windows 7's current treatment of Bulldozer is suboptimal.
AMD and Microsoft have been working on a patch to Windows 7 that improves scheduling behavior on Bulldozer. The result is two hotfixes that should both be installed on Bulldozer systems. Both hotfixes require Windows 7 SP1; they will refuse to install on a pre-SP1 installation.
The first update simply tells Windows 7 to schedule all threads on empty modules first, then on shared cores. The second hotfix increases Windows 7's core parking latency if there are threads that need scheduling. There's a performance penalty you pay to sleep/wake a module, so if there are threads waiting to be scheduled they'll have a better chance to be scheduled on an unused module after this update.
Note that neither hotfix enables the most optimal scheduling on Bulldozer. Rather than being thread aware and scheduling dependent threads on the same module and independent threads across separate modules, the updates simply move to a better default case of scheduling on modules first. This should improve performance in most cases but there's a chance that some workloads will see a performance reduction. AMD tells me that it's still working with OS vendors (read: Microsoft) to better optimize for Bulldozer. If I had to guess I'd say that we may see the next big step forward with Windows 8.
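To make the difference between the two placement orders concrete, here is a minimal sketch in Python. It is a toy model, not how the Windows scheduler is actually implemented: the function names are hypothetical, and it assumes the core numbering described above (cores 0 and 1 share the first module, cores 2 and 3 the second, and so on) while ignoring load, affinity, and core parking.

```python
from __future__ import annotations

# Toy model (assumption): 4 modules x 2 cores, as in an eight-core FX-8150.
MODULES = 4
CORES_PER_MODULE = 2


def module_of(core: int) -> int:
    """Return the index of the module that owns a given core."""
    return core // CORES_PER_MODULE


def in_order_placement(num_threads: int) -> list[int]:
    """Pre-hotfix order as described: fill core 0, 1, 2, ... in sequence."""
    total = MODULES * CORES_PER_MODULE
    return [t % total for t in range(num_threads)]


def module_first_placement(num_threads: int) -> list[int]:
    """Post-hotfix order as described: one thread on each empty module
    first, then the second core of each module."""
    order = [m * CORES_PER_MODULE + c
             for c in range(CORES_PER_MODULE)
             for m in range(MODULES)]
    return [order[t % len(order)] for t in range(num_threads)]


def busy_modules(placement: list[int]) -> int:
    """Count the distinct modules that have at least one thread."""
    return len({module_of(core) for core in placement})


if __name__ == "__main__":
    for n in (2, 4, 8):
        before = in_order_placement(n)
        after = module_first_placement(n)
        print(f"{n} threads: in-order {before} ({busy_modules(before)} modules), "
              f"module-first {after} ({busy_modules(after)} modules)")
```

With four threads, the in-order policy packs them onto two modules, so each pair shares a front end and FPU, while the module-first policy gives each thread a module to itself. With eight threads both policies end up using every core, which is consistent with the update mattering most for lightly threaded workloads.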
AMD was pretty honest when it described the performance gains FX owners can expect to see from this update. In its own blog post on the topic AMD tells users to expect a 1 - 2% gain on average across most applications. Without any big promises I wasn't expecting the Bulldozer vs. Sandy Bridge standings to change post-update, but I wanted to run some tests just to be sure.
The Test
Motherboard: | ASUS P8Z68-V Pro (Intel Z68), ASUS Crosshair V Formula (AMD 990FX) |
Hard Disk: | Intel X25-M SSD (80GB), Crucial RealSSD C300 |
Memory: | 2 x 4GB G.Skill Ripjaws X DDR3-1600 9-9-9-20 |
Video Card: | ATI Radeon HD 5870 (Windows 7) |
Video Drivers: | AMD Catalyst 11.10 Beta (Windows 7) |
Desktop Resolution: | 1920 x 1200 |
OS: | Windows 7 x64 SP1 w/ BD Hotfixes |
79 Comments
Beenthere - Friday, January 27, 2012 - link
The Hot Fix is better than No Fix and Win 8 beta looks to be a few percent better than the Win 7 Hot Fix. So it's all good but nothing startling. Combined with Vishera/Piledriver should provide a nice performance bump however.
Ramon Zarat - Friday, January 27, 2012 - link
"The Hot Fix is better than No Fix".
No...
A fix that brings no additional value (less than 2%, well below statistical significance), and at the same time, by its simple presence, introduces the potential for conflict and instability (simple law of entropy) can only be detrimental to a system, not positive.
Another way to look at it: If my car is broken to the point of being unusable and I apply $1500 worth of parts and labor and in the end it makes no difference in its usability, why spend the $1500 in the first place? New parts are better than no new parts?
I love AMD as a company, I really do. The truth is Bulldozer in conjunction with present operating systems is broken and dysfunctional, with embarrassing sub-par performance and unacceptable power consumption per instruction for what is supposedly an 8-core CPU that cannot even approach a 4-core 2500K's performance in 90% of scenarios. AFAIK, that applies to ALL operating systems including Linux and Mac. I even have my doubts that a perfectly Bulldozer-tuned OS would be able to compete with Intel's offering. I'd like nothing better than to be proved wrong on that one.
AMD knew from day 1 (years ago) that their new unconventional architecture would face issues such as this one, but it's only now, months after the actual launch of the product, that they are working on fixes that fix nothing? Great example of bad management / strategy / planning and lack of foresight from AMD. I hope they have learned a lesson because an underdog can't afford such huge missteps too often and hope to stay alive in the long run.
bigboxes - Friday, January 27, 2012 - link
But can I download more torrents with this fix?
ThomasS31 - Friday, January 27, 2012 - link
Why have you not tested core usage / clock gating and power consumption?
I thought the main problem of scheduling related to that as well...
Lugaidster - Friday, January 27, 2012 - link
This!
Is power consumption altered with this hotfix?
Cheers
MrSpadge - Sunday, January 29, 2012 - link
Power consumption for given "mixed workloads" should be altered.. increased, actually, since more modules are active. Overall energy consumption, on the other hand.. I don't know. More power draw but for a shorter time.
MrS
Marlin1975 - Friday, January 27, 2012 - link
Anybody notice the difference between the first hotfix and the 2nd?
The first one even helped some Intel CPUs. The 2nd one seems to not help either AMD or Intel as much???
saneblane - Friday, January 27, 2012 - link
I fail to see what was fixed, I give them credit for adding 2 more fps in x264 second pass. But come on, this is just stupid. There is no performance increase that is going to be noticed or seen by this so-called hot fix. We already accept the fact that Bulldozer failed on the desktop in its first showing, why remind us again. Things like this make people lose confidence in you, just lay low and try to improve Piledriver.
Lonyo - Friday, January 27, 2012 - link
Because this won't probably help future AMD CPUs?
It's something Microsoft will need to deal with, and possibly already have with Win 8. It's free performance, and it helps pretty much test future similarly equipped AMD CPUs.
What are they supposed to be doing, other than giving free performance gains to Bulldozer users?
saneblane - Friday, January 27, 2012 - link
Well, if you think the hotfix is worthwhile, then I guess AMD has done its job. There are always more people that are easily fooled than ones who can see the truth. Free performance??? Give me a freaking break, I waited years for Bulldozer, and you're going to tell me about a CPU that costs more than an i5 2500 and loses in almost every benchmark and game.
That this hotfix is free performance, haha. Zambezi users are paying a lot for this so-called "free" when even the i5 2500 beats the crap out of it, I guess Sandy Bridge has a lot of "free" performance too.