Updates to Skylake Discrete Graphics Performance: PCIe Optimizations Incoming
by Ian Cutress on September 8, 2015 5:00 AM EST
In our initial review of the two 6th Generation Intel Skylake-K processors launched on August 5th, the i7-6700K and the i5-6600K, our comparative analysis against the previous generations of Intel processors was, for the most part, positive. On the whole, clock-for-clock performance was a marginal increase over previous generations, but the cumulative effect of several generations of upgrades, plus the gains available to those who overclock, gave a substantial reason for those with CPU-limited workloads to upgrade (along with benefits on the chipset and DRAM side as well). However, one element of the equation was puzzling at the time: the performance of games using discrete graphics cards was marginally lower with the new platform compared to older platforms when looking at average frame rates.
Discrete Graphics Performance: Before
During our testing, it is not uncommon for two platforms that perform similarly to show a reasonable margin of error, often ±1%, due to variations in pre-initialised cache structures, or in the case of games like GRID that rely on a random sequence to provide the end-result numbers. Despite this, we noticed a consistent drop for Skylake-K in our discrete GPU testing, often around the -1% to -3% mark but sometimes as low as -5% or -7%, when we compared it to both Intel’s 5th Generation (Broadwell) and 4th Generation (Haswell). Other websites such as The Tech Report also noted these results, placing Broadwell’s numbers at the top of the stack (if only marginally). Some commentary at the time focused on Broadwell’s use of eDRAM in the desktop components, which can aid performance despite a frequency deficit, although given our analysis of the eDRAM in Broadwell as a victim cache rather than a transparent DRAM cache it seems less likely that this is the case, plus we also now have new information coming post launch about this issue. But if we remove Broadwell as a special case, it was still concerning that the i7-6700K lagged behind the i7-4770K despite being higher in frequency and clock-for-clock performance.
Before it came time to publish our Skylake review, we performed our initial analysis and ended up with our results. Whenever the results are worse than expected, we typically discuss with the manufacturer regarding any anomalies and if they can account for them (or something doesn’t seem to be configured properly). So we passed on our data to Intel as well as ASUS due to our setup at the time, and did not hear anything back for a number of weeks except the odd whisper of ‘we are looking in to it’. Then, in our meeting with Intel at the Intel Developer Forum in mid-August, an Intel processor engineer said that they were still working on it internally, but from their testing it seems that one of the registers controlling an internal frequency was not being set properly during start-up – as in not being set to Intel’s recommended value.
Another couple of weeks later, we were contacted by ASUS who shed a lot more light on the issue. The register in question is called the FCLK (or ‘f-clock’), which controls some of the cross-frequency compensation mechanisms between the ring interconnect of the CPU, the System Agent, and the PEG (PCI Express Graphics). Basically this means it is to do with data from the processor to the GPUs. So when data is handed from one end to another, this element of the processor manages the data buffers to allow that cross boundary migration in a lossless way. This is a ratio frequency setting which is tied directly to the base frequency of the processor (the BCLK, typically 100 MHz), and can be set at 4x, 8x or 10x for 400 MHz, 800 MHz or 1000 MHz respectively.
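As a quick illustration of the ratio arithmetic described above, the following minimal sketch multiplies the BCLK by each of the three ratio settings. The names `FCLK_RATIOS` and `fclk_mhz` are our own labels for illustration and are not Intel register or firmware identifiers.

```python
# Illustrative only: FCLK is a fixed multiple of the base clock (BCLK).
# FCLK_RATIOS and fclk_mhz are hypothetical names, not Intel identifiers.
FCLK_RATIOS = (4, 8, 10)  # the three ratio settings described in the text


def fclk_mhz(ratio: int, bclk_mhz: float = 100.0) -> float:
    """Return the FCLK frequency in MHz for a given ratio and BCLK."""
    if ratio not in FCLK_RATIOS:
        raise ValueError(f"unsupported FCLK ratio: {ratio}")
    return ratio * bclk_mhz


if __name__ == "__main__":
    # At the typical 100 MHz BCLK this gives 400, 800 and 1000 MHz.
    for ratio in FCLK_RATIOS:
        print(f"{ratio}x at 100 MHz BCLK -> {fclk_mhz(ratio):.0f} MHz")
```

Because the result scales with BCLK, any base frequency overclock moves the FCLK with it, which is why the ratio cannot be considered in isolation.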
FCLK is in the top left, between the CPU and the PCIe lanes.
The default value of the FCLK is at 800 MHz for both mobile and desktop Skylake processors, and it is this value that all the motherboard manufacturers have validated their systems on – such as overclocking and margins due to external environmental factors. However, the Intel recommended value for desktops, as dictated in their ‘tuning guide’ for motherboard manufacturers was 1000 MHz, or the 10x ratio setting. The recommended value for laptops is still the 8x ratio setting.
So going back to Skylake-K launch on the 5th of August – it is our understanding that Intel moved the launch of these processors from IDF (mid-August) to Gamescom to coincide with their push towards a gaming focused platform. So despite the fact that between Gamescom and IDF the only people who really had these processors were other media and a few system integrators selling pre-built systems, everything had to be ready to go at that time. But at this time, the 10x ratio setting in Intel’s microcode (MRC) was not functioning as expected when motherboard manufacturers tried to initialise it during start-up. As a result, the ‘default’ value was used universally.
Discrete Graphics Performance: After
Fast forward to mid-August, and firmware update 1168 from Intel now allows motherboard manufacturers to implement the 10x setting for FCLK at POST. This means that the motherboard manufacturers now have to implement that firmware into their BIOS packages and request that all owners upgrade in order to benefit from this change.
From what we are being told by ASUS, they will have it enabled by default (at stock) on version 0801 on the Z170-A, with 090x versions of the BIOS providing a manual option inside the Tweakers’ Paradise sub-menu. ASRock by comparison, on the 1.70 BIOS for the Extreme7+, has an option to adjust the FCLK in the CPU configuration menu, but sets 800 MHz as default and requires adjusting to 1 GHz to make the change. For motherboard manufacturers, this new change (if they want to implement it by default) requires a complete verification process to make sure everything else in the system works, all PCIe cards are properly validated and added to their QVLs, and also overclocking margins are still as advertised. That being said though, we have been told to be wary of exact benefits from the new firmware, as some internal testing has shown not that big a jump in most instances.
On the overclocking side, if a user leaves the FCLK setting at auto but initiates a base frequency overclock (from 100 MHz to 120 MHz), then the start-up sequence on the motherboard should be able to take this into account and move from the 10x ratio to the 8x ratio, giving 8 * 120 = 960 MHz, making it closer to the 1000 MHz value. This can be overridden of course, but our sources say that FCLK can be adjusted to around 1400 MHz before it starts to fail, meaning that this ‘test at start-up’ procedure has to take the BCLK into account.
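The auto behaviour described above can be sketched as a nearest-to-target selection. This is only an assumption about how such logic might look: the actual firmware procedure is not public, the function name `pick_fclk_ratio` is hypothetical, and the 1400 MHz ceiling is the approximate figure quoted by our sources rather than a specification.

```python
# Sketch under stated assumptions: choose the FCLK ratio whose resulting
# frequency lands nearest the 1000 MHz recommended desktop value while
# staying under the approximate failure point. Not actual firmware logic.
FCLK_RATIOS = (4, 8, 10)
TARGET_MHZ = 1000.0          # recommended desktop value per the article
REPORTED_LIMIT_MHZ = 1400.0  # approximate limit quoted from sources


def pick_fclk_ratio(bclk_mhz: float) -> int:
    """Return the ratio giving the FCLK closest to TARGET_MHZ for this BCLK."""
    # Discard any ratio that would push FCLK to or past the reported limit.
    candidates = [r for r in FCLK_RATIOS if r * bclk_mhz < REPORTED_LIMIT_MHZ]
    if not candidates:
        raise ValueError(f"no usable FCLK ratio at BCLK {bclk_mhz} MHz")
    # Of the remaining ratios, take the one nearest the target frequency.
    return min(candidates, key=lambda r: abs(r * bclk_mhz - TARGET_MHZ))


if __name__ == "__main__":
    for bclk in (100.0, 120.0):
        ratio = pick_fclk_ratio(bclk)
        print(f"BCLK {bclk:.0f} MHz: {ratio}x -> {ratio * bclk:.0f} MHz")
```

With a 120 MHz BCLK the 10x ratio would give 1200 MHz, which is 200 MHz from the target, whereas 8x gives 960 MHz, only 40 MHz away, so the sketch settles on 8x as in the example in the text.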
Interestingly enough, this register that adjusts the FCLK ratio can be probed and changed at run-time as well, in the middle of the operating system. That lends itself to some interesting dilemmas if software detects its presence and tries to manually adjust it when a system is BCLK overclocked. We might see some software adjust this automatically (look out for general performance increase claims on Skylake only), so our hope here is that the software is also able to probe the BCLK and find the most appropriate ratio to avoid instability. Obviously this matters more for those on motherboards that still run at 8x or who have manually set their own ratio.
But does it make much of a difference? We re-ran our GPU suite with the processor at stock on the new BIOS, with the FCLK set explicitly to 1 GHz. The quick answer to the question is yes, it makes a subtle difference.
*For full disclosure, our initial review contained erroneous results on the GTX 770 and Shadow of Mordor due to an unknown reason. For our retest, this benchmark at 1080p Ultra and 4K Ultra was re-run at 800 MHz FCLK and our numbers in Bench are updated. The change in this benchmark result does not affect our conclusion in the initial review or this secondary set of testing.
The overall results showed an increase in frame rates by 1.3% from both the Haswell (i7-4770K) and Broadwell (i7-5775C) processors. The key takeaway is that in almost every scenario where performance was worse (except the GTX 980 against the i7-5775C), the new FCLK setting makes a change. Across the board, moving from an 800 MHz value on FCLK to 1 GHz gave an increase no matter what the discrete graphics test. Previously where our individual benchmarks ranged from zero change through -1%, -2% down to -7%, most results move up slightly, usually under 1%, but some get a boost as high as 5%.
So this raises a number of questions.
Q: Firstly, does this mean our initial review results are now invalid?
A: It depends what you mean by testing the stock processor. Is it how it performs out of the box, or is it how it performs to Intel’s ‘recommended’ tuning profile? As mentioned above, it seems like motherboard manufacturers will act differently on the FCLK issue, where some might enable it by default and others require the user to implement the change. The fact that 800 MHz is the main setting for mobile platforms (Skylake-Y and Skylake-U, perhaps even Skylake-H) means little on the desktop, but we might be in a period of transition for motherboards as the cycle progresses. At this time, our review as posted should still be the performance out of the box, but in time everyone should migrate to the 1 GHz setting for desktop. I will add the data into Bench to act as a comparison between the two, and in time retire the older set of data.
Q: Is Broadwell still the ‘preferred’ processor by a number of journalists in terms of performance in gaming due to the eDRAM, given that this minor change produces a different result?
A: From AnandTech’s perspective, this does not change much on our side – the Skylake platform still offers access to new features such as DDR4, the Z170 chipset with increased PCIe storage, USB 3.1 controllers on most boards >$150 and a cumulative generational increase in performance over the last few years. Going back to our architecture deep-dive, it also affords better performance in certain benchmarks such as Hybrid x265 than Broadwell due to its ability to keep more load/store operations in flight. There will be benchmarks that enjoy the eDRAM, such as WinRAR and a couple of our Linux-Bench server tests, and the fact that the Broadwell with eDRAM competes with a slower frequency is an interesting exercise in cache implications for performance. But for discrete gaming it is pretty much par for the course. Arguably, the i7-6700K should be easier to get hold of over time compared to the i7-5775C as well.
Q: Does anything change in the CPU benchmarks/performance?
A: As long as it doesn't touch the PCIe bus/routing, then there is no difference to the operation.
Q: I am about to invest in a Skylake desktop/I have a Skylake desktop. What should I do?
A: If your system is working fine, it might be best to leave it as is for now, at least until all the motherboard manufacturers have had a chance to go through a number of updates and tweak the setting to their satisfaction. If you need to be on the bleeding edge and feel like updating your BIOS, do so and explicitly adjust the FCLK to 1 GHz, rather than leaving it on automatic. However, remember that this value is tied to the BCLK (base frequency), and treat this setting like an overclock. This means running a usual range of stability benchmarks. We tried the setting on a couple of motherboards, and it is still a little rough (crashes in a benchmark or two from time to time), which I imagine is down to the manufacturers needing to fine tune this setting (either internal voltages, or skew balancing). At this point in time we are under the impression that every Skylake-S processor should be able to run at this higher ratio when at stock speeds. As always, in overclocked environments, your mileage may vary.
Q: Will this affect the Skylake Xeons?
A: At this point, we do not know. However, given how early we are in the Skylake launch cycle, and that Intel has stated a Q4/Q1 release for the E3 v5 Xeons, we would expect it to be a non-issue when they are launched.
37 Comments
ImSpartacus - Tuesday, September 8, 2015 - link
Excellent follow up.
Many of us weren't expecting a 10+% performance bump in sky lake gaming, but we expected the tried-&-true 0-5% annual performance increase. It's good to hear the architecture might be able to provide that.
MrSpadge - Tuesday, September 8, 2015 - link
+1
This might also help GP-GPU apps which may be sensitive to the latency between CPU and GPU.
cactusdog - Tuesday, September 8, 2015 - link
I'm a little disappointed with Skylake. Clock for clock performance seems worse than my 4790K. Even memory performance is worse, my DDR3 @2133Mhz is faster (in MB/s) than DDR4 @2133Mhz, and DDR4 has much worse latency. I didnt expect a huge improvement with Skylake but I dont want to go backwards in performance.
Ian Cutress - Tuesday, September 8, 2015 - link
You're comparing an overclocked DRAM setting to a stock DRAM setting. The CAS Latency also governs the rate of data transfer, so you are probably comparing a CL of 10 (DDR3-2133 C10) to a CL of 15 (DDR4-2133 C15), so no wonder it comes out as slower. The more apt comparison would be DDR3-1600 C11 to DDR4-2133 C15, or some overclocked DDR4.
Check out our main 6700K/6600K Skylake review for more info, with IPC tests and DRAM comparison metrics:
www.anandtech.com/show/9483/intel-skylake-review-6700k-6600k-ddr4-ddr3-ipc-6th-generation
There are plans for a Skylake DRAM scaling article when a few other items are off my desk.
Timbrelaine - Tuesday, September 8, 2015 - link
I'd love to read that. We don't often get new memory technologies, and it would be great if Skylake benefited meaningfully from the higher transfer speeds DDR4 will consistently reach once it matures.
DanNeely - Tuesday, September 8, 2015 - link
I wouldn't expect anything dramatic, the P4 was the last Intel CPU that was RAM bottlenecked in normal use.
If anything saw more than a 1 or 2% difference it'd probably be the IGP, because graphics primarily uses streaming data access - making the throughput gains from DDR4 more important than the looser timings needed to support the faster data rate - and because the IGP is, unlike the rest of the CPU, badly memory starved on higher end models.
bug77 - Wednesday, September 9, 2015 - link
Sounds about right. Ever since they moved the memory controller onto the CPU die (Core series), Intel did a very good job hiding memory latency behind their three levels of cache. Save for some specific workloads, it doesn't really matter what memory you're running. Which is why I tend to hunt for power voltage instead of higher frequency when shopping for RAM.
Oxford Guy - Tuesday, September 8, 2015 - link
"You're comparing an overclocked DRAM setting to a stock DRAM setting."
So? Few people use JEDEC spec RAM with overclocked processors.
DDR4 seems to offer something that primarily appeals to the server market: lower voltage (but with high latency). For desktop users it's definitely worthwhile to ask what's the point.
Klimax - Wednesday, September 9, 2015 - link
You might want to compute actual latency of various options in ns then to compare them just on raw CL number. You'll be surprised...
Oxford Guy - Wednesday, September 9, 2015 - link
How? CAS 11 or 12 DDR3 2400 vs. CAS 15 or 16 DDR4 2400...