ARM A53/A57/T760 investigated - Samsung Galaxy Note 4 Exynos Review
by Andrei Frumusanu & Ryan Smith on February 10, 2015 7:30 AM EST
20nm Manufacturing Process
Both Samsung Semiconductor and TSMC delivered their first 20nm products in Q3 2014, but the two don't represent the same jump in efficiency. Samsung's 28nm HKMG process differed considerably from TSMC's 28nm HPM process. While Samsung initially had a process lead with its gate-first approach when introducing 32nm HKMG and the subsequent 28nm shrink, TSMC took the gate-last route. The advantage of the gate-last approach is that it allows for lower variance in the manufacturing process and better power characteristics. We saw this when TSMC introduced the highly optimized HPM process for mobile. Qualcomm has been the biggest beneficiary, taking full advantage of this process jump with the Snapdragon 800 series as it moved away from the 28nm LP process used in previous SoCs.
In practical terms, 20nm brings Samsung back level with TSMC in theoretical power consumption. In fact, 28nm HPM still has the same nominal transistor voltage as Samsung's new 20nm process.
Luckily Samsung provides useful power modeling values as part of the new Intelligent Power Allocation driver for the 5422 and 5430 so we can get a rough theoretical apples-to-apples comparison as to what their 20nm process brings over the 28nm one used in their previous SoCs.
I took the median chip bin for both SoCs to extract the voltage tables in the comparison and used the P=C*f*V² formula to compute the theoretical power figure, just as Samsung does in their IPA driver for the power allocation figures. The C coefficient values are also provided by the platform tables.
We can see that for the A15 cores, there's an average 24% power reduction over all frequencies, with the top frequencies achieving a good 29% reduction. The A7 cores see the biggest overall voltage drop, averaging around -125mV, resulting in an overall 40% power reduction and even 56% at the top frequency. It's also very likely that Samsung has been tweaking the layout of the cores for either power or die size; we've seen this as the block sizes of the CPUs have varied a lot between the 5410, 5420 and 5422, even though they were on the same process node.
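As a rough illustration of the P=C*f*V² estimate described above, here is a minimal Python sketch. The function names and the example inputs (an identical C coefficient, 1000MHz, and a 1.000V to 0.875V drop) are hypothetical and are not taken from Samsung's platform tables; only the formula comes from the text.

```python
def dynamic_power(c_coeff, freq_mhz, voltage_v):
    """Theoretical dynamic power, P = C * f * V^2, in arbitrary units."""
    return c_coeff * freq_mhz * voltage_v ** 2


def power_reduction(c_old, c_new, freq_mhz, v_old, v_new):
    """Fractional power saving of the new process at a single frequency."""
    p_old = dynamic_power(c_old, freq_mhz, v_old)
    p_new = dynamic_power(c_new, freq_mhz, v_new)
    return 1.0 - p_new / p_old


# Hypothetical inputs, NOT values from the Exynos voltage tables:
# same C coefficient, 1000 MHz, and a 125 mV drop from 1.000 V to 0.875 V.
saving = power_reduction(1.0, 1.0, 1000, 1.000, 0.875)
print(f"{saving:.1%}")  # prints 23.4%
```

Because voltage enters squared, a 12.5% voltage drop alone yields about a 23% saving in this model; any difference in the C coefficient between the two processes would shift the result further.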
While these figures represent quite a significant power reduction by themselves, they must be put into perspective with what Qualcomm publishes for its Krait cores. The Snapdragon 805 on a median speed bin is listed at 965mW at 2.65GHz, going down to 57mW at 300MHz. Keep in mind that these figures ignore L2 cache power consumption, as Qualcomm feeds the cache from a dedicated voltage rail, but they still give us a good representation of how efficient the HPM process is. The highest voltages on the S805 are still lower than the voltages at the top few frequencies of both the 5430 and the 5433.
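To put the two Snapdragon 805 figures side by side, one can normalize them to energy per clock cycle. The sketch below is only a back-of-the-envelope illustration: the helper name is hypothetical, and the implied voltage ratio assumes the same P=C*f*V² model with an identical C at both operating points and no leakage, which real silicon will not follow exactly.

```python
import math


def energy_per_cycle_nj(power_mw, freq_mhz):
    """Average energy per clock cycle in nanojoules (mW / MHz = nJ)."""
    return power_mw / freq_mhz


# Figures quoted in the text for the Snapdragon 805 median speed bin.
top = energy_per_cycle_nj(965, 2650)  # 2.65GHz operating point
low = energy_per_cycle_nj(57, 300)    # 300MHz operating point

# Under P = C * f * V^2 with equal C, energy per cycle scales with V^2,
# so the square root of the ratio is the implied voltage ratio (assumption).
implied_voltage_ratio = math.sqrt(top / low)

print(f"top: {top:.3f} nJ/cycle, low: {low:.3f} nJ/cycle")
print(f"implied V_top / V_low: {implied_voltage_ratio:.2f}")
```

Under these assumptions the top operating point costs roughly twice the energy per cycle of the lowest one, which corresponds to a voltage about 1.4 times higher.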
20nm does bring a big improvement in die size. If we take the 5420 as the 28nm comparison part and match it against the 5430, we see a roughly 31% decrease in A7 core size, and an even bigger 39% reduction in A15 core size. The total cluster sizes remain relatively conservative in their scaling, shrinking only about 12-13%; this is due to SRAM in the caches having a lower shrinking factor than pure logic blocks. One must keep in mind that auxiliary logic such as PLLs, bus interfaces, and various other small blocks are part of a CPU cluster and may also impact the effective scalability. Samsung also takes advantage of artificially scaling CPU core sizes to control power consumption, so we might not be looking at an apples-to-apples comparison, especially considering that the 5430 employs a newer major IP revision of the CPU cores.
Exynos 5420 vs Exynos 5430 block sizes
Block | Exynos 5420 | Exynos 5430 | Scaling Factor
A7 core | 0.58mm² | 0.4mm² | 0.690
A7 cluster | 3.8mm² | 3.3mm² | 0.868
A15 core | 2.74mm² | 1.67mm² | 0.609
A15 cluster | 16.49mm² | 14.5mm² | 0.879
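The scaling factors in the table are simply the 5430 area divided by the 5420 area. A short Python sketch, using the table's block sizes and otherwise hypothetical names, reproduces them and converts each into a percentage area reduction:

```python
# Block areas in mm^2, taken from the table: (Exynos 5420, Exynos 5430).
blocks = {
    "A7 core": (0.58, 0.40),
    "A7 cluster": (3.8, 3.3),
    "A15 core": (2.74, 1.67),
    "A15 cluster": (16.49, 14.5),
}


def scaling_factor(old_mm2, new_mm2):
    """New area as a fraction of the old area."""
    return new_mm2 / old_mm2


for name, (old, new) in blocks.items():
    s = scaling_factor(old, new)
    print(f"{name}: factor {s:.3f}, area reduction {1 - s:.1%}")
```

This gives reductions of about 31% and 39% for the A7 and A15 cores, and about 13% and 12% for the two clusters. Read the other way round, the 5420's A7 and A15 cores are about 45% and 64% larger than their 5430 counterparts.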
The Mali T628 actually increased in die size between the 5420 and the 5430 despite the process shrink, but this is due to a big increase in the cache sizes.
Samsung regards its 20nm node as very short-lived, and the 5430 and 5433 look to be the only high-volume chips coming out on the process, as the company's attention is focused on shipping 14nm FinFET devices in the next few months. In fact, at the Samsung Investor Forum 2014 it announced that mass production of a new high-end SoC had already begun in mid-November and would ramp up to full volume in early 2015. I suspect this to be the Exynos 7420, as that is the successor SoC to the 5433.
All in all, the argument that this 20nm chip should be more power efficient than the competitors' 28nm is not completely factual and doesn't seem to hold up in practice. The process still seems young and unoptimized compared to what TSMC offers on 28nm.
Before we get to the performance and power figures, I'm handing things over to Ryan as we take a look at the architectural changes, starting with an analysis of the Cortex A53.
135 Comments
ddriver - Tuesday, February 10, 2015 - link
I'd like to see A57 performance without being so crippled by a ram bottleneck.
blanarahul - Wednesday, February 11, 2015 - link
Loved this article. Only thing missing was gaming fps and power consumption comparison b/w LITTLE cluster only, big cluster only and big.LITTLE modes.
ddriver - Thursday, February 12, 2015 - link
Also in true 64bit mode, cuz a lot of the perf improvements in v8 are not available in legacy 32bit mode.
It is a shame really, samsung decided the uArch improvements would be enough to barely pass this chip as "incremental", they didn't bother to feed a higher throughput chip with a wider memory bus. As much as it pains me, apple did better in that aspect by not crippling their A7 chip, even if only because they needed it for a "wow factor" after so many generations of mediocre hardware, especially given the many exclusive initial shipment deals they secured to stay relevant.
thegeneral2010 - Wednesday, February 18, 2015 - link
i like wat u say and i really like to see note 4 running on 64bit this would give samsung processors a great push forward and trust of consumers.
bigstrudel - Tuesday, February 10, 2015 - link
If it wasn't completely obvious already:
Apple A Series stands alone years ahead of the rest of the pack.
Flunk - Tuesday, February 10, 2015 - link
But if they don't sell it to anyone else, it doesn't really matter does it?
Apple doesn't compete with Samsung or Qualcomm when it comes to selling SoCs because they don't sell SoCs to other companies. A slight lead in CPU performance is not going to get people to buy an iPhone over an Android, if that's what they're set on buying.
xype - Tuesday, February 10, 2015 - link
It does matter insofar as to be a benchmark of what is possible (as long as they are ahead). And let’s not pretend Apple’s CPUs sucking wouldn’t invite the same kind of comments—just like every situation where 2 competing technologies are compared.
Platform/fanboy trolling aside, that’s something Android users benefit from as well. Apple being "stubborn" about 2 core CPUs, for example, is a nice counterweight to the 8 cores and 8 mini-cores and 8 quasi-cores trend that some CPU vendors seem to have a hard-on for, and it gives a nice real-world example of how such an approach to mobile CPU design works out, no?
If Apple stays ahead in the mobile CPU game, the people using non-Apple phones will always have a target to point to and demand equality with. Otherwise they’d just have to live with whatever Qualcomm et al feed them.
bigstrudel - Tuesday, February 10, 2015 - link
My comment isn't fanboy jingo-ism. Its fact.
There's not a single Android ARM core on the market that can even match the power of the Apple A7's Cyclone cores much less A8's 2nd gen design.
Were still waiting for anything custom to come out of the Android camp aside from the frankensteinish design of Nvidia's Denver core.
I really shouldn't need to explain why to people on Anandtech.
ergo98 - Tuesday, February 10, 2015 - link
The Tegra K1 64 bit is faster, core per core, versus the A8 (you do realize that the K1-64 has only 2 cores, right? I'm going to have to guess no, or you just are completely unable to read a chart). The A8x offers marginal per core performance advantages over the A8, and the primary benefit is the third core. The K1 64 is a A57 derivative, *exactly like the A8*.
Your comments can only be construed as trolling. Can't match the A7? Give me a break.
tipoo - Tuesday, February 10, 2015 - link
Ergo, you're completely off. The Denver K1 is a VLIW code morphing architecture - it has nothing to do with the Cortex A57, nor does the Apple Cyclone, they're both custom architectures.
The K1 offers better performance in benchmarks, but as a result of code morphing, it can be hit or miss in real world, causing jank.