NVIDIA Tegra 4 Architecture Deep Dive, Plus Tegra 4i, Icera i500 & Phoenix Hands On
by Anand Lal Shimpi & Brian Klug on February 24, 2013 3:00 PM EST
The GPU
Tegra 4 features an evolved GPU core compared to Tegra 3. The architecture retains a fixed division between pixel and vertex shader hardware, making it the only modern mobile GPU architecture not to adopt a unified shader model.
I already described a lot of what makes the Tegra 4 GPU different in our original article on the topic. The diagram below gives you an idea of how the pixel and vertex shader hardware grew over the past 3 generations:
We finally have a competitive GPU architecture from NVIDIA. It’s hardly industry leading in terms of specs, but there’s a good amount of the 80mm^2 die dedicated towards pixel and vertex shading hardware. There's also a new L2 texture cache that helps improve overall bandwidth efficiency.
The big omission here is full OpenGL ES 3.0 support. NVIDIA’s pixel shader hardware remains FP24, while the ES 3.0 spec requires full FP32 support for both pixel and vertex shaders. NVIDIA also lacks ETC and FP texture support, although some features of ES 3.0 are implemented (e.g. Multiple Render Targets).
Mobile SoC GPU Comparison
GPU | GeForce ULP (2012) | PowerVR SGX 543MP2 | PowerVR SGX 543MP4 | PowerVR SGX 544MP3 | PowerVR SGX 554MP4 | GeForce ULP (2013)
Used In | Tegra 3 | A5 | A5X | Exynos 5 Octa | A6X | Tegra 4
SIMD Name | core | USSE2 | USSE2 | USSE2 | USSE2 | core
# of SIMDs | 3 | 8 | 16 | 12 | 32 | 18
MADs per SIMD | 4 | 4 | 4 | 4 | 4 | 4
Total MADs | 12 | 32 | 64 | 48 | 128 | 72
GFLOPS @ Shipping Frequency | 12.4 GFLOPS | 16.0 GFLOPS | 32.0 GFLOPS | 51.1 GFLOPS | 71.6 GFLOPS | 74.8 GFLOPS
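The rows of the comparison table are related by simple arithmetic. As a rough sketch, assuming each MAD counts as two floating-point operations per clock and that the GFLOPS row counts only MAD throughput (neither assumption is stated in the table), total MADs is SIMDs times MADs per SIMD, and the clock implied by each GFLOPS figure can be backed out. The function names here are illustrative, not from any vendor tool:

```python
# Figures copied from the comparison table: (SIMDs, MADs per SIMD, GFLOPS).
GPUS = {
    "GeForce ULP (2012)": (3, 4, 12.4),
    "PowerVR SGX 543MP2": (8, 4, 16.0),
    "PowerVR SGX 543MP4": (16, 4, 32.0),
    "PowerVR SGX 544MP3": (12, 4, 51.1),
    "PowerVR SGX 554MP4": (32, 4, 71.6),
    "GeForce ULP (2013)": (18, 4, 74.8),
}

FLOPS_PER_MAD = 2  # assumption: one multiply-add counts as 2 floating-point ops


def total_mads(simds, mads_per_simd):
    """Total MAD units across all SIMDs."""
    return simds * mads_per_simd


def implied_clock_mhz(simds, mads_per_simd, gflops):
    """Clock (MHz) at which the MAD array would deliver the quoted GFLOPS."""
    flops_per_clock = total_mads(simds, mads_per_simd) * FLOPS_PER_MAD
    return gflops * 1000.0 / flops_per_clock


if __name__ == "__main__":
    for name, (simds, mads, gflops) in GPUS.items():
        print(f"{name}: {total_mads(simds, mads)} MADs, "
              f"~{implied_clock_mhz(simds, mads, gflops):.0f} MHz implied")
```

The implied clocks are back-calculated from the table under the assumptions above; they are not vendor-quoted frequencies.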
For users today, the lack of OpenGL ES 3.0 support likely doesn’t matter, but it’ll matter more in a year or two when game developers start using OpenGL ES 3.0. NVIDIA is fully capable of building an OpenGL ES 3.0 enabled GPU, and I suspect the resistance here boils down to wanting to win performance comparisons today without making die size any larger than it needs to be. Remembering back to the earlier discussion about NVIDIA’s cost position in the market, this decision makes sense from NVIDIA’s standpoint, although it’s not great for the industry as a whole.
Tegra 4i retains the same base GPU architecture as Tegra 4, but dramatically cuts down on hardware. NVIDIA goes from 6 down to 3 vertex units, and moves to two larger pixel shader units (increasing the ratio of compute to texture hardware in the T4i GPU). The max T4i GPU clock drops a bit down to 660MHz, but that still gives it substantially more performance than NVIDIA’s Tegra 3.
Memory Interface
The first three generations of Tegra SoCs had an embarrassingly small amount of memory bandwidth, at least compared to Apple, Samsung and Qualcomm. Admittedly, Samsung and Qualcomm were late adopters of a dual-channel memory interface, but they still got there much quicker than NVIDIA did.
With Tegra 4, complaints about memory bandwidth can finally be thrown out the window. The Tegra 4 SoC features two 32-bit LPDDR3 memory interfaces, bringing it up to par with the competition. The current max data rate supported by Tegra 4’s memory interfaces is 1866MHz, but that may go up in the future.
Tegra 4 won’t ship in a PoP (package-on-package) configuration and will have to be paired with external DRAM. This will limit Tegra 4 to larger devices, but it should still be able to fit in a phone.
Unfortunately, Tegra 4i only has a single-channel LPDDR3 memory interface. On the other hand, Tegra 4i will be available in PoP as well as discrete configurations. The PoP configuration may top out at LPDDR3-1600, while the discrete version can scale up to 1866MHz and beyond.
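To put the interface widths in perspective, peak theoretical bandwidth is channels times bus width times data rate. A minimal sketch, assuming the quoted 1866MHz is an effective data rate of 1866 mega-transfers per second per pin, and assuming Tegra 4i's single channel is also 32 bits wide (the text above does not state its width). The helper name is illustrative:

```python
def peak_bandwidth_gbs(channels, bits_per_channel, mega_transfers_per_s):
    """Peak theoretical bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = channels * bits_per_channel / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# Tegra 4: two 32-bit LPDDR3 interfaces at the quoted 1866 data rate.
tegra4 = peak_bandwidth_gbs(2, 32, 1866)

# Tegra 4i: one channel, assumed 32-bit; discrete (1866) and PoP (1600).
tegra4i_discrete = peak_bandwidth_gbs(1, 32, 1866)
tegra4i_pop = peak_bandwidth_gbs(1, 32, 1600)

if __name__ == "__main__":
    print(f"Tegra 4:             {tegra4:.2f} GB/s")
    print(f"Tegra 4i (discrete): {tegra4i_discrete:.2f} GB/s")
    print(f"Tegra 4i (PoP):      {tegra4i_pop:.2f} GB/s")
```

These are upper bounds under the stated assumptions; sustained bandwidth in practice will be lower.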
75 Comments
tipoo - Sunday, February 24, 2013 - link
Under 500 in Sunspider, about twice as fast as anything else ARM. But then again, it's a few months newer than that, and actually still not shipping. And as usual with Nvidia they're early to each party (first to dual core, first to quad core), but not always the best performing. We'll see if other Cortex A15 designs beat it.
I'd love to see four of those cores paired with SGXs upcoming 600/Rogue series.
jeffkibuule - Sunday, February 24, 2013 - link
SunSpider is so software sensitive that a Tegra 3 @ 1.2 Ghz on Windows RT beats a Snapdragon S4 Pro @ 1.5Ghz on Nexus 4 using Chrome. It's a terrible benchmark because it's so dependent on underlying kernel optimizations in the Android phone market.
tipoo - Sunday, February 24, 2013 - link
True, other benchmarks are similarly impressive though.
karasaj - Sunday, February 24, 2013 - link
Psh it has nothing on my desktop! 125ms on sunspider... Nvidia so behind.
Anyways, still looks impressive. I really want to see some Krait 600/800 benchmarks.
tipoo - Sunday, February 24, 2013 - link
The fact that they're getting well below an order of magnitude slower than desktops is impressive in itself too. Even with iPad 2 level performance I still was reluctant to do most of my web browsing on a tablet for the performance. Maybe with Tegra 4 and beyond hardware speed that will change.
Mumrik - Sunday, February 24, 2013 - link
As someone with heavily tabbed browsing habits, I don't think I'll ever make that jump (and I own a tablet).
tipoo - Sunday, February 24, 2013 - link
Also true, that's my other thing. I like to open a bunch of background tabs and have them ready as I go through each one. Right now, tablets don't do background loading, as far as I know, and if they did they wouldn't be powerful enough to keep the main tab smooth while doing it.
Tarwin - Monday, February 25, 2013 - link
Tablets DO do background loading, as long as they're Android. The only performance I've seen is from lack of RAM on my phone and lack of bandwidth on the phone and tablet but those things affect any computer as well. One observation to be made, they do load in the background but things like audio and video playback will pause if you switch to another tab.
von Krupp - Monday, February 25, 2013 - link
Even Windows Phone 7.5 and 8 do background loading. I haven't used it, but I'd wager that RT does as well, if even the gimpy mobile OS can.
tuxRoller - Sunday, February 24, 2013 - link
As someone who had, until recently, over 40 tabs open on my chrome browser (Nexus 4), the critical problem has been memory. With enough memory, and good enough task management, these problems tend to go away.
Of course, maybe you are that 0.00001% who has hundreds or thousands of tabs open in which case I pity any computer you are likely to own.