Applied Micro's X-Gene: The First ARMv8 SoC
by Anand Lal Shimpi on November 14, 2011 1:44 PM EST - Posted in
- CPUs
- Arm
- AppliedMicro
- X-Gene
- SoCs
We covered the X-Gene announcement a couple of weeks ago when the news was first made public. I was in London at the time meeting with Nokia so I didn't get a chance to sit down with Applied Micro's engineers to discuss the SoC and its architecture. Thankfully, upon my return, they gave me the opportunity to do just that.
We've been hearing about ARM-based servers for a while now, but their advantage has always been lower power consumption than beefy x86 servers on lighter workloads. You've always had to sacrifice performance and memory addressability. APM hopes to change that with its X-Gene.
Development on X-Gene began three years ago. APM was originally a PowerPC house. The company was working on a 64-bit PowerPC core internally before meeting with ARM and eventually redirecting its efforts to a 64-bit ARM core. Together with ARM, APM started laying the foundation for ARM's first 64-bit instruction set - now known as ARMv8.
At a time when everyone else was working on ARMv7 cores, this gave APM a head start on the ARMv8 transition. As of now there is no officially announced, licensable ARMv8 core from ARM itself. I believe this makes the X-Gene the world's first ARMv8 SoC.
At a high level the X-Gene is pretty beefy. Each CPU core can fetch and decode up to four ARMv8 (or eight Thumb) instructions per clock. APM wouldn't reveal the depth of the pipeline, but it is targeting a 3GHz operating frequency at 28nm/40nm, so it's safe to say that the pipeline is fairly deep. APM did add that it's not quite as deep as the Pentium 4's, but rather in the sweet spot. I'd take that to mean we're looking at something around, or just shy of, 20 stages for the integer pipeline.
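To put the front-end figures in perspective, here is a back-of-the-envelope calculation of one core's theoretical peak decode rate. It's only a sketch that multiplies APM's quoted 4-wide decode by the 3GHz target clock; the constant and function names are illustrative, and real-world IPC will sit well below this ceiling once stalls, mispredicts and cache misses (none of which APM disclosed) come into play.

```python
# Theoretical peak decode rate for one X-Gene core, from APM's quoted figures.
# This is an upper bound only: it assumes the decoder never stalls.

DECODE_WIDTH = 4          # ARMv8 instructions decoded per clock (per APM)
TARGET_CLOCK_HZ = 3.0e9   # APM's 3GHz target operating frequency


def peak_decode_rate(width: int, clock_hz: float) -> float:
    """Maximum instructions decoded per second by a single core."""
    return width * clock_hz


per_core = peak_decode_rate(DECODE_WIDTH, TARGET_CLOCK_HZ)
print(f"Peak decode rate per core: {per_core / 1e9:.0f} billion instructions/s")
```

At these figures that works out to 12 billion instructions per second per core, before any real-world losses.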
APM wouldn't go into detail on the back-end configuration of the X-Gene, nor would it comment on other intricacies like branch predictors or cache configuration. We can learn a lot from the front end alone, though. Cortex A15 features a 3-issue front end, and moving to 4-issue suggests a generational jump in IPC. Note that we saw a similar transition going from Intel's P6/NetBurst eras to its Conroe (aka Core 2) architecture.
As the X-Gene implements the ARMv8 ISA, it is a full 64-bit design that is backward compatible with 32-bit ARMv7. The CPU features hardware virtualization acceleration, MMU virtualization, advanced SIMD instructions and what APM is calling a "very sophisticated" FPU, although once again details were scarce.
Despite the aggressive architecture, each core is estimated to consume only 2W. Like most mobile SoCs, the entire chip is expected to idle at around 300mW.
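A rough sketch of what 2W per core adds up to at different core counts. The counts below are hypothetical points within APM's stated 2 to 128 range, and the calculation deliberately ignores uncore power (memory controllers, SATA, 10GbE, the chip-to-chip interconnect), so treat the results as a core-only estimate rather than a chip TDP.

```python
# Core-only power arithmetic from APM's ~2W-per-core estimate.
# Core counts are hypothetical; uncore power is not included.

WATTS_PER_CORE = 2.0  # APM's per-core estimate


def core_power_watts(cores: int, watts_per_core: float = WATTS_PER_CORE) -> float:
    """Estimated active power of the CPU cores alone."""
    if cores < 1:
        raise ValueError("core count must be positive")
    return cores * watts_per_core


for cores in (2, 4, 8, 16, 32):
    print(f"{cores:>3} cores: ~{core_power_watts(cores):.0f} W (cores only)")
```

A linear model like this is the simplest reading of a per-core figure; actual power will also depend on clock speed, process (40nm vs. 28nm) and workload.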
At the SoC level, APM plans to integrate many of these CPU cores onto a single package. The range is officially 2 - 128 cores, although I expect we'll see something more reasonable than the extremes. The SoC also features integrated SATA (up to six 6Gbps ports per SoC) and two 10GbE controllers.
Each SoC can feature up to four 72-bit DDR3 (64-bit + ECC) memory controllers, although lower core count configurations will have fewer memory controllers.
You can plop multiple SoCs down on a single board, connected by a coherent interface that can deliver up to 400Gbps of bandwidth between chips.
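The memory and I/O numbers translate into rough bandwidth figures. This sketch assumes DDR3-1600 purely for illustration (APM didn't say which speed grade X-Gene supports), treats the 8 ECC bits of each 72-bit channel as carrying no user data, and uses decimal units; all names are hypothetical.

```python
# Rough bandwidth arithmetic for an X-Gene SoC, using APM's quoted figures.
# ASSUMPTION: DDR3-1600 (1600 MT/s); APM did not name a memory speed grade.

CHANNEL_WIDTH_BITS = 72    # per APM: 64 data bits + 8 ECC bits
DATA_BITS = 64             # only these carry user data
MAX_CHANNELS = 4           # up to four DDR3 controllers per SoC
ASSUMED_DDR3_MTS = 1600    # hypothetical speed grade, mega-transfers per second
COHERENT_LINK_GBPS = 400   # chip-to-chip coherent interface, per APM
SATA_PORTS, SATA_GBPS = 6, 6
TEN_GBE_PORTS = 2


def dram_peak_gbytes(channels: int, mega_transfers: int, data_bits: int = DATA_BITS) -> float:
    """Theoretical peak DRAM bandwidth in decimal GB/s (ECC bits excluded)."""
    bytes_per_transfer = data_bits // 8
    return channels * mega_transfers * 1e6 * bytes_per_transfer / 1e9


ecc_share = (CHANNEL_WIDTH_BITS - DATA_BITS) / CHANNEL_WIDTH_BITS

print(f"ECC share of each 72-bit channel: {ecc_share:.1%}")
print(f"Peak DRAM bandwidth ({MAX_CHANNELS} channels): "
      f"{dram_peak_gbytes(MAX_CHANNELS, ASSUMED_DDR3_MTS):.1f} GB/s")
print(f"Coherent chip-to-chip link: {COHERENT_LINK_GBPS / 8:.0f} GB/s")
# Raw line rates, before encoding and protocol overhead.
print(f"Aggregate SATA: {SATA_PORTS * SATA_GBPS} Gbps, "
      f"aggregate 10GbE: {TEN_GBE_PORTS * 10} Gbps")
```

Under the DDR3-1600 assumption, four channels peak at 51.2 GB/s, roughly the same as the 50 GB/s the 400Gbps coherent link would carry; a faster or slower memory grade scales the DRAM figure proportionally.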
APM's performance estimates put a 3GHz X-Gene at roughly half the integer performance of a 2.4GHz Sandy Bridge. The X-Gene's advantage, however, is the ability to integrate many more cores. APM expects a quad-core X-Gene will be able to perform similarly to a dual-core Sandy Bridge Xeon, but with much lower power consumption.
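Since performance is roughly IPC times clock speed, APM's (since-withdrawn) per-core estimate also implies a per-clock comparison. The sketch below works that out with exact fractions and checks the quad-core vs. dual-core claim, under the optimistic assumption that integer performance scales linearly with core count; the function names are illustrative.

```python
from fractions import Fraction

# APM's estimate: one 3GHz X-Gene core delivers about half the integer
# performance of one 2.4GHz Sandy Bridge core.
XGENE_CLOCK_GHZ = Fraction(3)
SNB_CLOCK_GHZ = Fraction(12, 5)   # 2.4GHz
PER_CORE_RATIO = Fraction(1, 2)   # X-Gene core perf / Sandy Bridge core perf


def relative_ipc(per_core_ratio: Fraction, clock_a: Fraction, clock_b: Fraction) -> Fraction:
    """IPC of core A relative to core B, given perf = IPC * clock."""
    return per_core_ratio * clock_b / clock_a


def xgene_cores_to_match(snb_cores: int) -> Fraction:
    """X-Gene cores needed to match a Sandy Bridge core count,
    ASSUMING perfectly linear multi-core scaling (optimistic)."""
    return snb_cores / PER_CORE_RATIO


print("X-Gene IPC relative to Sandy Bridge:",
      relative_ipc(PER_CORE_RATIO, XGENE_CLOCK_GHZ, SNB_CLOCK_GHZ))
print("X-Gene cores to match 2 Sandy Bridge cores:", xgene_cores_to_match(2))
```

Taken at face value, the estimate puts X-Gene at about 2/5 of Sandy Bridge's per-clock integer throughput, and four X-Gene cores line up with two Sandy Bridge cores only if scaling is perfect.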
Update: APM has since pulled the slide it originally shared with us comparing X-Gene to Intel's Sandy Bridge architecture. The implication is that its performance estimates may have been a bit too aggressive; only time will tell...
These are all estimates today. The first customer evaluation boards will be available in March 2012. The X-Gene SoCs on the eval boards will be delivered as FPGAs. The ASIC version for actual deployment won't hit until the second half of next year. The first chips will be built on a 40nm process to get them to market quickly and cost effectively, but the design is expected to transition to 28nm afterwards. At 40nm we may not see such aggressive clocks or tons of cores per SoC.
APM expects that even with a late 2012 launch it will have a 1 - 2 year lead on the competition. If it can get the X-Gene out on time while hitting its power and clock targets (both very difficult goals), the head start will be tangible. Note that by the end of 2012 we'll only just begin to see the first Cortex A15 implementations. ARMv8-based competitors will likely be a full year out, at least.
There's also the question of whether or not enterprise customers want to move to an ARM-based server platform. Unlike in the smartphone/tablet space, x86 is the incumbent in the server arena. Equal performance at lower power consumption is quite attractive, but there's still a lot of convincing that needs to be done. Not to mention that Intel does have the ability to build a competitive, Atom-based solution.
More than anything it's good to see such strong competition at both the high end and low end of the microprocessor business. Threatening to disrupt the status quo in both is going to pave the way for progress in our industry.
13 Comments
Andypro - Monday, November 14, 2011
"APM's performance estimates put a 3GHz X-Gene at roughly half the integer performance of a 2.4GHz Sandy Bridge." Yet the chart at the top says X-Gene vastly outperforms Sandy Bridge per core in SPECint? I can't read the tiny text in the image, but I must be missing something big.
MantasPakenas - Monday, November 14, 2011
The tiny text reads the core count :) So at the top you have 32 cores, then 16, and so on... So if a 32-core ARM outperforms a 4-core Sandy Bridge 2x, then per core it still doesn't add up, but then I can't read everything that's written in tiny text either...
Kristian Vättö - Monday, November 14, 2011
Full size pic is here: http://images.anandtech.com/galleries/1532/Screen%...
shiznit - Monday, November 14, 2011
Since they are talking about single-socket performance, I'm assuming they mean you can fit so many ARM cores in one socket that eventually you will pass a Sandy Bridge socket in perf/watt.
Doormat - Monday, November 14, 2011
It's chips like this that make the MacBook Air on ARM rumor seem like it can and will become a reality. It won't be soon, but by 2014-15 it's almost a "why wouldn't they?" question, as opposed to a "why would they?"...
shiznit - Monday, November 14, 2011
As an owner of both a 2010 and a 2011 MBA, and having seen the drop in battery life and the increase in heat in the new model, I am looking forward to a fast ARM SoC for laptops.
dagamer34 - Monday, November 14, 2011
Considering Apple bought 2 chip design companies in the past 3 years, I can only assume that they think they can design a better chip than Intel's, one that is low power and has great battery life.
And when you think about what people actually do most of the time (browse the web, check e-mail, chat on IM), most of that stuff is not CPU-demanding. In fact, arguably, CPUs from 5 years ago would be fine for most day-to-day work. But getting more battery life out of our machines is really key.
eio - Monday, November 14, 2011
Larrabee in ARM flavor, on a new process node
dew111 - Monday, November 14, 2011
Intel Atom and competitive in the same sentence? Really? I mean, if their development speeds up, Atom could be very competitive. But current products are quite a ways off from AMD's Brazos, let alone ARM, in power/performance.
hooflung - Monday, November 14, 2011
This sounds exactly like something that is going to have a derivative in a future gaming console. Goodbye, red ring of death. Hello, 16-core ARMv8 SoC + HD 6000 graphics under a single lid.