Intel Addresses Desktop Raptor Lake Instability Issues: Faults Excessive Voltage from Microcode, Fix Coming in August
by Ryan Smith on July 22, 2024 7:00 PM EST
What started last year as a handful of reports about instability with Intel's Raptor Lake desktop chips has, over the last several months, grown into a much larger saga. Facing their biggest client chip instability problem in decades, Intel has been under increasing pressure to figure out the root cause of the issue and fix it, as claims of damaged chips have stacked up and rumors have swirled amidst the silence from Intel. But, at long last, it looks like Intel's latest saga is about to reach its end, as today the company has announced that they've found the cause of the issue, and will be rolling out a microcode fix next month to resolve it.
Officially, Intel has been working to identify the cause of desktop Raptor Lake’s instability issues since at least February of this year, if not sooner. In the interim they have identified a couple of correlating factors – motherboard vendors using ridiculous power settings for their out-of-the-box configurations, which Intel has told them to stop doing, and a voltage-related bug in Enhanced Thermal Velocity Boost (eTVB) – but neither factor was the smoking gun that set all of this into motion. That left Intel to continue searching for the root cause in private, with lots of awkward silence filling the gaps in public.
But it looks like Intel’s search has finally come to an end – even if Intel isn’t putting the smoking gun on public display quite yet. According to a fresh update posted to the company’s community website, Intel has determined the root cause at last, and has a fix in the works.
Per the company’s announcement, Intel has tracked down the cause of the instability issue to “elevated operating voltages”, which, at its heart, stems from a flawed algorithm in Intel’s microcode that requested the wrong voltage. Consequently, Intel will be able to resolve the issue through a new microcode update, which, pending validation, is expected to be released in the middle of August.
Intel is delivering a microcode patch which addresses the root cause of exposure to elevated voltages. We are continuing validation to ensure that scenarios of instability reported to Intel regarding its Core 13th/14th Gen desktop processors are addressed. Intel is currently targeting mid-August for patch release to partners following full validation.
Intel is committed to making this right with our customers, and we continue asking any customers currently experiencing instability issues on their Intel Core 13th/14th Gen desktop processors reach out to Intel Customer Support for further assistance.
-Intel Community Post
And while there’s nothing good for Intel about Raptor Lake’s instability issues or the need to fix them, that the problem can be ascribed to (or at least fixed by) microcode is about the best possible outcome the company could hope for. Across the full spectrum of potential causes, microcode is the easiest to fix at scale – microcode updates are already distributed through OS updates, and all chips of a given stepping (millions in all) run the same microcode. Even a motherboard BIOS-related issue would be much harder to fix given the vast number of different boards out there, never mind a true hardware flaw that would require Intel to replace even more chips than they already have.
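For readers who want to see which microcode revision their own system is currently running, here is a minimal sketch. It assumes a Linux x86 system, where `/proc/cpuinfo` reports a `microcode` field for each logical CPU; the function names are our own, and the revision value in the usage note is a made-up placeholder, not the revision of Intel's upcoming fix.

```python
import re

# Matches lines such as "microcode\t: 0x123" in /proc/cpuinfo-style text.
_MICROCODE_LINE = re.compile(r"^microcode[ \t]*:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def microcode_revisions(cpuinfo_text):
    """Return the set of distinct microcode revision strings in the given text."""
    return set(_MICROCODE_LINE.findall(cpuinfo_text))


def read_local_revisions(path="/proc/cpuinfo"):
    """Read the local cpuinfo file (Linux only) and return its microcode revisions."""
    with open(path, encoding="utf-8") as handle:
        return microcode_revisions(handle.read())


if __name__ == "__main__":
    try:
        print(sorted(read_local_revisions()))
    except OSError:
        # Non-Linux systems do not provide /proc/cpuinfo.
        print("could not read /proc/cpuinfo on this system")
```

Calling `microcode_revisions` on text that contains `microcode : 0x123` for every logical CPU returns a single-element set, which is what one would expect to see when every core is running the same loaded revision.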
Still, we’d also be remiss if we didn’t note that microcode is regularly used to paper over issues further down in the processor, as we’ve most famously seen with the Meltdown/Spectre fixes several years ago. So while Intel is publicly attributing the issue to microcode bugs, there are several more layers to the onion that is modern CPUs that could be playing a part. In that respect, a microcode fix grants the least amount of insight into the bug and the performance implications of its fix, since microcode can be used to mitigate so many different issues.
But for now, Intel’s focus is on communicating that they have a fix and establishing a timeline for distributing it. The matter has certainly caused them a lot of consternation over the last year, and it will continue to do so for at least another month.
In the meantime, we’ve reached out to our Intel contacts to see if the company will be publishing additional details about the voltage bug and its fix. “Elevated operating voltages” is not a very satisfying answer on its own, and given the unprecedented nature of the issue, we’re hoping that Intel will be able to share additional details as to what’s going on, and how Intel will be preventing it in the future.
Intel Also Confirms a Via Oxidation Manufacturing Issue Affected Early Raptor Lake Chips
Tangential to this news, Intel has also made a couple of other statements regarding chip instability to the press and public over the last 48 hours that also warrant some attention.
First and foremost, leading up to Intel’s official root cause analysis of the desktop Raptor Lake instability issues, one possibility that couldn’t be written off at the time was that the root cause of the issue was a hardware flaw of some kind. And while the answer to that turned out to be “no,” there is a rather important “but” in there, as well.
As it turns out, Intel did have an early manufacturing flaw in the enhanced version of the Intel 7 process node used to build Raptor Lake. According to a post made by Intel to Reddit this afternoon, a “via Oxidation manufacturing issue” was addressed in 2023. However, despite the suspicious timing, according to Intel this is separate from the microcode issue driving instability issues with Raptor Lake desktop processors up to today.
Long answer: We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue.
For the Instability issue, we are delivering a microcode patch which addresses exposure to elevated voltages which is a key element of the Instability issue. We are currently validating the microcode patch to ensure the instability issues for 13th/14th Gen are addressed.
-Intel Reddit Post
Ultimately, Intel says that they caught the issue early on, and that only a small number of Raptor Lake chips were affected by the via oxidation manufacturing flaw. That is hardly going to come as a comfort to Raptor Lake owners who are already worried about the instability issue, but if nothing else, it’s helpful that the issue is being publicly documented. Typically, these sorts of early teething issues go unmentioned, as even in the best of scenarios, some chips inevitably fail prematurely.
Unfortunately, Intel’s revelation here doesn’t offer any further details on what the issue is, or how it manifests itself beyond further instability. Though at the end of the day, as with the microcode voltage issue, the fix for any affected chips will be to RMA them with Intel to get a replacement.
Laptops Not Affected by Raptor Lake Microcode Issue
Finally, ahead of the previous two statements, Intel also released a statement to Digital Trends and a few other tech websites over the weekend, in response to accusations that Intel’s 13th generation Core mobile CPUs were also impacted by what we now know to be the microcode flaw. In the statement, Intel denied those claims, stating that laptop chips were not suffering from the same instability issue.
-Intel Rep to Digital Trends
Instead, Intel attributed any laptop instability issues to typical hardware and software issues – essentially claiming that they weren’t experiencing elevated instability issues. Whether this statement accounts for the via oxidation manufacturing issue is unclear (in large part because not all 13th Gen Core Mobile parts are Raptor Lake), but this is consistent with Intel’s statements from earlier this year, which have always explicitly cited the instability issues as desktop issues.
49 Comments
ballsystemlord - Monday, July 22, 2024 - link
So that makes 4 different things that were causing instability on Intel HW. 1st the MB settings, 2nd the microcode requesting incorrect voltages, 3rd the oxidation, and 4th SW/HW problems in laptops.
I am imagining just how hard this was for Intel's engineering team to troubleshoot. It must have been a nightmare.
Assuming the problem is, through the forthcoming microcode update, resolved, I'd like to congratulate them on a job well done.
fallaha56 - Tuesday, July 23, 2024 - link
Errr…the voltage problem has likely caused degradation issues ie chips already malfunctioning will not be fixed by this
And of course you’ll notice Intels new RL stepping planned for release
Time overdue for Intel to come clean about this
Oxford Guy - Tuesday, July 23, 2024 - link
The assertion is that situation went from many years without problems to having 4 separate simultaneous problems?
ballsystemlord - Tuesday, July 23, 2024 - link
I don't pretend to know everything, but having followed Intel and AMD CPUs since before Zen, and the only HW longevity problems I've heard of are those mentioned above.
Oxford Guy - Friday, July 26, 2024 - link
I was just highlighting the situation.
name99 - Tuesday, July 23, 2024 - link
Except how much of the above is the FULL truth?
Intel can afford to admit to microcode bugs because they have had those before.
What they cannot afford to admit to is "we pushed 5 nodes in 4 years too fast"...
Look at what they say:
"
Our analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor.
"
That would be legally defensible as "true" even if the basic problem were "Intel Foundry gave us the wrong numbers to plug into our Turbo microcode"...
Obviously I have no insider knowledge of Intel. Maybe the problem is 100% incompetence in designing the "algorithm" for Turbo control of these chips. But the way I'd bet is that this was a manufacturing issue (in the sense that manufacturing was rushed out too fast, without enough time to fully characterize how the devices behave under long-term stress conditions), but admitting it as a Foundry issue rather than as a microcode issue has a much larger business impact, so...
mode_13h - Tuesday, July 23, 2024 - link
Well said.
A similar thought I've had is that the fault might be process-related, but the only thing they can feasibly do about it is tweaking microcode to better cater to the chip's true limitations.
mode_13h - Tuesday, July 23, 2024 - link
I should add that the timing seems highly suspicious. It's as if they want to be sure the fix lands *after* all the reviews of Ryzen 9000 get published, so that reviewers will still compare against the unpatched Raptor Lakes, not hampered by any performance-robbing mitigations resulting from the fix.
kkilobyte - Thursday, July 25, 2024 - link
I'm actually curious to see how various review websites are going to react, now that the new Ryzens were slightly delayed. I too found the timing quite suspicious.
Oxford Guy - Tuesday, July 30, 2024 - link
Andy Edser: 'No recall, no halting of sales, and no comment on warranty extensions. … the damage to affected chips may be permanent'