Intel Chip Design Flaw Mitigation May Slow Many Enterprise Systems; Related Security Flaws Will Linger

This is the tech story of the decade. Intel's chips from the last *decade* have all had a security flaw that exposes protected kernel memory space to userland processes. Mitigation will start as soon as next week, but since the patches have to be in the OS, performance hits are rumored to be in the 5% to 30% range.

Chaos ensues and Intel is in for a big hit. What's your take on this? Any predictions?

I guess that’s what happens when there is little to no competition in the CPU space. I tend to take the extreme doom and gloom with a grain of salt, though. I would guess in the end the performance hit won’t be that noticeable. Certainly not for gaming.

Can't wait for the lawsuit and my shiny payout.

You'll get a $20 coupon to be used on the next Intel processor.

Here's a really good first take on impacts from this. Best I've seen so far (but it's early days).

If Oracle had not just finished gutting SPARC/Solaris and its HW sales teams, this would have been a gold mine for them. Even more so if they'd actually ported OL to SPARC, as I and others advocated for years. #MissedOpportunity

Lone Sysadmin wrote:

All of our systems just got 30% more expensive. Put another way, we are all about to lose 5-30% of the systems we paid for, if they’re built on Intel hardware. That includes network switches, storage arrays, traditional servers, everything.

*gently pats Ryzen-build box sitting under the desk*

Yeah, your system is now worth more than when you bought it lol.

My prediction is that AMD is about to have a banner year.

Intel just posted this.

https://newsroom.intel.com/news/inte...

I'm off to the kitchen to make some popcorn....

Looks like the issue affects ARM CPUs as well:

http://lists.infradead.org/pipermail/linux-arm-kernel/2017-November/542751.html

It's definitely something to do with mixing kernel and userspace mappings in the TLB. All the fixes seem to either clear out the kernel half of the TLB when in userspace, or swap between two completely different sets of mappings whenever entering/exiting the kernel. So the more syscalls, the bigger the performance hit.
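If you want a rough feel for the "more syscalls, bigger hit" part on your own box, here is a minimal sketch, not a proper benchmark. It times a cheap syscall (`os.lseek` on a temp file) in a tight loop against an empty Python call, so the per-call numbers include interpreter overhead. The function names `time_loop` and `measure` are made up for illustration; the idea is just to run it before and after patching and compare the syscall figure.

```python
import os
import tempfile
import time


def time_loop(fn, iterations):
    """Return the wall-clock seconds taken to call fn() `iterations` times."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return time.perf_counter() - start


def measure(iterations=200_000):
    """Compare a loop that enters the kernel on every call with one that does not."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        # Each os.lseek call crosses the user/kernel boundary once.
        syscall_seconds = time_loop(lambda: os.lseek(fd, 0, os.SEEK_SET), iterations)
    # Baseline: same loop and call overhead, but no kernel entry.
    userspace_seconds = time_loop(lambda: None, iterations)
    return {
        "syscall_ns": syscall_seconds / iterations * 1e9,
        "userspace_ns": userspace_seconds / iterations * 1e9,
    }


if __name__ == "__main__":
    result = measure()
    print(f"syscall loop:   {result['syscall_ns']:.0f} ns per call")
    print(f"userspace loop: {result['userspace_ns']:.0f} ns per call")
```

Only the syscall number should move much between a patched and unpatched kernel; if your workload looks more like the second loop than the first, the hit should be small.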

Robear wrote:

If Oracle had not just finished gutting SPARC/Solaris and its HW sales teams, this would have been a gold mine for them. Even more so if they'd actually ported OL to SPARC, as I and others advocated for years. #MissedOpportunity

Lone Sysadmin wrote:

All of our systems just got 30% more expensive. Put another way, we are all about to lose 5-30% of the systems we paid for, if they’re built on Intel hardware. That includes network switches, storage arrays, traditional servers, everything.

OL is RHEL with a sed /RedHat/OracleLinux/ run on it and an average delay of 2 weeks on any major security patches.

We supported SPARC and Solaris for years, and it was absolute hell. The GNU toolchain is a circle of hell unto itself, but Sun everything was somehow worse.

Some new wrinkles:

NYT: Researchers Discover Two Major Flaws in the World’s Computers

There are two issues, which have been dubbed Meltdown and Spectre.

Meltdown is the one that affects recent Intel chips (since 2010) by exploiting out-of-order execution. Modern CPUs use speculative execution: when they're waiting for something but think they might need to run future operations, they can run those operations on idle execution units. If it turns out the CPU was mistaken about needing to run an operation, it just throws away the result. These operations effectively skip the security check, on the assumption that the result will get thrown away if the operation wasn't supposed to run. But, as it turns out, there is a way to exploit this to dump the entire kernel memory.

This is a particular problem for things like servers running multiple virtual machines, where one VM can use this to look at the other VMs on the same server.

Spectre exploits similar flaws with speculative execution. It takes a bit more work, from what I can see--but it is a flaw in Intel, AMD, and ARM chips. This includes using JavaScript code to read data from the address space of the browser process running it. The KAISER patch to fix Meltdown does not protect against Spectre.
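To make the "thrown away, but still exploitable" idea concrete, here is a toy simulation in Python. It is emphatically not an exploit and touches no real hardware: `ToyCPU`, `victim_read`, and `leak_secret` are hypothetical names, the "cache" is just a Python set, and misprediction is a flag you pass in. What it tries to show is the shape of a bounds-check-bypass attack as I understand it: the speculative read's result is discarded, but which cache line it touched is not, and the attacker reads the secret back from that.

```python
class ToyCPU:
    """Toy model: memory is a public array immediately followed by a secret."""

    def __init__(self, public, secret):
        self.memory = bytes(public) + bytes(secret)
        self.public_size = len(public)
        self.cached_lines = set()  # stands in for the data cache

    def flush_cache(self):
        self.cached_lines.clear()

    def victim_read(self, index, mispredict=False):
        """Bounds-checked read. With mispredict=True the load runs 'speculatively'
        before the check resolves, leaving a trace in the cache."""
        in_bounds = index < self.public_size
        if in_bounds or mispredict:
            value = self.memory[index]
            # Models a dependent access like probe_array[value * 4096].
            self.cached_lines.add(value)
        # Architecturally, an out-of-bounds read returns nothing.
        return value if in_bounds else None

    def probe(self):
        """Which lines are 'fast'? A real attacker would infer this by timing."""
        return sorted(self.cached_lines)


def leak_secret(cpu, secret_length):
    """Recover the secret one byte at a time via the simulated cache side channel."""
    recovered = bytearray()
    for offset in range(secret_length):
        cpu.flush_cache()
        result = cpu.victim_read(cpu.public_size + offset, mispredict=True)
        assert result is None  # the read itself was "thrown away"
        (hot_line,) = cpu.probe()  # exactly one line was touched
        recovered.append(hot_line)
    return bytes(recovered)


if __name__ == "__main__":
    cpu = ToyCPU(public=b"public data", secret=b"hunter2")
    print(leak_secret(cpu, secret_length=7))
```

In the toy model the victim never returns a secret byte, yet the attacker recovers all of them. On real hardware the probing step is a timing measurement and far noisier.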

Um, the patch breaks some antivirus software (to the point that your computer won't boot) so you might need to update your AV before you can patch:

Important information regarding the Windows security updates released on January 3, 2018 and anti-virus software

Microsoft wrote:

Overview
Microsoft has identified a compatibility issue with a small number of anti-virus software products.

The compatibility issue is caused when anti-virus applications make unsupported calls into Windows kernel memory. These calls may cause stop errors (also known as blue screen errors) that make the device unable to boot. To help prevent stop errors caused by incompatible anti-virus applications, Microsoft is only offering the Windows security updates released on January 3, 2018 to devices running anti-virus software from partners who have confirmed their software is compatible with the January 2018 Windows operating system security update.

If you have not been offered the security update, you may be running incompatible anti-virus software and you should follow up with your software vendor.

Microsoft has been working closely with anti-virus software partners to ensure all customers receive the January Windows security updates as soon as possible.

Cube wrote:

We supported SPARC and Solaris for years, and it was absolute hell. The GNU toolchain is a circle of hell unto itself, but Sun everything was somehow worse.

I worked with and for Sun Micro, and then Oracle hardware side, since the 80's, and you're literally the first person I've had say this. I've had customers upset with one thing or another, of course, but "absolute hell"? First for me. (Expressing surprise here, not disbelief, I'm sure you have good reasons for that.)

I can count the number of customers I had who are happier after leaving Sun behind on one hand with fingers left over. Most of them wish they didn't have to. But to each his own.

(I no longer work for Oracle.)

Gremlin wrote:

Um, the patch breaks some antivirus software (to the point that your computer won't boot) so you might need to update your AV before you can patch:

I assume Avira is one of them, or perhaps Malwarebytes, because I have not to my knowledge been offered the security update. Anyone have an actual list?

Robear wrote:
Gremlin wrote:

Um, the patch breaks some antivirus software (to the point that your computer won't boot) so you might need to update your AV before you can patch:

I assume Avira is one of them, or perhaps Malwarebytes, because I have not to my knowledge been offered the security update. Anyone have an actual list?

I've been looking but haven't found one yet. Microsoft describes a way in that article for you to check your registry to see if your AV vendor has flipped that setting on.

That setting is not for Win10, however.

Title is misleading. This is not just an Intel problem, though I appreciate the desire to get nostalgic about bad SPARCs along the way. Meltdown is the flashy but not-that-important one here; the perf hit is noticeable but whatever unless you are using trap-heavy workloads (small IO in a tight loop) and folks who trust their stuff can disable PTI for current throughput. Spectre is potentially extremely dangerous.

Spectre is probably practicable against anybody who's practicing speculative or possibly even just out-of-order execution (ARM has said that the M3 is not vulnerable, for example, but has said nothing about A-series chips and has submitted patches to LKML).

I'm wondering if MIPS64 is also vulnerable but have not seen any testing.

The performance issue seems to be related to the Intel processors, but the other two problems are fixed with software changes that don't seem to have performance issues attached to the fix. But I'll update the title as the evidence warrants, although my primary concern is performance hits caused by mitigation.

Your primary concern is not that somebody can write a sidechannel information leak against ssh-agent or your password manager as browser-executed JavaScript? But rather that kernel traps are a little more expensive and that `lseek` in a tight loop might be 30% slower?

I think you might be focusing on the wrong thing.

Is there a Spectre based proof of concept out there yet?

Yes, though verifying it is somewhat out of my league.

https://packetstormsecurity.com/file...

TheGameguru wrote:

Is there a Spectre based proof of concept out there yet?

https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

So far, there are three known variants of the issue:

Variant 1: bounds check bypass (CVE-2017-5753)
Variant 2: branch target injection (CVE-2017-5715)
Variant 3: rogue data cache load (CVE-2017-5754)

Before the issues described here were publicly disclosed, Daniel Gruss, Moritz Lipp, Yuval Yarom, Paul Kocher, Daniel Genkin, Michael Schwarz, Mike Hamburg, Stefan Mangard, Thomas Prescher and Werner Haas also reported them; their [writeups/blogposts/paper drafts] are at:

Spectre (variants 1 and 2)
Meltdown (variant 3)

During the course of our research, we developed the following proofs of concept (PoCs):

1. A PoC that demonstrates the basic principles behind variant 1 in userspace on the tested Intel Haswell Xeon CPU, the AMD FX CPU, the AMD PRO CPU and an ARM Cortex A57 [2]. This PoC only tests for the ability to read data inside mis-speculated execution within the same process, without crossing any privilege boundaries.
2. A PoC for variant 1 that, when running with normal user privileges under a modern Linux kernel with a distro-standard config, can perform arbitrary reads in a 4GiB range [3] in kernel virtual memory on the Intel Haswell Xeon CPU. If the kernel's BPF JIT is enabled (non-default configuration), it also works on the AMD PRO CPU. On the Intel Haswell Xeon CPU, kernel virtual memory can be read at a rate of around 2000 bytes per second after around 4 seconds of startup time. [4]
3. A PoC for variant 2 that, when running with root privileges inside a KVM guest created using virt-manager on the Intel Haswell Xeon CPU, with a specific (now outdated) version of Debian's distro kernel [5] running on the host, can read host kernel memory at a rate of around 1500 bytes/second, with room for optimization. Before the attack can be performed, some initialization has to be performed that takes roughly between 10 and 30 minutes for a machine with 64GiB of RAM; the needed time should scale roughly linearly with the amount of host RAM. (If 2MB hugepages are available to the guest, the initialization should be much faster, but that hasn't been tested.)
4. A PoC for variant 3 that, when running with normal user privileges, can read kernel memory on the Intel Haswell Xeon CPU under some precondition. We believe that this precondition is that the targeted kernel memory is present in the L1D cache.

Ed Ropple wrote:

Your primary concern is not that somebody can write a sidechannel information leak against ssh-agent or your password manager as browser-executed JavaScript? But rather that kernel traps are a little more expensive and that `lseek` in a tight loop might be 30% slower?

Yeah, because the security fix is imminent, but the performance hit will affect my customers until their next upgrade cycle (and affected Cloud systems could be even more interesting). Given that I advise people on what hardware they need to do a certain job, and I don't want to sell them what they don't need, for me the long-term concern is the important one.

The "security fix" is not imminent. Spectre's fault is endemic to the modern out-of-order processor. Patches to provide symptomatic relief with regard to what we know can be done, right now, are imminent. On very modern operating systems, and a few older ones. This is the short and long-term concern that you and everyone else should be worried about; whether you're selling them one hundred servers or one hundred and five (because in the real world that's about what we're looking at in even heavy kernel-trap cases) is so tremendously less important than the incipient reality of browser-vector drive-by memory sucks that it beggars belief.
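To put the sizing argument in numbers, here is a rough sketch. It assumes throughput scales linearly with server count and that the slowdown hits every machine equally, both of which are simplifications. The function names are made up for illustration.

```python
import math


def servers_needed(baseline_servers, slowdown):
    """Servers required to keep the same total throughput after a fractional slowdown."""
    if not 0 <= slowdown < 1:
        raise ValueError("slowdown must be in [0, 1)")
    # Each server now delivers (1 - slowdown) of its old throughput.
    return math.ceil(baseline_servers / (1 - slowdown))


def cost_increase_per_unit_work(slowdown):
    """Fractional rise in cost per unit of work for the same hardware spend."""
    if not 0 <= slowdown < 1:
        raise ValueError("slowdown must be in [0, 1)")
    return 1 / (1 - slowdown) - 1


if __name__ == "__main__":
    for slowdown in (0.05, 0.30):
        print(
            f"{slowdown:.0%} slowdown: {servers_needed(100, slowdown)} servers, "
            f"{cost_increase_per_unit_work(slowdown):.1%} more per unit of work"
        )
```

Under those assumptions a 5% hit turns 100 servers into 106 (105.3 rounded up to whole machines), and a 30% hit into 143. Note that a 30% slowdown makes each unit of work about 43% more expensive, not 30%.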

I can do something about increasing compute capacity for customers that need it. What exactly do you propose that I do about Spectre, as a computer reseller?

Robear wrote:
Ed Ropple wrote:

Your primary concern is not that somebody can write a sidechannel information leak against ssh-agent or your password manager as browser-executed JavaScript? But rather that kernel traps are a little more expensive and that `lseek` in a tight loop might be 30% slower?

Yeah, because the security fix is imminent, but the performance hit will affect my customers until their next upgrade cycle (and affected Cloud systems could be even more interesting). Given that I advise people on what hardware they need to do a certain job, and I don't want to sell them what they don't need, for me the long-term concern is the important one.

Fair, though there isn't a fix available for Spectre yet, as I understand it.

What really worries me, now that I've had time to think about it, is all of the embedded devices, appliances, and the internet of things. Like, you've patched Windows, but have you updated your wi-fi router? (Granted, it's a bit harder to convince a router to run arbitrary JavaScript code.) What OS is that digital thermostat running? Etc.

I'm being a little overly apocalyptic, since from my reading this specific kind of attack on speculative processing requires convincing a device to run the attack code in the first place. Which is rare in things without web browsers. But there are billions of chips out there with this flaw, and they're in a heck of a lot of devices. (And a lot of them have web browsers. Or web servers embedded in the CPU.)

The Internet of Things is such a slow motion security dumpster fire that this is just one more log on the blaze.

gravity wrote:

The Internet of Things is such a slow motion security dumpster fire that this is just one more log on the blaze.

True.

Remember, too, we're all learning about this incrementally. I was under the impression that the patches were imminent (and having some in commit for the Linux kernel helped that). I also have a particular focus because of my job, and that's not an aberration, that's normal. So please bear with me and understand that it will take time for the entire impact of this to become clear.

Robear wrote:

Remember, too, we're all learning about this incrementally. I was under the impression that the patches were imminent (and having some in commit for the Linux kernel helped that). I also have a particular focus because of my job, and that's not an aberration, that's normal. So please bear with me and understand that it will take time for the entire impact of this to become clear.

Quite. I think the rest of us are mostly deciding whether to run around screaming that the sky is falling or looking for where we left the box of popcorn, depending.