Bit flips: How cosmic rays grounded a fleet of aircraft

(bbc.com)

Comments

chris_va 20 hours ago
I highly recommend finding a cloud chamber (various science museums have them) to visualize just how much radiation is flying around.

Part of my work touches high power switches. I am going to do a bad job relating this story, but one of the power engineers was talking about how electric train switches in EU (Switzerland?) were having triggering issues. These were big MW scale IGBTs, not something you want to false trigger. Anyway, they eventually traced the problem to cosmic rays, and just turned the entire package vertical so the die was end-on to space (the mountains around were shielding the horizontal direction), and the problem went away.

RankingMember 23 hours ago
It's important to note that this is just Airbus's best guess as to the cause, as there's no smoking gun: they simply exhausted their troubleshooting and were left scratching their heads, so this was the "least unlikely" cause they could come up with given the circumstances.
avazhi 23 hours ago
“The increasing reliance of computers in fly-by-wire systems in aircraft, which use electronics rather than mechanical systems to control the plane in the air, also mean the risk posed by bit flips when they do occur is higher.”

Bit of an understatement. I don’t think there are any active passenger airliners in the first world today that aren’t fly-by-wire. The MD-80 was the last of its kind and it’s been out of passenger operation for what, 10 years now?

charcircuit 19 hours ago
I feel like using "Cosmic Rays" as a reason is equivalent to "Aliens". It makes for good clickbait so everyone is fast to point at it as the reason even if there is no reason to actually believe that the bitflip was due to cosmic rays.
neko_ranger 23 hours ago
I swear to god I've been got by cosmic rays modifying a bit before when my boot order changes for random reasons
Borrible 15 hours ago
That reminds me of how the manufacturer's customer service department for my car some thirty years ago tried to convince me that the problems with the ignition electronics could also be caused by solar flares. Which could have been the case, of course, but then it would surely have affected other vehicle owners as well. Though maybe the sun did shine just for me back then; you can never be sure, can you? I briefly considered consulting an astronomer.
air7 8 hours ago
Naive question, but can't this be solved with device-level error correction?
who-shot-jr 17 hours ago
The Universe is Hostile to Computers - https://www.youtube.com/watch?v=AaZ_RSt0KP8
SwiftyBug 23 hours ago
I thought planes had insane redundancy exactly so stuff like that doesn't happen. How can a bit flip cause the system that controls altitude to malfunction like that?
burnt-resistor 7 hours ago
PSA:

0. Always use a) SECDED hardware ECC and b) checksums on network links and I/O everywhere.

1. When unable to do 0.a), add a (72,64) Hamming code (8 check bits per 64 data bits) or keep N>2 redundant copies on physically separate silicon for critical data and code. This is a significant performance hit, but safety is more important in some uses. (Don't neglect the integrity and reliability of code storage, loading, and execution paths either.)

2. Consider Space Shuttle-style high-availability, high-reliability "voting" among N system control elements that are designed to behave identically, possibly built by different manufacturers.
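
To make points 1 and 2 concrete, here is a minimal Python sketch of a software (72,64) SECDED code and a 3-way bitwise majority vote. It is illustrative only, not flight-grade code. The names encode, decode and vote3 are made up for this example, and the bit layout (overall parity in bit 0, Hamming check bits at the power-of-two positions) is an assumption; real memory controllers use their own check-bit layouts.

```python
# Assumed layout: bit 0 = overall parity, bits 1..71 = Hamming(71,64) codeword.
PARITY_POS = [1, 2, 4, 8, 16, 32, 64]                         # check-bit positions
DATA_POS = [p for p in range(1, 72) if p not in PARITY_POS]   # 64 data positions


def encode(data: int) -> int:
    """Pack a 64-bit value into a 72-bit SECDED codeword."""
    word = 0
    syndrome = 0
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            word |= 1 << pos
            syndrome ^= pos          # XOR of the positions of all set bits
    # Set check bits so that the XOR of all set positions becomes zero.
    for p in PARITY_POS:
        if syndrome & p:
            word |= 1 << p
    # Overall parity bit makes the total number of set bits even.
    if bin(word).count("1") % 2:
        word |= 1
    return word


def decode(word: int):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            syndrome ^= pos
    parity_odd = bin(word).count("1") % 2 == 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd and syndrome <= 71:
        # Odd parity: treat as a single flipped bit at position `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        word ^= 1 << syndrome
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome means two bits flipped.
        return None, "uncorrectable"

    data = 0
    for i, pos in enumerate(DATA_POS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status


def vote3(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies (N = 3 voting)."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    value = 0x0123456789ABCDEF
    cw = encode(value)
    print(decode(cw))                         # clean word
    print(decode(cw ^ (1 << 17)))             # one flipped bit
    print(decode(cw ^ (1 << 17) ^ (1 << 40))) # two flipped bits
    print(hex(vote3(value, value ^ (1 << 5), value)))
```

A single flipped bit anywhere in the 72-bit word is corrected, and two flipped bits are detected but not corrected. Three or more flips can be silently miscorrected, which is one reason to combine ECC with voting across physically separate copies rather than relying on either alone.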

MarkusQ 23 hours ago
This is silly. Rapidly refreshing the data that was (presumably) flipped by a cosmic ray last time won't do anything to prevent an error in whatever it hits next time. Unless the theory is that cosmic rays are somehow more likely to hit these particular bits compared to all the millions (billions?) of others in the system...in which case I have a different objection.
preommr 22 hours ago
I had no idea this was a real thing - I always thought that xkcd comic[0] was just a random joke.

[0]https://xkcd.com/378/

jessriedel 23 hours ago
I thought some combination of error correction and redundant systems was already widespread in airplanes to prevent cosmic-ray induced errors. (GPT agrees.) What am I missing? I've read multiple articles on this, and none of them address the fact that the problem, at the level of detail described in the article, should have been prevented by technology available and widely deployed for decades.