Hidden data errors
-
In this post, they explain what is being done in software to detect hardware errors that were not caught by the tests in the manufacturing process.
-
Silent data errors (SDEs) are a little-known phenomenon being investigated at the computer/electronics engineering level, mainly in data centers.
This is only relevant for very complex CPUs and at data-center computational densities; it is not at all something new that we should worry about.
It turns out that shrinking the manufacturing size of silicon also makes the precision and accuracy of production harder, and the market demands mass production. We are talking about the latest frontier of high-level research on fidelity in computer systems, so don't worry.
Experts argue that current microprocessors are at the limit of everything, but if you don't have a Meta or a Google, don't worry.
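Conceptually, the software-side screening described above boils down to running a workload whose correct result is known in advance and flagging any run that deviates. A minimal, hypothetical sketch of that compare-against-golden idea (function names are mine, not from any real tool; real fleet scanners additionally pin such tests to individual cores):

```python
# Hypothetical sketch of software-level SDE screening: repeat a
# deterministic workload and flag any run whose output disagrees
# with the known-correct reference value.
import hashlib

def golden_workload(seed: int) -> str:
    """A deterministic workload: hash a block of data derived from `seed`."""
    data = (str(seed) * 1000).encode()
    return hashlib.sha256(data).hexdigest()

def screen(iterations: int = 100, seed: int = 42) -> list[int]:
    """Return the iteration numbers whose result disagreed with the golden value."""
    expected = golden_workload(seed)   # reference result, assumed correct
    mismatches = []
    for i in range(iterations):
        if golden_workload(seed) != expected:
            mismatches.append(i)       # a silent data error would show up here
    return mismatches

print(screen())  # on healthy hardware this prints []
```

On healthy hardware the list stays empty; a core that silently miscomputes would eventually produce a mismatching hash.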
Cheers.
-
@defaultuser Well, yes, it is 'silent'. I don't know why I translated it as 'hidden'.
I had never read about this topic. A lot has been written about failure rates in RAM, and in fact there are hardware implementations to deal with them. I don't know if this type of error is attributable to manufacturing and comparable to that (I suppose not, given the 'simplicity' of RAM versus the complexity of a processor).
The fact that they are trying to attack the problem of CPUs via software already gives an idea of how slippery the topic must be.
-
@cobito In RAM it's not about manufacturing; what happens is that the "cells" where the bits are stored are basically capacitors, and apparently they tend to lose a little of their charge when they are accessed, hence RAM self-refreshes.
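The charge-leak errors described above are exactly what ECC memory corrects. As a toy illustration of the principle (not how a DIMM actually implements it), here is a Hamming(7,4) code in Python: 4 data bits plus 3 parity bits, enough to correct any single flipped bit; real ECC modules use a wider (72,64) variant of the same idea:

```python
# Toy Hamming(7,4) single-error-correcting code, illustrating the
# principle behind ECC memory. Function names are illustrative.

def encode(d):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    p1 = d[0] ^ d[1] ^ d[3]   # covers positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]   # covers positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]   # covers positions 4,5,6,7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def decode(c):
    """Correct at most one flipped bit and return the 4 data bits."""
    # Each parity check re-verifies its positions; together the failed
    # checks spell out the 1-based position of the error (0 = no error).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1  # flip the bit the syndrome points at
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
cw = encode(word)
cw[4] ^= 1                   # simulate a cell losing its charge
assert decode(cw) == word    # the single-bit error is corrected
```

The same machinery also explains why ECC stays in servers and workstations: the extra parity bits cost die area and money on every module, which consumer hardware has historically not been willing to pay.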
Even having ECC, it seems that outside servers or workstations it isn't worth it; the same will happen with SDEs, I suppose.

@cobito said in Hidden data errors:
The fact that they try to attack the problem of CPUs via software already gives an idea of how slippery the topic must be.
It goes unnoticed by absolutely everything except specific analysis tools. Imagine: maybe this phenomenon was predicted by theory and only verified later? Who knows.
The point is that it seems to be starting to have an incidence worth taking more seriously: smaller integration means less precision in the finished part, and the installed computing base keeps growing, so it makes sense that at some point controlling this phenomenon becomes crucial. They mention a tool, the Fire Tool; I haven't looked at it, but it's sure to be a mess.