m_pav recommended issuing a warning for the SMART attributes with 'Reallocated', 'Pending' or 'Uncorrect' in their names that have a non-zero RAW VALUE.
It now appears this new mechanism may have issued some false alarms. We can't know for sure whether any particular warning is a false alarm, but on average most of them will be (I will explain this below). Here is a shot of the error message, which says:
The disk you selected for installation appears to be failing, as the disk health indicators (S.M.A.R.T.) warning above indicates.
We recommend you abort the installation and have the disk checked and replaced.
Typically what has happened (as reported in the forums) is that smartctl or the equivalent is run and it says the disk is fine.
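For anyone wondering what the rule amounts to in practice, here is a rough Python sketch of the check as I understand it from m_pav's description. This is not the installer's actual code. The function name and the sample text are made up for illustration, and I am assuming the usual column layout of the attribute table printed by smartctl -A, with the raw value in the tenth column.

```python
# Keywords taken from m_pav's recommendation.
KEYWORDS = ("Reallocated", "Pending", "Uncorrect")


def flagged_attributes(smartctl_output):
    """Return (name, raw_value) pairs that the rule would warn about.

    Hypothetical helper: assumes the attribute table layout of
    `smartctl -A`, where column 2 is the name and column 10 the raw value.
    """
    flagged = []
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if not any(keyword in name for keyword in KEYWORDS):
            continue
        # Warn only when the raw value is a plain non-zero integer.
        if raw.isdigit() and int(raw) != 0:
            flagged.append((name, int(raw)))
    return flagged


# Made-up sample data, not taken from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4711
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

print(flagged_attributes(SAMPLE))
```

With the made-up sample above, only Current_Pending_Sector is reported, because it is the only attribute with a matching name and a non-zero raw value. Power_On_Hours has a non-zero raw value but its name does not match, so it is ignored.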
I looked around for information on this and the two most useful things I found were the Wikipedia article and this 13-page research paper from Google written in 2007. I don't claim these are the absolute best resources. Perhaps other people can find better ones.
Figure 7 on page 7 of the paper shows the difference in failure rate between drives that have zero reallocation errors and drives that have one or more. With no errors, the failure rate is always under 5%. If there are one or more errors, the failure rate varies between 7% and 20%. Certainly, there is a strong correlation between one or more errors in this particular parameter and a higher failure rate.
But it is possible our warning message is too dire, for a couple of reasons. First, we are flagging parameters based on name, and there could be parameters with matching names that don't correlate well with imminent failure. Second, even if we take the very worst case from Figure 7 and assume a non-zero value corresponds to a 20% failure rate, then 80% of the time we issue a warning it will be a false alarm (in the upcoming year at least).
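To put numbers on that, here is a small Python sketch of the arithmetic. It uses only the rough figures I read off Figure 7 (a 7% to 20% failure rate for drives with a non-zero count). The function name is my own, and the figures are eyeballed from the plot, so treat this as an illustration rather than a precise result.

```python
def false_alarm_fraction(failure_rate):
    """Fraction of warnings that turn out to be false alarms.

    Assumes every drive with a non-zero value gets a warning, and that a
    warning counts as a false alarm when the drive does not fail within
    the period the failure rate covers.
    """
    return 1.0 - failure_rate


# Low and high ends of the range quoted from Figure 7.
for rate in (0.07, 0.20):
    print(f"failure rate {rate:.0%} -> false alarms {false_alarm_fraction(rate):.0%}")
```

So even in the worst case four out of five warnings would be false alarms, and at the low end of the range it is more like thirteen out of fourteen.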
One problem I've seen is that some users don't realize that this error is associated with the entire drive (for example sda) and not a particular partition (sda1, sda2, etc). Confusion is compounded by the fact that the results of the test done by the installer do not correlate with the results of the smartmon tools the user is told to run. If the Google paper is the most definitive research on this subject then we might be able to fix things by simply changing our error message to better correspond to what the paper actually says. Perhaps something like:
The smartmon tools report that the disk you selected for installation might have a higher than average failure rate in the upcoming year.
One question we need to answer is what to suggest to people if they get this error in the installer but the smartmon tools say the drive is healthy. It is possible the false alarm rate will end up making this new feature more trouble than it is worth. We should certainly wait to hear from m_pav before jumping to this conclusion.