How to interpret this smartctl (smartmon) data
We have a Linux server that has been in heavy use for 3 years. We're running a number of virtualized servers on it, some of which have not been well behaved, and for a significant time the server's I/O capacity was exceeded, leading to bad iowait. It has four 500 GB Barracuda SATA drives connected to a 3com RAID controller. One drive holds the OS, and the other three are set up as RAID 5.
Now we have a debate as to the condition of the drives and whether they are actively failing.
Here's a portion of the output for 1 of the 4 disks. They all have relatively similar statistics:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       169074425
  3 Spin_Up_Time            0x0003   095   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always       -       200009354607
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       27856
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       1
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       26
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   060   045    Old_age   Always       -       29 (Lifetime Min/Max 26/37)
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 21 0 0)
195 Hardware_ECC_Recovered  0x001a   046   033   000    Old_age   Always       -       169074425
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
My interpretation of this is that we have not had any bad sectors or other indications that any of the drives are actively failing.
However, the high Raw_Read_Error_Rate and Seek_Error_Rate values are being pointed to as indications that the drives are dying.
linux smartctl
asked Sep 20 '11 at 21:28
gview
1
There is a good description here (too long to repost, please follow the link): lime-technology.com/wiki/Understanding_SMART_Reports In case the link goes down, some important quotes: "This is an indicator of the current rate of errors of the low level physical sector read operations. In normal operation, there are ALWAYS a small number of errors [...] there is NO issue with the drive." and "PLEASE completely ignore the RAW_VALUE number! Only Seagates report the raw value, which yes, does appear to be the number of raw read errors, but should be ignored, completely."
– Konrad Gajewski
Feb 12 '18 at 21:19
6 Answers
In my experience, Seagates have weird numbers for those two SMART attributes. When diagnosing a Seagate I tend to ignore those and look more closely at other fields like Reallocated Sector Count. Of course, when in doubt replace the drive, but even brand new Seagates will have high numbers for those attributes.
answered Sep 20 '11 at 22:38
hwilbanks
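As a rough illustration of the advice above (skip the two noisy Seagate attributes and look at the sector counters instead), here is a minimal sketch that pulls the raw counts for a few attributes out of `smartctl -A` style text. The watch list (IDs 5, 187, 197, 198) and the function name are assumptions made for this example, not anything smartmontools defines; the sample rows are copied from the question.

```python
# Attribute IDs whose raw value is a plain count of bad or suspect sectors.
# This watch list is an illustrative choice, not a smartmontools default.
WATCHED_IDS = {5, 187, 197, 198}

# A few rows copied from the attribute table in the question.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always  - 0
  7 Seek_Error_Rate         0x000f 077 060 030 Pre-fail Always  - 200009354607
187 Reported_Uncorrect      0x0032 100 100 000 Old_age  Always  - 0
197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always  - 0
198 Offline_Uncorrectable   0x0010 100 100 000 Old_age  Offline - 0
"""


def watched_raw_counts(report):
    """Return {attribute_name: raw_count} for the watched IDs found in the text."""
    counts = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) in WATCHED_IDS:
            # Column 10 (index 9) is the start of RAW_VALUE.
            counts[fields[1]] = int(fields[9])
    return counts


counts = watched_raw_counts(SAMPLE)
for name, raw in counts.items():
    print(f"{name}: {raw}")
print("nonzero:", [name for name, raw in counts.items() if raw > 0])
```

For the drive in the question every watched counter is zero, which matches the asker's reading that no bad sectors have been recorded.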
For Seagate disks (and possibly some old ones from WD too) the Seek_Error_Rate and Raw_Read_Error_Rate raw values are 48-bit numbers, where the most significant 16 bits are an error count and the low 32 bits are a number of operations.
% python
>>> 200009354607 & 0xFFFFFFFF
2440858991
>>> (200009354607 & 0xFFFF00000000) >> 32
46
So your disk has performed 2440858991 seeks, of which 46 failed. My experience with Seagate drives is that they tend to fail when the number of errors goes over 1000. YMMV.
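The same split can be wrapped in a small helper and applied to both attributes from the question. This is only a sketch under the layout assumed in this answer (upper 16 bits errors, lower 32 bits operations); the function name is made up for the example, and the layout itself is not confirmed by any vendor documentation cited here.

```python
def split_seagate_raw(raw_value):
    """Split a 48-bit raw value into (error_count, operation_count).

    Assumes the layout described in this answer: the upper 16 bits hold
    the error count and the lower 32 bits hold the operation count.
    """
    errors = (raw_value >> 32) & 0xFFFF
    operations = raw_value & 0xFFFFFFFF
    return errors, operations


# Raw values taken from the attribute table in the question.
for name, raw in [("Seek_Error_Rate", 200009354607),
                  ("Raw_Read_Error_Rate", 169074425)]:
    errors, operations = split_seagate_raw(raw)
    print(f"{name}: {errors} errors in {operations} operations")
```

Under that assumption the Raw_Read_Error_Rate value of 169074425 fits entirely in the low 32 bits, so its error count comes out as 0.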
5
Thanks for this, I wish I had that information back when I originally posed the question.
– gview
Jan 31 '14 at 17:55
This, very useful. Saved me from panic.
– Halsafar
Nov 14 '18 at 23:11
edited Aug 20 '15 at 1:25 by Dan Pritts
answered Apr 2 '13 at 1:05 by tsuna
The "seek error rate" and "raw read error rate" RAW_VALUES are virtually meaningless for anyone but Seagate's support. As others pointed out, raw values of parameters like "reallocated sector count" or entries in the drive's error log are more likely to indicate a higher probability of failure.
But you can take a look at the interpreted data in the VALUE, WORST and THRESH columns which are meant to be read as gauges:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH
7 Seek_Error_Rate 0x000f 077 060 030
Meaning that your seek error rate is currently considered to be "77% good" and is reported as a problem by SMART when it reaches "30% good". It had been as low as "60% good" once, but has magically recovered since. Note that the interpreted values are calculated by the drive's SMART logic internally and the exact calculation may or may not be published by the manufacturer and typically cannot be tweaked by the user.
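To make the gauge reading concrete, here is a minimal sketch that compares the normalized VALUE and WORST columns against THRESH. It assumes the usual smartctl convention that an attribute counts as failed when its normalized value is less than or equal to the threshold; the function name and the returned keys are invented for this example.

```python
def gauge_status(value, worst, thresh):
    """Summarize one attribute's normalized columns.

    Assumes an attribute is considered failed when its normalized value
    is less than or equal to the threshold.
    """
    return {
        "failing_now": value <= thresh,     # current reading at or below THRESH
        "failed_in_past": worst <= thresh,  # lowest recorded reading at or below THRESH
        "margin": value - thresh,           # how far VALUE sits above THRESH
    }


# Seek_Error_Rate row from the question: VALUE 077, WORST 060, THRESH 030.
print(gauge_status(77, 60, 30))
```

For the Seek_Error_Rate row this reports no failure now or in the past, with the current value 47 points above the threshold.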
Personally, I consider a drive containing error log entries as "failing" and urge for a replacement as soon as they occur. But all in all, SMART data has turned out to be a rather weak indicator for failure prediction, as a research paper published by Google uncovered.
edited Jul 2 '14 at 15:26
answered Sep 20 '11 at 23:08 by the-wabbit
I realize this discussion is a bit old but want to add my 2 cents. I have found the SMART information to be quite a good indicator of pre-fail. When a SMART threshold is tripped, replace the drive. That is what those thresholds are for.
The vast majority of the time you will start to see bad sectors. That is a sure sign the drive is starting to fail. SMART has saved me many times. I use software RAID 1 and it's very helpful since you simply replace the failing drive and rebuild the array.
I also run short and long self-tests weekly.
smartctl -t short /dev/sda
smartctl -t long /dev/sda
Or add entries to /etc/smartd.conf and have smartd email you if there are errors:
/dev/sda -s L/../../3/22 -I 194 -m someemail@somedomain
/dev/sdb -s L/../../7/22 -I 194 -m someemail@somedomain
Make sure to install logwatch and redirect root to an email address and check the daily emails from logwatch. SMARTD tripped flags will show up there but it's of no help if nobody is monitoring that regularly.
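The `-s` argument in the smartd.conf lines above is a regular expression that smartd compares against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday through 7 = Sunday, hour). The sketch below imitates that comparison to show when the two example patterns would fire. It assumes the whole string has to match, which may differ between smartd versions, and the helper name is made up for this example.

```python
import re
from datetime import datetime


def schedule_matches(pattern, test_type, when):
    """Return True if a smartd-style schedule regex matches the given time.

    Builds the T/MM/DD/d/HH string that smartd documents for the -s
    directive; d is the ISO weekday (1 = Monday ... 7 = Sunday).
    Assumes the pattern has to match the whole string.
    """
    stamp = f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"
    return re.fullmatch(pattern, stamp) is not None


wednesday_22h = datetime(2011, 9, 21, 22, 0)  # a Wednesday evening
sunday_22h = datetime(2011, 9, 25, 22, 0)     # a Sunday evening

print(schedule_matches("L/../../3/22", "L", wednesday_22h))  # /dev/sda pattern
print(schedule_matches("L/../../7/22", "L", sunday_22h))     # /dev/sdb pattern
print(schedule_matches("L/../../3/22", "S", wednesday_22h))  # short test, same time
```

Read this way, the two entries schedule a long self-test on /dev/sda on Wednesdays at 22:00 and on /dev/sdb on Sundays at 22:00; neither pattern schedules a short test.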
Yes, those fields look bad, but I no longer trust the info reported by SMART (my test machine has a drive which should have died a long time ago if you believe the data from smartctl).
The fact is that you have reported high iowait and the drives are 3 years old. This should be enough for you to change the drives.
1
For various reasons we need to maximize our investment in the hardware. The iowait had to do with the ridiculous load, as well as some configuration mistakes we made when setting up the box.
– gview
Sep 20 '11 at 22:57
Sorry to commit necromancy on this post, but in my experience, the "Raw Read Error Rate" and "Hardware ECC Recovered" fields for a Seagate drive will quite literally go all over the place and increment constantly into the trillions range at which point they'll cycle back around to zero to continue the process again. I've a Seagate ST9750420AS that has had that problem since day one and still works great even after quite a few years and 3500+ hours of use.
I think those fields can be safely ignored if you're running one in your case. Just make sure the two fields are reporting the same number and in sync constantly. If they're not...well... That actually might mean a problem.
In my experience, Seagates have weird numbers for those two SMART attributes. When diagnosing a Seagate I tend to ignore those and look more closely at other fields like Reallocated Sector Count. Of course, when in doubt replace the drive, but even brand new Seagates will have high numbers for those attributes.
add a comment |
In my experience, Seagates have weird numbers for those two SMART attributes. When diagnosing a Seagate I tend to ignore those and look more closely at other fields like Reallocated Sector Count. Of course, when in doubt replace the drive, but even brand new Seagates will have high numbers for those attributes.
add a comment |
In my experience, Seagates have weird numbers for those two SMART attributes. When diagnosing a Seagate I tend to ignore those and look more closely at other fields like Reallocated Sector Count. Of course, when in doubt replace the drive, but even brand new Seagates will have high numbers for those attributes.
In my experience, Seagates have weird numbers for those two SMART attributes. When diagnosing a Seagate I tend to ignore those and look more closely at other fields like Reallocated Sector Count. Of course, when in doubt replace the drive, but even brand new Seagates will have high numbers for those attributes.
answered Sep 20 '11 at 22:38
hwilbankshwilbanks
42623
42623
add a comment |
add a comment |
For Seagate disks (and possibly some old ones from WD too) the Seek_Error_Rate and Raw_Read_Error_Rate are 48 bit numbers, where the most significant 16 bits are an error count, and the low 32 bits are a number of operations.
% python
>>> 200009354607 & 0xFFFFFFFF
2440858991
>>> (200009354607 & 0xFFFF00000000) >> 32
46
So your disk has performed 2440858991 seeks, of which 46 failed. My experience with Seagate drives is that they tend to fail when the number of errors goes over 1000. YMMV.
5
Thans for this, I wish I had that information back when I originally posed the question.
– gview
Jan 31 '14 at 17:55
This, very useful. Saved me from panic.
– Halsafar
Nov 14 '18 at 23:11
add a comment |
For Seagate disks (and possibly some old ones from WD too) the Seek_Error_Rate and Raw_Read_Error_Rate are 48 bit numbers, where the most significant 16 bits are an error count, and the low 32 bits are a number of operations.
% python
>>> 200009354607 & 0xFFFFFFFF
2440858991
>>> (200009354607 & 0xFFFF00000000) >> 32
46
So your disk has performed 2440858991 seeks, of which 46 failed. My experience with Seagate drives is that they tend to fail when the number of errors goes over 1000. YMMV.
5
Thans for this, I wish I had that information back when I originally posed the question.
– gview
Jan 31 '14 at 17:55
This, very useful. Saved me from panic.
– Halsafar
Nov 14 '18 at 23:11
add a comment |
For Seagate disks (and possibly some old ones from WD too) the Seek_Error_Rate and Raw_Read_Error_Rate are 48 bit numbers, where the most significant 16 bits are an error count, and the low 32 bits are a number of operations.
% python
>>> 200009354607 & 0xFFFFFFFF
2440858991
>>> (200009354607 & 0xFFFF00000000) >> 32
46
So your disk has performed 2440858991 seeks, of which 46 failed. My experience with Seagate drives is that they tend to fail when the number of errors goes over 1000. YMMV.
For Seagate disks (and possibly some old ones from WD too) the Seek_Error_Rate and Raw_Read_Error_Rate are 48 bit numbers, where the most significant 16 bits are an error count, and the low 32 bits are a number of operations.
% python
>>> 200009354607 & 0xFFFFFFFF
2440858991
>>> (200009354607 & 0xFFFF00000000) >> 32
46
So your disk has performed 2440858991 seeks, of which 46 failed. My experience with Seagate drives is that they tend to fail when the number of errors goes over 1000. YMMV.
edited Aug 20 '15 at 1:25
Dan Pritts
2,5592023
2,5592023
answered Apr 2 '13 at 1:05
tsunatsuna
1,283139
1,283139
5
Thans for this, I wish I had that information back when I originally posed the question.
– gview
Jan 31 '14 at 17:55
This, very useful. Saved me from panic.
– Halsafar
Nov 14 '18 at 23:11
add a comment |
5
Thans for this, I wish I had that information back when I originally posed the question.
– gview
Jan 31 '14 at 17:55
This, very useful. Saved me from panic.
– Halsafar
Nov 14 '18 at 23:11
5
5
Thans for this, I wish I had that information back when I originally posed the question.
– gview
Jan 31 '14 at 17:55
Thans for this, I wish I had that information back when I originally posed the question.
– gview
Jan 31 '14 at 17:55
This, very useful. Saved me from panic.
– Halsafar
Nov 14 '18 at 23:11
This, very useful. Saved me from panic.
– Halsafar
Nov 14 '18 at 23:11
add a comment |
The "seek error rate" and "raw read error rate" RAW_VALUES are virtually meaningless for anyone but Seagate's support. As others pointed out, raw values of parameters like "reallocated sector count" or entries in the drive's error log are more likely to indicate a higher probability of failure.
But you can take a look at the interpreted data in the VALUE, WORST and THRESH columns which are meant to be read as gauges:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH
7 Seek_Error_Rate 0x000f 077 060 030
Meaning that your seek error rate is currently considered to be "77% good" and is reported as a problem by SMART when it reaches "30% good". It had been as low as "60% good" once, but has magically recovered since. Note that the interpreted values are calculated by the drive's SMART logic internally and the exact calculation may or may not be published by the manufacturer and typically cannot be tweaked by the user.
Personally, I consider a drive containing error log entries as "failing" and urge for a replacement as soon as they occur. But all in all, SMART data has turned out to be a rather weak indicator for failure prediction, as a research paper published by Google uncovered.
add a comment |
The "seek error rate" and "raw read error rate" RAW_VALUES are virtually meaningless for anyone but Seagate's support. As others pointed out, raw values of parameters like "reallocated sector count" or entries in the drive's error log are more likely to indicate a higher probability of failure.
But you can take a look at the interpreted data in the VALUE, WORST and THRESH columns which are meant to be read as gauges:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH
7 Seek_Error_Rate 0x000f 077 060 030
Meaning that your seek error rate is currently considered to be "77% good" and is reported as a problem by SMART when it reaches "30% good". It had been as low as "60% good" once, but has magically recovered since. Note that the interpreted values are calculated by the drive's SMART logic internally and the exact calculation may or may not be published by the manufacturer and typically cannot be tweaked by the user.
Personally, I consider a drive containing error log entries as "failing" and urge for a replacement as soon as they occur. But all in all, SMART data has turned out to be a rather weak indicator for failure prediction, as a research paper published by Google uncovered.
add a comment |
The "seek error rate" and "raw read error rate" RAW_VALUES are virtually meaningless for anyone but Seagate's support. As others pointed out, raw values of parameters like "reallocated sector count" or entries in the drive's error log are more likely to indicate a higher probability of failure.
But you can take a look at the interpreted data in the VALUE, WORST and THRESH columns which are meant to be read as gauges:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH
7 Seek_Error_Rate 0x000f 077 060 030
Meaning that your seek error rate is currently considered to be "77% good" and is reported as a problem by SMART when it reaches "30% good". It had been as low as "60% good" once, but has magically recovered since. Note that the interpreted values are calculated by the drive's SMART logic internally and the exact calculation may or may not be published by the manufacturer and typically cannot be tweaked by the user.
Personally, I consider a drive containing error log entries as "failing" and urge for a replacement as soon as they occur. But all in all, SMART data has turned out to be a rather weak indicator for failure prediction, as a research paper published by Google uncovered.
The "seek error rate" and "raw read error rate" RAW_VALUES are virtually meaningless for anyone but Seagate's support. As others pointed out, raw values of parameters like "reallocated sector count" or entries in the drive's error log are more likely to indicate a higher probability of failure.
But you can take a look at the interpreted data in the VALUE, WORST and THRESH columns which are meant to be read as gauges:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH
7 Seek_Error_Rate 0x000f 077 060 030
Meaning that your seek error rate is currently considered to be "77% good" and is reported as a problem by SMART when it reaches "30% good". It had been as low as "60% good" once, but has magically recovered since. Note that the interpreted values are calculated by the drive's SMART logic internally and the exact calculation may or may not be published by the manufacturer and typically cannot be tweaked by the user.
Personally, I consider a drive containing error log entries as "failing" and urge for a replacement as soon as they occur. But all in all, SMART data has turned out to be a rather weak indicator for failure prediction, as a research paper published by Google uncovered.
edited Jul 2 '14 at 15:26
answered Sep 20 '11 at 23:08
the-wabbitthe-wabbit
36.2k1181151
36.2k1181151
add a comment |
add a comment |
I realized this discussion is a bit old but want to add my 2 cents. I have found the smart information to be quite a good indicator of pre-fail. When you get a smart threshold tripped then replace the drive. That is what those thresholds are for.
The vast majority of time you will start to see bad sectors. That is a sure sign the drive is starting to fail. SMART has saved me many times. I use software RAID 1 and it's very helpful since you simply replace the failing drive and rebuild the array.
I also run short and long self test weekly.
smartctl -t short /dev/sda
smartctl -t long /dev/sda
Or add it /etc/smartd.conf and get it to email you if there are errors
/dev/sda -s L/../../3/22 -I 194 -m someemail@somedomain
/dev/sdb -s L/../../7/22 -I 194 -m someemail@somedomain
Make sure to install logwatch and redirect root to an email address and check the daily emails from logwatch. SMARTD tripped flags will show up there but it's of no help if nobody is monitoring that regularly.
answered Jul 5 '14 at 14:21
Fred Flint
Yes, those fields look bad, but I no longer trust the information reported by SMART (my test machine has a drive which should have died a long time ago if you believe the data from smartctl).
The fact is that you have reported high iowait and the drives are 3 years old. That should be enough reason for you to replace the drives.
answered Sep 20 '11 at 22:28
migabi
1
For various reasons we need to maximize our investment in the hardware. The iowait had to do with the ridiculous load, as well as some configuration mistakes we made when setting up the box.
– gview
Sep 20 '11 at 22:57
Sorry to commit necromancy on this post, but in my experience the "Raw Read Error Rate" and "Hardware ECC Recovered" fields on a Seagate drive will quite literally go all over the place, incrementing constantly into the trillions range, at which point they cycle back around to zero and start over. I have a Seagate ST9750420AS that has behaved this way since day one and still works great after quite a few years and 3500+ hours of use.
I think those fields can be safely ignored if you're running one of these drives. Just make sure the two fields report the same number and stay in sync. If they don't... well, that might actually indicate a problem.
answered May 15 at 20:05
Ryan Gandy
1
There is a good description here (too long to repost, please follow the link): lime-technology.com/wiki/Understanding_SMART_Reports
In case the link goes down, some important quotes: "This is an indicator of the current rate of errors of the low level physical sector read operations. In normal operation, there are ALWAYS a small number of errors [...] there is NO issue with the drive." and "PLEASE completely ignore the RAW_VALUE number! Only Seagates report the raw value, which yes, does appear to be the number of raw read errors, but should be ignored, completely."
– Konrad Gajewski
Feb 12 '18 at 21:19