Linux software RAID: Is a disk faulty?
I have a setup with three identical hard drives, recognized as sdb, sdd and sde. Across these three disks I have one RAID0 array (md2) and two RAID5 arrays (md0 and md1). All my RAID arrays appear to be working normally, and have done so since I created them. However, I have seen messages on the console about md0 and md1 being "active with 2 out of 3 devices", which to me sounds like a problem.
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [linear] [multipath] [raid1] [raid10]
md2 : active raid0 sdb3[0] sdd3[1] sde3[2]
24574464 blocks super 1.2 512k chunks
md1 : active raid5 sdd2[1] sde2[3]
5823403008 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
md0 : active raid5 sdd1[1] sde1[3]
20462592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
unused devices: <none>
I'm not experienced with mdadm, but to me this looks like md0 and md1 are missing the sdb disk. However, md2 does not seem to be missing anything. So, has the sdb disk failed, or is this just a configuration issue? Are there any more diagnostics I should run to figure that out?
EDIT:
# mdadm --examine /dev/sdb2
/dev/sdb2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 94d56562:90a999e8:601741c0:55d8c83f
Name : jostein1:1 (local to host jostein1)
Creation Time : Sat Aug 18 13:00:00 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 5823404032 (2776.82 GiB 2981.58 GB)
Array Size : 5823403008 (5553.63 GiB 5963.16 GB)
Used Dev Size : 5823403008 (2776.82 GiB 2981.58 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1024 sectors
State : active
Device UUID : cee60351:c3a525ce:a449b326:6cb5970d
Update Time : Tue May 24 21:43:20 2016
Checksum : 4afdc54a - correct
Events : 7400
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sde2
/dev/sde2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 94d56562:90a999e8:601741c0:55d8c83f
Name : jostein1:1 (local to host jostein1)
Creation Time : Sat Aug 18 13:00:00 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 5823404032 (2776.82 GiB 2981.58 GB)
Array Size : 5823403008 (5553.63 GiB 5963.16 GB)
Used Dev Size : 5823403008 (2776.82 GiB 2981.58 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1024 sectors
State : clean
Device UUID : 9c5abb6d:8f1eecbd:4b0f5459:c0424d26
Update Time : Tue Oct 11 21:17:10 2016
Checksum : a3992056 - correct
Events : 896128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : .AA ('A' == active, '.' == missing, 'R' == replacing)
So --examine on sdb2 still reports the array as fully active (AAA), while the same command on sdd2 and sde2 reports device 0 (the sdb partition) as missing (.AA).
# mdadm --detail --verbose /dev/md1
/dev/md1:
Version : 1.2
Creation Time : Sat Aug 18 13:00:00 2012
Raid Level : raid5
Array Size : 5823403008 (5553.63 GiB 5963.16 GB)
Used Dev Size : 2911701504 (2776.82 GiB 2981.58 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Tue Oct 11 22:03:50 2016
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : jostein1:1 (local to host jostein1)
UUID : 94d56562:90a999e8:601741c0:55d8c83f
Events : 897492
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 50 1 active sync /dev/sdd2
3 8 66 2 active sync /dev/sde2
EDIT2:
The event count for the device no longer part of the array is very different from the others:
# mdadm --examine /dev/sd[bde]1 | egrep 'Event|/dev/sd'
/dev/sdb1:
Events : 603
/dev/sdd1:
Events : 374272
/dev/sde1:
Events : 374272
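The same one-liner would apply to the md1 members (the second partitions); it was not run here, but for completeness:
mdadm --examine /dev/sd[bde]2 | egrep 'Event|/dev/sd'
Going by the individual --examine dumps above, it would report 7400 events for sdb2 against 896128 for sde2, the same pattern of one long-stale member.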
smartctl (smartmontools) output for the disk that is no longer part of the arrays:
# smartctl -d ata -a /dev/sdb
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.2.0-36-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD30EZRX-00MMMB0
Serial Number: WD-WCAWZ2185619
LU WWN Device Id: 5 0014ee 25c58f89e
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed Oct 12 18:54:30 2016 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (51480) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 494) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 147 144 021 Pre-fail Always - 9641
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1398
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 7788
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1145
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 45
193 Load_Cycle_Count 0x0032 097 097 000 Old_age Always - 309782
194 Temperature_Celsius 0x0022 124 103 000 Old_age Always - 28
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
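One thing the log above makes clear is that no SMART self-test has ever been run on this drive, so the PASSED verdict rests on passive attributes only. Standard smartctl usage to actually exercise the disk (a suggestion; these were not run in the original thread) would be:
smartctl -t short /dev/sdb      # queues a short test, about 2 minutes on this drive
smartctl -l selftest /dev/sdb   # check the result once it has finished
smartctl -t long /dev/sdb       # full-surface read, about 494 minutes here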
Tags: raid, mdadm, raid5

asked Oct 11 '16 at 18:40 by josteinb (last edited Oct 12 '16 at 16:55)
check the logfiles – Ipor Sircer, Oct 11 '16 at 18:48
There are no errors in dmesg. Any other logs I should check? – josteinb, Oct 11 '16 at 19:05
mdadm --examine /dev/sdb2 – Ipor Sircer, Oct 11 '16 at 19:08
Look at the SMART data for the failed drive with smartctl. See if it is reporting as failed? – Zoredache, Oct 11 '16 at 22:30
@Zoredache smartctl shows the drive is fine, as far as I can tell; see the output added to the question. It is very similar to the output from other disks of the same type that work just fine. – josteinb, Oct 12 '16 at 16:58
2 Answers
Your mdstat file says it all: [3/2] [_UU] means that while there are 3 defined devices, only 2 are in use at the moment, and the leading _ in _UU marks the missing one.
For greater detail on the raid devices (before going to the physical ones) you'd run (as root):
mdadm --detail --verbose /dev/md0
mdadm --detail --verbose /dev/md1
mdadm --detail --verbose /dev/md2
On my system (using raid6) I have simulated a failure and this is an example of the output:
/dev/md0:
Version : 1.2
Creation Time : Thu Sep 29 09:51:41 2016
Raid Level : raid6
Array Size : 16764928 (15.99 GiB 17.17 GB)
Used Dev Size : 8382464 (7.99 GiB 8.58 GB)
Raid Devices : 4
Total Devices : 5
Persistence : Superblock is persistent
Update Time : Thu Oct 11 13:06:50 2016
State : clean <<== CLEAN!
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : ubuntu:0 (local to host ubuntu)
UUID : 3837ba75:eaecb6be:8ceb4539:e5d69538
Events : 43
Number Major Minor RaidDevice State
4 8 65 0 active sync /dev/sde1 <<== NEW ENTRY
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
3 8 49 3 active sync /dev/sdd1
0 8 1 - faulty /dev/sda1 <<== SW-REPLACED
So I have "state: clean, degraded" in the middle and "removed" for the listing of disk number 0 (which I assume should have been sdb). Can I recover from "removed" somehow? The full output has been included in the question.
– josteinb
Oct 11 '16 at 20:07
@josteinb You'd add spare devices so the RAID can be automatically reconstruced. This is my current setup, as I showed in the answer.
– EnzoR
Oct 12 '16 at 7:29
@josteinb By the way, I prefer to 1) create a RAID device over entire disks, 2) then partition (and 3) then LVM eveything). You first did partitioning, then the RAID. My starting point is: does that make any sense to you? When a drive is gone, then all partitions are (supposed to be) gone as well.
– EnzoR
Oct 12 '16 at 8:12
I see your point about raid first, partitioning second. It is a long time since I created this setup, but I would think the reason why I partitioned first is that I have one raid0 and two raid5 partitions.
– josteinb
Oct 12 '16 at 16:52
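As a rough illustration of the whole-disk-first layout EnzoR describes, on a fresh set of disks; this is only a sketch, and the volume group and logical volume names are hypothetical, not from this thread:
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdd /dev/sde
pvcreate /dev/md0                       # put LVM on top of the single array
vgcreate vg_raid /dev/md0               # hypothetical VG name
lvcreate -L 20G -n lv_scratch vg_raid   # carve out volumes instead of partitioning the disks
As josteinb's reply points out, this trades away the ability to mix RAID levels (one raid0 plus two raid5 sets) across the same three disks, which is why the original setup partitioned first.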
md0 and md1 are raid5 arrays, degraded because their respective partitions on /dev/sdb have failed or been marked faulty. Run mdadm --detail on the array itself for more details (mdadm --detail /dev/md1).
If all is well with /dev/sdb, re-add the partitions to the arrays. Get the correct partition numbers from your /etc/mdadm.conf or from the --detail output.
mdadm /dev/md1 --re-add /dev/sdb[?]
answered Oct 11 '16 at 20:48 by Brian Tillman

I tried this; --re-add does not work. mdadm just responded that the device could not be re-added. It may be related to the event counts shown in EDIT2 of the question. – josteinb, Oct 12 '16 at 16:50
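When --re-add is refused like this (the superblock on the sdb partition carries a far lower event count than the rest of the array), a common fallback is to discard the stale metadata and add the member back as if it were a new disk, which forces a full rebuild from parity. A hedged sketch, assuming /dev/sdb2 really is the md1 member and the drive itself is healthy; note that --zero-superblock wipes the md metadata on that partition, so double-check the device name first:
mdadm --zero-superblock /dev/sdb2
mdadm /dev/md1 --add /dev/sdb2
cat /proc/mdstat                        # should now show recovery in progress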