Linux software RAID: Is a disk faulty?


I have a setup with three identical hard drives, recognized as sdb, sdd and sde. I have one RAID0 array (md2) and two RAID5 arrays (md0 and md1) across these three disks. All my RAID arrays appear to be working normally, and have done so since I created them. However, I have seen messages on the console about md0 and md1 being "active with 2 out of 3 devices", which to me sounds like a problem.



$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [linear] [multipath] [raid1] [raid10]
md2 : active raid0 sdb3[0] sdd3[1] sde3[2]
24574464 blocks super 1.2 512k chunks

md1 : active raid5 sdd2[1] sde2[3]
5823403008 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]

md0 : active raid5 sdd1[1] sde1[3]
20462592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>


I'm not experienced with mdadm, but this looks to me like md0 and md1 are missing the sdb disk. However, md2 does not seem to be missing anything. So, has the sdb disk failed, or is this just some configuration issue? Are there any more diagnostics I should run to figure that out?



EDIT:



# mdadm --examine /dev/sdb2
/dev/sdb2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 94d56562:90a999e8:601741c0:55d8c83f
Name : jostein1:1 (local to host jostein1)
Creation Time : Sat Aug 18 13:00:00 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 5823404032 (2776.82 GiB 2981.58 GB)
Array Size : 5823403008 (5553.63 GiB 5963.16 GB)
Used Dev Size : 5823403008 (2776.82 GiB 2981.58 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1024 sectors
State : active
Device UUID : cee60351:c3a525ce:a449b326:6cb5970d

Update Time : Tue May 24 21:43:20 2016
Checksum : 4afdc54a - correct
Events : 7400

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 0
Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)


# mdadm --examine /dev/sde2
/dev/sde2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 94d56562:90a999e8:601741c0:55d8c83f
Name : jostein1:1 (local to host jostein1)
Creation Time : Sat Aug 18 13:00:00 2012
Raid Level : raid5
Raid Devices : 3

Avail Dev Size : 5823404032 (2776.82 GiB 2981.58 GB)
Array Size : 5823403008 (5553.63 GiB 5963.16 GB)
Used Dev Size : 5823403008 (2776.82 GiB 2981.58 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1024 sectors
State : clean
Device UUID : 9c5abb6d:8f1eecbd:4b0f5459:c0424d26

Update Time : Tue Oct 11 21:17:10 2016
Checksum : a3992056 - correct
Events : 896128

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 2
Array State : .AA ('A' == active, '.' == missing, 'R' == replacing)


So --examine on sdb shows it as active, while the same command on sdd and sde shows it as missing.



# mdadm --detail --verbose /dev/md1
/dev/md1:
Version : 1.2
Creation Time : Sat Aug 18 13:00:00 2012
Raid Level : raid5
Array Size : 5823403008 (5553.63 GiB 5963.16 GB)
Used Dev Size : 2911701504 (2776.82 GiB 2981.58 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Tue Oct 11 22:03:50 2016
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Name : jostein1:1 (local to host jostein1)
UUID : 94d56562:90a999e8:601741c0:55d8c83f
Events : 897492

Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 50 1 active sync /dev/sdd2
3 8 66 2 active sync /dev/sde2


EDIT2:



The event count for the device no longer part of the array is very different from the others:



# mdadm --examine /dev/sd[bde]1 | egrep 'Event|/dev/sd'
/dev/sdb1:
Events : 603
/dev/sdd1:
Events : 374272
/dev/sde1:
Events : 374272


Smartmontools output for the disk that is no longer part of the array:



# smartctl -d ata -a /dev/sdb
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.2.0-36-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD30EZRX-00MMMB0
Serial Number: WD-WCAWZ2185619
LU WWN Device Id: 5 0014ee 25c58f89e
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed Oct 12 18:54:30 2016 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (51480) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 494) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 147 144 021 Pre-fail Always - 9641
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1398
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 7788
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1145
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 45
193 Load_Cycle_Count 0x0032 097 097 000 Old_age Always - 309782
194 Temperature_Celsius 0x0022 124 103 000 Old_age Always - 28
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
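
Worth noting: the self-test log above shows that no self-test has ever been run on this drive. A surface check can be started and read back with smartctl itself, roughly:

smartctl -t long /dev/sdb      # start an extended offline self-test (the log above estimates ~494 minutes)
smartctl -l selftest /dev/sdb  # read the self-test log once it completes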









raid mdadm raid5

asked Oct 11 '16 at 18:40, edited Oct 12 '16 at 16:55 – josteinb

  • check the logfiles

    – Ipor Sircer
    Oct 11 '16 at 18:48











  • There are no errors in dmesg. Any other logs I should check?

    – josteinb
    Oct 11 '16 at 19:05











  • mdadm --examine /dev/sdb2

    – Ipor Sircer
    Oct 11 '16 at 19:08











  • Look at the SMART data for the failed drive with smartctl. See if it is reporting as failed?

    – Zoredache
    Oct 11 '16 at 22:30











  • @Zoredache smartctl shows the drive is fine, as far as I can tell; see the output added to the question. The output is very similar to that of other disks of the same type that work just fine.

    – josteinb
    Oct 12 '16 at 16:58

















2 Answers

Your mdstat file says it all.



[3/2] [_UU] means that while the array is defined with 3 devices, only 2 are in use at the moment. The _UU pattern shows the same thing: the underscore marks the missing member.
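
As a quick convenience check (nothing mdadm-specific, just a standard grep over /proc/mdstat), the following prints only the status lines of degraded arrays, i.e. those whose [UU...] field contains an underscore:

grep -B1 '\[[U_]*_[U_]*\]' /proc/mdstat   # -B1 also shows the mdX line above each match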



For greater detail on the RAID devices (before going to the physical ones) you'd run (as root):



mdadm --detail --verbose /dev/md0
mdadm --detail --verbose /dev/md1
mdadm --detail --verbose /dev/md2


On my system (using raid6) I have simulated a failure and this is an example output:



/dev/md0:
Version : 1.2
Creation Time : Thu Sep 29 09:51:41 2016
Raid Level : raid6
Array Size : 16764928 (15.99 GiB 17.17 GB)
Used Dev Size : 8382464 (7.99 GiB 8.58 GB)
Raid Devices : 4
Total Devices : 5
Persistence : Superblock is persistent

Update Time : Thu Oct 11 13:06:50 2016
State : clean <<== CLEAN!
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Name : ubuntu:0 (local to host ubuntu)
UUID : 3837ba75:eaecb6be:8ceb4539:e5d69538
Events : 43

Number Major Minor RaidDevice State
4 8 65 0 active sync /dev/sde1 <<== NEW ENTRY
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
3 8 49 3 active sync /dev/sdd1

0 8 1 - faulty /dev/sda1 <<== SW-REPLACED
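
For reference, the sort of sequence that produces a state like this (mark a member faulty, remove it, add a replacement) looks roughly as follows; the device names follow the raid6 example above, not the asker's layout:

mdadm /dev/md0 --fail /dev/sda1      # mark the member faulty (this is how a failure is simulated)
mdadm /dev/md0 --remove /dev/sda1    # detach the faulty member from the array
mdadm /dev/md0 --add /dev/sde1       # add the replacement; md rebuilds onto it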





answered Oct 11 '16 at 20:02 – EnzoR

  • So I have "state: clean, degraded" in the middle and "removed" for the listing of disk number 0 (which I assume should have been sdb). Can I recover from "removed" somehow? The full output has been included in the question.

    – josteinb
    Oct 11 '16 at 20:07












  • @josteinb You'd add spare devices so the RAID can be automatically reconstructed. This is my current setup, as I showed in the answer.

    – EnzoR
    Oct 12 '16 at 7:29











  • @josteinb By the way, I prefer to 1) create a RAID device over the entire disks, 2) then partition and 3) then LVM everything (a rough sketch of that layering follows after these comments). You first did the partitioning, then the RAID. My starting point is: does that make any sense to you? When a drive is gone, then all of its partitions are (supposed to be) gone as well.

    – EnzoR
    Oct 12 '16 at 8:12











  • I see your point about raid first, partitioning second. It is a long time since I created this setup, but I would think the reason why I partitioned first is that I have one raid0 and two raid5 partitions.

    – josteinb
    Oct 12 '16 at 16:52
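
As a rough sketch of that ordering (one md device over whole disks, then LVM on top), with hypothetical device names /dev/sdx, /dev/sdy, /dev/sdz and volume group vg0, meant for blank disks only:

mdadm --create /dev/md9 --level=5 --raid-devices=3 /dev/sdx /dev/sdy /dev/sdz   # one array across the whole (blank) disks
pvcreate /dev/md9                                                               # the md device becomes a single LVM physical volume
vgcreate vg0 /dev/md9
lvcreate -L 20G -n data vg0                                                     # carve out logical volumes instead of fixed partitions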



















md0 and md1 are raid5 arrays, degraded because their respective partitions on /dev/sdb have failed or been marked faulty. Run mdadm --examine on the array itself for more details (mdadm --examine /dev/md1).



If all is well with /dev/sdb, re-add the partitions to the arrays. Get the correct partition numbers from your /etc/mdadm.conf or the output of --examine on the array.



mdadm /dev/md1 --re-add /dev/sdb[?]






answered Oct 11 '16 at 20:48 – Brian Tillman

  • I tried this, --re-add does not work. mdadm just responded that the device could not be re-added. It may be related to the event counts shown in EDIT2 of the question.

    – josteinb
    Oct 12 '16 at 16:50
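
When --re-add is refused because the event counters have diverged too far (as the event counts in the question's --examine output show), the usual fallback, assuming the disk itself is healthy, is to add the partition back as a fresh member, which triggers a full resync instead of an incremental one. A minimal sketch for the md1 member from the question:

mdadm --zero-superblock /dev/sdb2    # optional: wipe the stale superblock first
mdadm /dev/md1 --add /dev/sdb2       # goes in as a new member; a full rebuild follows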










