Linux software RAID: Is a disk faulty?
I have a setup with three identical hard drives, recognized as sdb, sdd and sde. Across these three disks I have one RAID0 array (md2) and two RAID5 arrays (md0 and md1). All my RAID arrays appear to be working normally, and have done so since I created them. However, I have seen messages on the console about md0 and md1 being "active with 2 out of 3 devices", which to me sounds like a problem.
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [linear] [multipath] [raid1] [raid10]
md2 : active raid0 sdb3[0] sdd3[1] sde3[2]
24574464 blocks super 1.2 512k chunks
md1 : active raid5 sdd2[1] sde2[3]
5823403008 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
md0 : active raid5 sdd1[1] sde1[3]
20462592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
unused devices: <none>
I'm not experienced with mdadm, but to me this looks like md0 and md1 are missing the sdb disk. However, md2 does not seem to be missing anything. So, has the sdb disk failed, or is this just a configuration issue? Are there any more diagnostics I should run to figure that out?
EDIT:
# mdadm --examine /dev/sdb2
/dev/sdb2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 94d56562:90a999e8:601741c0:55d8c83f
Name : jostein1:1 (local to host jostein1)
Creation Time : Sat Aug 18 13:00:00 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 5823404032 (2776.82 GiB 2981.58 GB)
Array Size : 5823403008 (5553.63 GiB 5963.16 GB)
Used Dev Size : 5823403008 (2776.82 GiB 2981.58 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1024 sectors
State : active
Device UUID : cee60351:c3a525ce:a449b326:6cb5970d
Update Time : Tue May 24 21:43:20 2016
Checksum : 4afdc54a - correct
Events : 7400
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sde2
/dev/sde2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 94d56562:90a999e8:601741c0:55d8c83f
Name : jostein1:1 (local to host jostein1)
Creation Time : Sat Aug 18 13:00:00 2012
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 5823404032 (2776.82 GiB 2981.58 GB)
Array Size : 5823403008 (5553.63 GiB 5963.16 GB)
Used Dev Size : 5823403008 (2776.82 GiB 2981.58 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1024 sectors
State : clean
Device UUID : 9c5abb6d:8f1eecbd:4b0f5459:c0424d26
Update Time : Tue Oct 11 21:17:10 2016
Checksum : a3992056 - correct
Events : 896128
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : .AA ('A' == active, '.' == missing, 'R' == replacing)
So --examine on sdb2 still reports the array as fully active (AAA), while the same command on sdd2 and sde2 reports device 0 (the sdb partition) as missing (.AA).
# mdadm --detail --verbose /dev/md1
/dev/md1:
Version : 1.2
Creation Time : Sat Aug 18 13:00:00 2012
Raid Level : raid5
Array Size : 5823403008 (5553.63 GiB 5963.16 GB)
Used Dev Size : 2911701504 (2776.82 GiB 2981.58 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Tue Oct 11 22:03:50 2016
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : jostein1:1 (local to host jostein1)
UUID : 94d56562:90a999e8:601741c0:55d8c83f
Events : 897492
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 50 1 active sync /dev/sdd2
3 8 66 2 active sync /dev/sde2
EDIT2:
The event count for the device no longer part of the array is very different from the others:
# mdadm --examine /dev/sd[bde]1 | egrep 'Event|/dev/sd'
/dev/sdb1:
Events : 603
/dev/sdd1:
Events : 374272
/dev/sde1:
Events : 374272
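The same one-liner would apply to the md1 members (the second partitions); it was not run here, but for completeness:
mdadm --examine /dev/sd[bde]2 | egrep 'Event|/dev/sd'
Going by the individual --examine dumps above, it would report 7400 events for sdb2 against 896128 for sde2, the same pattern of one long-stale member.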
smartctl (smartmontools) output for the disk that is no longer part of the arrays:
# smartctl -d ata -a /dev/sdb
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.2.0-36-generic] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD30EZRX-00MMMB0
Serial Number: WD-WCAWZ2185619
LU WWN Device Id: 5 0014ee 25c58f89e
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed Oct 12 18:54:30 2016 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (51480) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 494) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 147 144 021 Pre-fail Always - 9641
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1398
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 7788
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1145
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 45
193 Load_Cycle_Count 0x0032 097 097 000 Old_age Always - 309782
194 Temperature_Celsius 0x0022 124 103 000 Old_age Always - 28
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
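One thing the log above makes clear is that no SMART self-test has ever been run on this drive, so the PASSED verdict rests on passive attributes only. Standard smartctl usage to actually exercise the disk (a suggestion; these were not run in the original thread) would be:
smartctl -t short /dev/sdb      # queues a short test, about 2 minutes on this drive
smartctl -l selftest /dev/sdb   # check the result once it has finished
smartctl -t long /dev/sdb       # full-surface read, about 494 minutes here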
Tags: raid, mdadm, raid5

asked Oct 11 '16 at 18:40 by josteinb (last edited Oct 12 '16 at 16:55)
check the logfiles – Ipor Sircer, Oct 11 '16 at 18:48
There are no errors in dmesg. Any other logs I should check? – josteinb, Oct 11 '16 at 19:05
mdadm --examine /dev/sdb2 – Ipor Sircer, Oct 11 '16 at 19:08
Look at the SMART data for the failed drive with smartctl. See if it is reporting as failed? – Zoredache, Oct 11 '16 at 22:30
@Zoredache smartctl shows the drive is fine, as far as I can tell; see the output added to the question. It is very similar to the output from other disks of the same type that work just fine. – josteinb, Oct 12 '16 at 16:58
2 Answers
Your mdstat file says it all: [3/2] [_UU] means that while there are 3 defined devices, only 2 are in use at the moment, and the leading _ in _UU marks the missing one.
For greater detail on the raid devices (before going to the physical ones) you'd run (as root):
mdadm --detail --verbose /dev/md0
mdadm --detail --verbose /dev/md1
mdadm --detail --verbose /dev/md2
On my system (using raid6) I have simulated a failure and this is an example of the output:
/dev/md0:
Version : 1.2
Creation Time : Thu Sep 29 09:51:41 2016
Raid Level : raid6
Array Size : 16764928 (15.99 GiB 17.17 GB)
Used Dev Size : 8382464 (7.99 GiB 8.58 GB)
Raid Devices : 4
Total Devices : 5
Persistence : Superblock is persistent
Update Time : Thu Oct 11 13:06:50 2016
State : clean <<== CLEAN!
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : ubuntu:0 (local to host ubuntu)
UUID : 3837ba75:eaecb6be:8ceb4539:e5d69538
Events : 43
Number Major Minor RaidDevice State
4 8 65 0 active sync /dev/sde1 <<== NEW ENTRY
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
3 8 49 3 active sync /dev/sdd1
0 8 1 - faulty /dev/sda1 <<== SW-REPLACED
So I have "state: clean, degraded" in the middle and "removed" for the listing of disk number 0 (which I assume should have been sdb). Can I recover from "removed" somehow? The full output has been included in the question.
– josteinb
Oct 11 '16 at 20:07
@josteinb You'd add spare devices so the RAID can be automatically reconstruced. This is my current setup, as I showed in the answer.
– EnzoR
Oct 12 '16 at 7:29
@josteinb By the way, I prefer to 1) create a RAID device over entire disks, 2) then partition (and 3) then LVM eveything). You first did partitioning, then the RAID. My starting point is: does that make any sense to you? When a drive is gone, then all partitions are (supposed to be) gone as well.
– EnzoR
Oct 12 '16 at 8:12
I see your point about raid first, partitioning second. It is a long time since I created this setup, but I would think the reason why I partitioned first is that I have one raid0 and two raid5 partitions.
– josteinb
Oct 12 '16 at 16:52
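As a rough illustration of the whole-disk-first layout EnzoR describes, on a fresh set of disks; this is only a sketch, and the volume group and logical volume names are hypothetical, not from this thread:
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdd /dev/sde
pvcreate /dev/md0                       # put LVM on top of the single array
vgcreate vg_raid /dev/md0               # hypothetical VG name
lvcreate -L 20G -n lv_scratch vg_raid   # carve out volumes instead of partitioning the disks
As josteinb's reply points out, this trades away the ability to mix RAID levels (one raid0 plus two raid5 sets) across the same three disks, which is why the original setup partitioned first.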
md0 and md1 are raid5 arrays, degraded because their respective partitions on /dev/sdb have failed or been marked faulty. Run mdadm --detail on the array itself for more details (mdadm --detail /dev/md1).
If all is well with /dev/sdb, re-add the partitions to the arrays. Get the correct partition numbers from your /etc/mdadm.conf or from the --detail output.
mdadm /dev/md1 --re-add /dev/sdb[?]
answered Oct 11 '16 at 20:48 by Brian Tillman

I tried this; --re-add does not work. mdadm just responded that the device could not be re-added. It may be related to the event counts shown in EDIT2 of the question. – josteinb, Oct 12 '16 at 16:50
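When --re-add is refused like this (the superblock on the sdb partition carries a far lower event count than the rest of the array), a common fallback is to discard the stale metadata and add the member back as if it were a new disk, which forces a full rebuild from parity. A hedged sketch, assuming /dev/sdb2 really is the md1 member and the drive itself is healthy; note that --zero-superblock wipes the md metadata on that partition, so double-check the device name first:
mdadm --zero-superblock /dev/sdb2
mdadm /dev/md1 --add /dev/sdb2
cat /proc/mdstat                        # should now show recovery in progress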