SVM - Submirrors - One in Needs maintenance & another in Last Erred


Main Point to Note:


Always replace components in the “Maintenance” state first, followed by those in the “Last Erred” state. After a component is replaced and resynchronized, use the metastat command to verify its state. Then, validate the data.

Theory from Oracle:


When a component in a RAID-1 or RAID-5 volume experiences errors, Solaris Volume Manager puts the component in the “Maintenance” state. No further reads or writes are performed to a component in the “Maintenance” state.
Sometimes a component goes into a “Last Erred” state. For a RAID-1 volume, this usually occurs with a one-sided mirror. The volume experiences errors. However, there are no redundant components to read from. For a RAID-5 volume this occurs after one component goes into “Maintenance” state, and another component fails. The second component to fail goes into the “Last Erred” state.
When either a RAID-1 volume or a RAID-5 volume has a component in the “Last Erred” state, I/O is still attempted to the component marked “Last Erred.” This I/O attempt occurs because a “Last Erred” component contains the last good copy of data from Solaris Volume Manager's point of view. With a component in the “Last Erred” state, the volume behaves like a normal device (disk) and returns I/O errors to an application. Usually, at this point, some data has been lost.
The subsequent errors on other components in the same volume are handled differently, depending on the type of volume.

RAID-1 Volume

A RAID-1 volume might be able to tolerate many components in the “Maintenance” state and still be read from and written to. If components are in the “Maintenance” state, no data has been lost. You can safely replace or enable the components in any order. If a component is in the “Last Erred” state, you cannot replace it until you first replace the components in the “Maintenance” state. Replacing or enabling a component in the “Last Erred” state usually means that some data has been lost. Be sure to validate the data on the mirror after you repair it.

RAID-5 Volume

A RAID-5 volume can tolerate a single component in the “Maintenance” state. You can safely replace a single component in the “Maintenance” state without losing data. If an error on another component occurs, it is put into the “Last Erred” state. At this point, the RAID-5 volume is a read-only device. You need to perform some type of error recovery so that the state of the RAID-5 volume is stable and the possibility of data loss is reduced. If a RAID-5 volume reaches a “Last Erred” state, there is a good chance it has lost data. Be sure to validate the data on the RAID-5 volume after you repair it.
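
The ordering rule described above can be sketched as a small filter. This is only an illustration, assuming a POSIX shell and awk (on Solaris 10, nawk or /usr/xpg4/bin/awk); the function name repair_order and the "component state" input format are hypothetical and not part of SVM.

```shell
# repair_order: read "<component> <state>" lines on stdin (a made-up input
# format for this sketch) and print the components that need repair,
# "Maintenance" first and "Last Erred" last. Other states are skipped.
repair_order() {
    awk '
        {
            name = $1
            state = $2
            for (i = 3; i <= NF; i++) state = state " " $i
        }
        state == "Maintenance" { maint[++n_maint] = name }
        state == "Last Erred"  { erred[++n_erred] = name }
        END {
            for (i = 1; i <= n_maint; i++) print maint[i] " (Maintenance)"
            for (i = 1; i <= n_erred; i++) print erred[i] " (Last Erred)"
        }'
}
```

Feeding it a "Last Erred" line followed by a "Maintenance" line prints the "Maintenance" component first, which is the order in which the components should be handled.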


Actual Issue

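After a disk replacement, metastat reported both submirrors of one mirror (d3) as needing maintenance.
The point to note is that the two are not in the same condition: submirror d13 is in “Needs maintenance” with its component still “Okay”, while the component of submirror d23 is in “Last Erred”.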

# metastat d3
d3: Mirror
    Submirror 0: d13
      State: Needs maintenance
    Submirror 1: d23
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 10247232 blocks

d13: Submirror of d3
    State: Needs maintenance
    Invoke: metasync d3
    Size: 10247232 blocks
    Stripe 0:
        Device     Start Block  Dbase State        Hot Spare
        c0t0d0s3          0     No    Okay


d23: Submirror of d3
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d3 c0t1d0s3 <new device>
    Size: 10247232 blocks
    Stripe 0:
        Device     Start Block  Dbase State        Hot Spare
        c0t1d0s3          0     No    Last Erred
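
The output above mixes two levels of state: the submirror state ("Needs maintenance") and the component state ("Okay", "Last Erred"). A minimal sketch for pulling both out of metastat output follows. It assumes a POSIX awk, component names of the form cXtXdXsX, and an empty Hot Spare column as in the listing above; svm_states is a hypothetical helper name.

```shell
# svm_states: read `metastat <mirror>` output on stdin and print one line per
# component: submirror name, submirror state, component name, component state.
svm_states() {
    awk '
        # "d13: Submirror of d3" starts a submirror section.
        $2 == "Submirror" && $3 == "of" {
            sm = $1
            sub(/:$/, "", sm)
            want_state = 1
            next
        }
        # The first "State:" line after that header is the submirror state.
        want_state && $1 == "State:" {
            sm_state = $2
            for (i = 3; i <= NF; i++) sm_state = sm_state " " $i
            want_state = 0
            next
        }
        # Component rows: device, start block, dbase flag, then the state.
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ {
            state = $4
            for (i = 5; i <= NF; i++) state = state " " $i
            print sm " [" sm_state "] " $1 " [" state "]"
        }'
}

# Usage (on the affected host): metastat d3 | svm_states
```

For the listing above this prints "d13 [Needs maintenance] c0t0d0s3 [Okay]" and "d23 [Needs maintenance] c0t1d0s3 [Last Erred]", which makes the difference between the two submirrors easier to see.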


Try:

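# metasync d3

This starts a resync of the mirror: submirror d13 is resynchronized from d23, which holds the last good copy of the data from SVM's point of view. If the resync succeeds, no further troubleshooting is required. Check the state of both submirrors with the metastat command and you are good.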

>> If the above step is not successful:

The problem is not necessarily in the submirror itself; it may lie in the disk that backs submirror d23. d13 is simply trying to resync from d23, so d23 is the likely culprit and the one that needs attention.

>> Check /var/adm/messages for read errors on disk c0t1d0.

>> Check the error counters in the iostat -En output:

# iostat -En

c0t1d0          Soft Errors: 14 Hard Errors: 6850 Transport Errors: 7189
Vendor: SEAGATE  Product: ST373207LSUN72G  Revision: 045A Serial No: 3432BEKC
Size: 73.40GB <73400057856 bytes>
Media Error: 5707 Device Not Ready: 0 No Device: 1142 Recoverable: 0
Illegal Request: 14 Predictive Failure Analysis: 736
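
To scan every disk for counters like these, the iostat -En output can be filtered. The sketch below assumes the field layout shown above (a device line containing "Soft Errors: ... Hard Errors: ... Transport Errors: ...", followed later by a "Media Error:" line) and a POSIX awk; disk_error_summary is a hypothetical helper name, and the layout may differ for other device types.

```shell
# disk_error_summary: read `iostat -En` output on stdin and print one line
# for each disk that reports hard errors or media errors.
disk_error_summary() {
    awk '
        # Device line: <disk> Soft Errors: N Hard Errors: N Transport Errors: N
        $2 == "Soft" && $3 == "Errors:" {
            disk = $1
            hard = $7
            transport = $10
        }
        # Media line: Media Error: N Device Not Ready: N ...
        $1 == "Media" && $2 == "Error:" {
            if (hard + 0 > 0 || $3 + 0 > 0)
                print disk " hard=" hard " transport=" transport " media=" $3
        }'
}

# Usage (on the affected host): iostat -En | disk_error_summary
```

For the disk above this prints "c0t1d0 hard=6850 transport=7189 media=5707"; disks with zero hard and media errors are not listed.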

Check whether the steps below resolve the issue.


# prtvtoc /dev/rdsk/c0t1d0s3
* /dev/rdsk/c0t1d0s3 partition map
*
* Dimensions:
*     512 bytes/sector
*     424 sectors/track
*      24 tracks/cylinder
*   10176 sectors/cylinder
*   14089 cylinders
*   14087 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0   4100928   4100927
       1      3    01    4100928   4100928   8201855
       2      5    00          0 143349312 143349311
       3      7    00    8201856  10247232  18449087
       4      0    00   18449088  10247232  28696319
       5      4    00   28696320  20484288  49180607
       6      0    00   49180608  94148352 143328959
       7      0    00  143328960     20352 143349311


>> Slice 3: first sector 8201856, last sector 18449087

>> Run a surface read test from the format utility: format > analyze > read
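
The slice boundaries come straight from the prtvtoc table: the last sector is the first sector plus the sector count, minus one. A minimal sketch of that arithmetic, assuming a POSIX awk and the column order shown above (slice_range is a hypothetical helper name):

```shell
# slice_range: read prtvtoc output on stdin and print "<first> <last>" for the
# slice number given as $1. Comment lines in prtvtoc output start with "*".
slice_range() {
    awk -v slice="$1" '
        $1 !~ /^\*/ && $1 == slice {
            # Columns: partition, tag, flags, first sector, sector count, last sector
            print $4, $4 + $5 - 1
        }'
}

# Usage (on the affected host):
#   prtvtoc /dev/rdsk/c0t1d0s3 | slice_range 3
# For the table above this prints: 8201856 18449087
```

The computed value matches the "Last Sector" column for slice 3 (8201856 + 10247232 - 1 = 18449087).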


analyze> read
Ready to analyze (won't harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? y

        pass 0
Medium error during read: block 14559059 (0xde2753) (1430/17/171)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559059 (1430/17/171)...ok.

Medium error during read: block 14559060 (0xde2754) (1430/17/172)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559060 (1430/17/172)...ok.

Medium error during read: block 14559061 (0xde2755) (1430/17/173)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559061 (1430/17/173)...ok.

Medium error during read: block 14559062 (0xde2756) (1430/17/174)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559062 (1430/17/174)...ok.

Medium error during read: block 14559063 (0xde2757) (1430/17/175)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559063 (1430/17/175)...ok.

Medium error during read: block 14559064 (0xde2758) (1430/17/176)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559064 (1430/17/176)...ok.

Medium error during read: block 14559065 (0xde2759) (1430/17/177)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559065 (1430/17/177)...ok.

Medium error during read: block 14559066 (0xde275a) (1430/17/178)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559066 (1430/17/178)...ok.

Medium error during read: block 14559067 (0xde275b) (1430/17/179)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559067 (1430/17/179)...ok.

Medium error during read: block 14559068 (0xde275c) (1430/17/180)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559068 (1430/17/180)...ok.

Medium error during read: block 14559069 (0xde275d) (1430/17/181)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559069 (1430/17/181)...ok.

Medium error during read: block 14559070 (0xde275e) (1430/17/182)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559070 (1430/17/182)...ok.

Medium error during read: block 14559071 (0xde275f) (1430/17/183)
ASC: 0x11   ASCQ: 0x0
Repairing hard error on 14559071 (1430/17/183)...ok.

   14086/23/304

        pass 1
   14086/23/304

Total of 13 defective blocks repaired.
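
The repaired blocks can be cross-checked against the prtvtoc data. The sketch below, which assumes a POSIX shell (for example bash or /usr/xpg4/bin/sh, not the legacy /bin/sh), converts an absolute block number to the cylinder/head/sector triple that format prints, and tests whether a block lies inside slice 3. The helper names block_to_chs and block_in_range are hypothetical; the geometry (424 sectors/track, 24 tracks/cylinder) and the slice bounds are taken from the prtvtoc output earlier in this post.

```shell
# block_to_chs BLOCK SECTORS_PER_TRACK TRACKS_PER_CYLINDER
# Prints the block as cylinder/head/sector.
block_to_chs() {
    block=$1
    spt=$2
    tpc=$3
    spc=$((spt * tpc))            # sectors per cylinder
    cyl=$((block / spc))
    rem=$((block % spc))          # offset of the block inside its cylinder
    echo "$cyl/$((rem / spt))/$((rem % spt))"
}

# block_in_range BLOCK FIRST LAST
# Succeeds (exit status 0) when FIRST <= BLOCK <= LAST.
block_in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

block_to_chs 14559059 424 24      # prints 1430/17/171
if block_in_range 14559059 8201856 18449087; then
    echo "block 14559059 is inside slice 3"
fi
```

The first and last repaired blocks (14559059 and 14559071) both fall between sectors 8201856 and 18449087, so all 13 defective blocks sit inside slice 3, the slice that backs submirror d23.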


REMEMBER THE FIRST INSTRUCTION: SYNC THE DEVICE IN THE MAINTENANCE STATE FIRST, NOT THE LAST ERRED ONE.

# metasync d3


LET THE WHOLE RESYNC OF d13 COMPLETE. DO NOT INTERRUPT IT.

CHECK THAT d13 IS NOW OKAY:

# metastat d3

Now re-enable and resync the Last Erred component:

# metareplace -e d3 c0t1d0s3

# metastat d3
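
The sequence above can be sketched as one guarded function. This is an illustration under stated assumptions, not a tested recovery tool: it assumes a POSIX shell and awk, that metasync returns while the resync continues in the background, and that metastat shows a line containing "Resync" while it is running. The names recover_mirror, submirror_state and POLL_SECONDS are hypothetical.

```shell
# submirror_state NAME: read metastat output on stdin, print NAME's state.
submirror_state() {
    awk -v sm="$1" '
        $1 == (sm ":") && $2 == "Submirror" { found = 1; next }
        found && $1 == "State:" {
            sub(/^[ \t]*State:[ \t]*/, "")
            print
            exit
        }'
}

# recover_mirror MIRROR RESYNC_SUBMIRROR ERRED_COMPONENT
# e.g. recover_mirror d3 d13 c0t1d0s3
recover_mirror() {
    mirror=$1
    resync_sm=$2
    erred=$3

    # Step 1: resync the submirror that is in "Needs maintenance".
    metasync "$mirror" || return 1

    # Step 2: wait for the resync to finish; do not interrupt it.
    while metastat "$mirror" | grep 'Resync' >/dev/null; do
        sleep "${POLL_SECONDS:-60}"
    done

    # Step 3: only touch the "Last Erred" component once the first is Okay.
    state=$(metastat "$mirror" | submirror_state "$resync_sm")
    if [ "$state" != "Okay" ]; then
        echo "$resync_sm is '$state'; leaving $erred alone" >&2
        return 1
    fi
    metareplace -e "$mirror" "$erred"
}
```

The guard in step 3 is the point of the sketch: if d13 does not reach "Okay", the function stops before running metareplace on the component that still holds the last good copy.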

ARE BOTH DEVICES OKAY? THEN YOU ARE GOOD. VALIDATE THE DATA ON THE MIRROR.

CHECK THE OTHER REFERENCE BLOGS:

http://1shiftg.blogspot.com.au/2012/08/solaris-10-svm-mirrors-maintenence-last.html

http://tad1982.blogspot.com.au/2011/05/both-submirrors-in-needs-maintenance.html




