How do I repair a degraded or crashed SSD cache in an HA cluster?
Applicable Products
- QuTS hero h5.3.0 or later
- High Availability Manager
- Storage Manager
Scenario
While using a QNAP NAS with high availability (HA) enabled, I received a system warning indicating an SSD cache issue. One or more cache disks are reported as degraded or crashed. How can I repair or replace the faulty disks without interrupting ongoing services?
Procedure
Before you begin, please do the following:
- Back up important data.
Always back up your data before performing disk operations to mitigate the risk of data loss.
- Identify the faulty SSD(s).
- Go to Storage Manager > Disks > Disk.
- Identify the SSDs with “Cache” listed under the column header Usage Type.
- Identify the SSDs with “Error” or “Warning” listed under the column header Status.
- Determine whether hot-swapping is supported for the drive bays/slots where your faulty SSDs are installed.
Tip
- 3.5-inch or 2.5-inch drive bays usually support hot-swapping.
- Drive slots that require access to the system board (such as M.2) usually do not support hot-swapping.
- To determine hot-swapping support for drive bays/slots on your specific model, download and check the hardware user guide for your model in Download Center.
- Follow the relevant instructions depending on whether hot-swapping is supported:
Case 1: Hot-swapping supported
- Replace the faulty disk.
- Remove the faulty disk.
- Install a healthy disk of the same or larger capacity in the same slot.
The system automatically detects the new disk.
- If the SSD cache is degraded, the system will automatically start rebuilding the cache RAID.
Note
If the system does not automatically start rebuilding, try setting the new disk as a spare disk to trigger the rebuild process.
- If the SSD cache crashed, manually remove and recreate the SSD cache.
- Go to Storage Manager > Cache Acceleration.
- Remove the SSD cache.
For details, see Removing the SSD cache.
The system automatically flushes cached data back to the storage pool.
- Recreate the SSD cache.
For details, see Creating the SSD cache.
Case 2: Hot-swapping not supported
- If the faulty disk is on the active node, switch the node role to passive node by performing a switchover.
If the faulty disk is on the passive node, skip to the next step.
- Go to High Availability Manager > Cluster.
- Click Manage, and then select Switch Over.
The original active node becomes the passive node.
- Shut down the passive node.
- Go to High Availability Manager > Nodes.
- Identify the passive node.
- Click [icon], and then select Shut Down.
- Replace the faulty disk.
- Remove the faulty disk.
- Install a healthy disk of the same or larger capacity in the same slot.
- Power on the passive node.
After the passive node starts, it automatically rejoins the HA cluster.
- Switch the passive node’s role to active node.
SSD cache is managed by the active node. To repair the SSD cache, the host NAS must be in the active node role.
- Go to High Availability Manager > Cluster.
- Click Manage, and then select Switch Over.
The passive node is now the active node.
- If the SSD cache is degraded, the system will automatically start rebuilding the cache RAID.
Note
If the system does not automatically start rebuilding, try setting the new disk as a spare disk to trigger the rebuild process.
- If the SSD cache crashed, manually remove and recreate the SSD cache.
- Go to Storage Manager > Cache Acceleration.
- Remove the SSD cache.
For details, see Removing the SSD cache.
The system automatically flushes cached data back to the storage pool.
- Recreate the SSD cache.
For details, see Creating the SSD cache.
After replacing a disk, we recommend performing a switchover to ensure that the switchover/failover mechanism works properly and that HA functionality has been restored.
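The decision flow in this procedure can be summarized as a short Python sketch. It is only a planning aid that mirrors the steps above. It does not call any QNAP API or command-line tool, and every name in it (`Disk`, `CacheState`, `find_faulty_cache_disks`, `plan_repair`) is hypothetical and chosen for illustration. The disk records are assumed to be copied by hand from the Usage Type and Status columns in Storage Manager.

```python
from dataclasses import dataclass
from enum import Enum


class CacheState(Enum):
    """Hypothetical labels for the two SSD cache states covered in this article."""
    DEGRADED = "degraded"
    CRASHED = "crashed"


@dataclass
class Disk:
    """Hypothetical record of one row from Storage Manager > Disks."""
    name: str
    usage_type: str  # value of the "Usage Type" column
    status: str      # value of the "Status" column


def find_faulty_cache_disks(disks):
    """Return cache disks whose status is "Error" or "Warning"."""
    return [
        d for d in disks
        if d.usage_type == "Cache" and d.status in {"Error", "Warning"}
    ]


def plan_repair(cache_state, hot_swap_supported, faulty_on_active_node):
    """Return the ordered manual steps for the given situation."""
    steps = []
    if not hot_swap_supported:
        # Case 2: the node holding the faulty disk must be passive and powered off.
        if faulty_on_active_node:
            steps.append("Switch over so the node with the faulty disk becomes passive")
        steps.append("Shut down the passive node")
    steps.append("Replace the faulty disk with a healthy disk of the same "
                 "or larger capacity in the same slot")
    if not hot_swap_supported:
        steps.append("Power on the passive node and wait for it to rejoin the HA cluster")
        steps.append("Switch over so the repaired node becomes the active node")
    if cache_state is CacheState.DEGRADED:
        steps.append("Wait for the automatic rebuild (set the new disk as a "
                     "spare disk if the rebuild does not start)")
    else:
        steps.append("Remove the SSD cache")
        steps.append("Recreate the SSD cache")
    steps.append("Perform a switchover to verify that HA functionality is restored")
    return steps


if __name__ == "__main__":
    # Example inventory with made-up disk names.
    inventory = [
        Disk("SSD 1", "Cache", "Error"),
        Disk("SSD 2", "Cache", "Good"),
        Disk("HDD 1", "Data", "Warning"),
    ]
    print([d.name for d in find_faulty_cache_disks(inventory)])
    for number, step in enumerate(
        plan_repair(CacheState.CRASHED, hot_swap_supported=False,
                    faulty_on_active_node=True),
        start=1,
    ):
        print(f"{number}. {step}")
```

Running the script prints the faulty cache disks from the example inventory, followed by a numbered checklist for a crashed cache on a non-hot-swappable slot of the active node. Adjust the arguments to `plan_repair` to match your own situation.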