Why Enterprises Need a QuTS MEGA Scale-out Solution
The common challenges include sustained data growth, zero tolerance for service disruption, and the need for predictable data protection and operational manageability.
Regulation-driven long-term data retention
Transaction records, call recordings, and audit data must be retained long term without risk of loss. With high availability and robust data protection mechanisms, capacity can be continuously expanded without service interruption.
Massive, continuously growing research data
Genomic, imaging, and research datasets continue to grow. With high-efficiency data protection and automated self-healing mechanisms, the platform provides long-term stability to support analytics and research workloads.
High-volume, long-term image data retention
Process images and surveillance recordings accumulate rapidly. The scale-out architecture expands alongside production growth, while automatic rebalancing prevents performance and management bottlenecks.
Comprehensive Capabilities for Diverse Storage Needs
Built on a unified scale-out architecture, QuTS MEGA integrates file and object services with mainstream protocol support, enabling enterprises to scale as data grows.
One Platform Covering Services, Protocols, and Scalability
Designed for enterprise-grade availability, with clear capabilities and deployment specifications for POC and production.
Storage Types
A unified platform for multiple data formats
- File Storage: Ideal for shared folders, departmental collaboration, and image/file archiving scenarios.
- Object Storage: Designed for long-term retention, application integration, and S3 API connectivity.
Protocols
Compatible with enterprise applications and access methods
- SMB: Commonly used file-sharing protocol in Windows and Active Directory environments.
- NFS: Widely adopted file service protocol in Linux and R&D environments.
- S3 API: Standard object storage interface for application integration and data lake architectures.
Scalable Architecture
A clear path from initial deployment to PB-scale growth
- 3–96 Node Scale-out: Start with 3 nodes and scale up to 96 nodes, achieving PB-scale storage with high availability.
- Non-disruptive expansion: Nodes can be added as needed, with automatic rebalancing and built-in data protection.
※ Actual capacity and performance may vary depending on cluster size, service configuration, and data protection policies (such as EC or Replication).
Core Capabilities
Built on a Linux and Ceph distributed architecture with high availability, delivering an enterprise-grade storage platform with redundancy, fault tolerance, and scalability.
High Availability
Continuous service operation even during node failures
Data Redundancy
- Replication: Ensures data availability through multiple data copies, ideal for scenarios requiring fast access and high reliability.
- Erasure Coding: Leverages Ceph’s distributed EC mechanism to protect data with computed parity fragments, maintaining fault tolerance while optimizing storage efficiency.
Fault Tolerance
- Service Distribution: Services run across multiple nodes and automatically recover or migrate when a node fails, maintaining external service availability.
- Self-healing: Automatically reconstructs lost data using replicas or parity, preserving data integrity while minimizing manual intervention.
Operational Continuity
- Dynamic Rebalancing: Automatically redistributes data when nodes are added or removed, maintaining redundancy consistency and preventing hotspots to ensure balanced system performance.
- Rolling Upgrades: Performs system upgrades and maintenance without service interruption, ensuring continuous operations and service availability.
- Data Storage Sustainability: Built on a Ceph distributed architecture, enabling linear scaling of capacity and performance from a minimum of 3 nodes up to 96 nodes, supporting long-term enterprise data growth.
Enterprise Security and Compliance
- Active Directory Integration: Integrates with existing enterprise AD environments to provide centralized authentication and unified access control, simplifying user management.
- Audit Log: Records system operations and data access activities, providing a complete audit trail to meet compliance and security analysis requirements.
- Write Once Read Many (S3 WORM): Immutable object locking mechanism that prevents data modification or deletion, meeting regulatory compliance requirements in industries such as finance and healthcare.
Erasure Coding (EC) Protection Overview
Using EC 4+2 as an example: data is split into 4 data fragments plus 2 parity fragments, distributed across 6 nodes, so up to 2 nodes can fail simultaneously without data loss.
QuTS MEGA supports multiple EC configurations (such as 8+2 and 8+3), enabling flexible selection of capacity efficiency and protection levels based on requirements.
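The trade-off between capacity efficiency and protection level follows from the k+m layout itself: k of every k+m fragments hold data, and up to m fragments can be lost. The sketch below is a minimal illustration of that arithmetic only. The function names are hypothetical, and the 3-copy replication row is included purely for comparison; real usable capacity also depends on cluster size and service configuration.

```python
from fractions import Fraction


def ec_profile(k: int, m: int) -> dict:
    """Summarize an erasure-coding layout with k data and m parity fragments."""
    total = k + m
    return {
        "layout": f"EC {k}+{m}",
        "fragments": total,                     # one fragment per failure domain
        "tolerated_failures": m,                # any m fragments may be lost
        "usable_fraction": Fraction(k, total),  # share of raw capacity holding data
    }


def replication_profile(copies: int) -> dict:
    """Same summary for plain replication with the given number of copies."""
    return {
        "layout": f"{copies}x replication",
        "fragments": copies,
        "tolerated_failures": copies - 1,
        "usable_fraction": Fraction(1, copies),
    }


if __name__ == "__main__":
    profiles = [ec_profile(4, 2), ec_profile(8, 2), ec_profile(8, 3),
                ec_profile(16, 4), replication_profile(3)]
    for p in profiles:
        pct = float(p["usable_fraction"]) * 100
        print(f'{p["layout"]:<16} tolerates {p["tolerated_failures"]} failures, '
              f'{pct:.1f}% of raw capacity usable')
```

For EC 4+2, two thirds of the raw capacity is usable while two failures are tolerated; 3-copy replication tolerates the same two failures but leaves one third usable.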
Visual Explanation: 4 Data + 2 Parity
A file is split into 6 fragments across 6 nodes: 4 data (D1–D4) and 2 parity (P1–P2). Even if 2 nodes fail, the data can still be reconstructed.
Note: This illustrates EC 4+2. Other configurations such as 8+2, 8+3, or 16+4 are available to meet different capacity and protection requirements.
Scenario 1: 2 Node Failures ✔︎ Data Protected
Even if Node 2 and Node 5 fail, the system can reconstruct the complete dataset from the remaining fragments (D1, D3, D4, P2), with no data loss.
Scenario 2: 3 Node Failures ✕ Data Loss
When 3 or more nodes fail simultaneously, the remaining fragments are insufficient to reconstruct the data, which may result in data loss. This exceeds the fault tolerance scope of EC 4+2.
※ This illustrates protection capability at the node-level failure domain. QuTS MEGA supports multiple EC configurations (such as 4+2, 8+2, 8+3, and 16+4), allowing selection based on cluster size, workload characteristics, and protection requirements. Actual read/write availability may depend on cluster settings (e.g., min_size, service-layer HA, and load design).
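The two scenarios, and the min_size caveat in the note, can be expressed as a small state check. This is a sketch under stated assumptions, not QuTS MEGA's actual logic: the node-to-fragment mapping mirrors the 4+2 illustration (Node 2 holds D2, Node 5 holds P1), the state names are hypothetical, and the default min_size of k+1 is an assumption based on common Ceph defaults for EC pools; the value configured on a given cluster may differ.

```python
# Hypothetical layout for the EC 4+2 illustration: one fragment per node.
FRAGMENT_ON_NODE = {1: "D1", 2: "D2", 3: "D3", 4: "D4", 5: "P1", 6: "P2"}


def surviving_fragments(failed_nodes):
    """Return the fragments still reachable after the given nodes fail."""
    failed = set(failed_nodes)
    return [frag for node, frag in sorted(FRAGMENT_ON_NODE.items())
            if node not in failed]


def ec_state(k, m, surviving, min_size=None):
    """Classify an EC k+m object given how many fragments survive."""
    if min_size is None:
        min_size = k + 1  # assumption: typical Ceph default for EC pools
    if surviving < k:
        return "unrecoverable"            # fewer than k fragments: data loss
    if surviving < min_size:
        return "recoverable, I/O paused"  # data intact, but below min_size
    if surviving < k + m:
        return "degraded, serving I/O"
    return "healthy"


if __name__ == "__main__":
    for failed in ([], [2], [2, 5], [1, 2, 5]):
        left = surviving_fragments(failed)
        print(failed, left, ec_state(4, 2, len(left)))
```

With Nodes 2 and 5 down, exactly four fragments (D1, D3, D4, P2) remain, which is enough to reconstruct the data. Whether client I/O continues in that state depends on min_size, which is why data protection and read/write availability are treated separately.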
Service Distribution
Services run across multiple nodes. When a node fails, services automatically recover and migrate to ensure continuous cluster availability.
Automatic Service Recovery Mechanism
Reduce single points of failure and enhance overall system availability
✔︎ Normal State: Services distributed across multiple nodes
⚠ Node 2 Failure → Automatic Service Migration
When Node 2 fails, the S3 and MGR services originally running on it automatically migrate to Node 3 and Node 4, ensuring uninterrupted service with no user impact.
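The migration described above can be sketched as a simple placement rule: move each service from the failed node to the healthy node that currently runs the fewest services. This is an illustrative model only; the initial placement, the service names other than S3 and MGR, and the least-loaded rule are assumptions, not the scheduler QuTS MEGA actually uses.

```python
def migrate_services(placement, failed_node):
    """Return a new placement with the failed node's services moved elsewhere.

    Each orphaned service goes to the healthy node with the fewest services;
    ties are broken by node name so the result is deterministic.
    """
    healthy = {node: list(services)
               for node, services in placement.items() if node != failed_node}
    if not healthy:
        raise RuntimeError("no healthy node left to host services")
    for service in placement.get(failed_node, []):
        target = min(healthy, key=lambda node: (len(healthy[node]), node))
        healthy[target].append(service)
    return healthy


if __name__ == "__main__":
    # Hypothetical starting placement; only S3 and MGR on Node 2 come from the text.
    before = {
        "node1": ["MON", "SMB"],
        "node2": ["S3", "MGR"],
        "node3": ["NFS"],
        "node4": ["MON"],
    }
    after = migrate_services(before, "node2")
    for node, services in sorted(after.items()):
        print(node, services)
```

Under this assumed starting placement, S3 lands on Node 3 and MGR on Node 4, matching the example in the text.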
Automatic Failure Detection
Continuously monitors node health, quickly identifying failures and triggering recovery processes.
Automatic Service Migration
Automatically migrates services from failed nodes to healthy nodes, ensuring uninterrupted service.
Load Distribution
Intelligently distributes services across multiple nodes to prevent overload on a single node and improve overall performance.
Zero Manual Intervention
Fully automated failure recovery reduces operational burden and minimizes the risk of human error.
Ideal for 24×7 operations, high-concurrency workloads, and mission-critical applications requiring high availability. Significantly reduces the business impact of service disruptions while improving user experience.
Self-healing
Automatically detects and reconstructs lost or corrupted data to maintain data integrity and protection status, without manual intervention.
Intelligent Data Recovery Mechanism
Automatically rebuild data using replicas or parity to ensure long-term data integrity
✔︎ Normal State: Data stored with 3 replicas across nodes
⚠ Disk failure detected on Node B, data loss occurs
✔︎ Self-healing: Data automatically rebuilt to a new disk from Node A or Node C
When data loss is detected on Node B, the system automatically copies intact data from Node A or Node C, restoring the 3-replica protection level without manual intervention and ensuring long-term data reliability.
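A minimal sketch of this repair loop for the 3-replica case follows. It assumes each object has a known checksum, treats a missing or mismatching copy as lost, and rebuilds it from any intact replica. The function names and the use of SHA-256 are illustrative assumptions, not a description of the platform's internal scrubbing mechanism.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Content digest used to tell intact replicas from damaged ones."""
    return hashlib.sha256(data).hexdigest()


def heal(replicas: dict, expected: str) -> list:
    """Rebuild missing or corrupted replicas in place.

    `replicas` maps node name -> bytes, or None if the copy is lost.
    Returns the names of the nodes whose copy was rebuilt.
    """
    intact = next((data for data in replicas.values()
                   if data is not None and checksum(data) == expected), None)
    if intact is None:
        raise RuntimeError("no intact replica left to rebuild from")
    repaired = []
    for node, data in replicas.items():
        if data is None or checksum(data) != expected:
            replicas[node] = intact  # copy from a healthy replica
            repaired.append(node)
    return repaired


if __name__ == "__main__":
    payload = b"audit-record-0001"
    replicas = {"Node A": payload, "Node B": None, "Node C": payload}
    print("rebuilt on:", heal(replicas, checksum(payload)))
```

After the call, all three nodes hold a verified copy again, which corresponds to restoring the original 3-replica protection level.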
Continuous Health Monitoring
Regularly scans data integrity and proactively detects corrupted or missing data blocks.
Automatic Data Reconstruction
Rebuilds lost data automatically using replication copies or Erasure Coding parity, restoring full data integrity without manual intervention.
Protection Level Restoration
Automatically restores the original protection level after reconstruction, preventing prolonged degraded states.
Repair Progress Tracking
Reduces operational overhead and human error, making it ideal for long-term data retention, compliance, and mission-critical data protection.
Reduces operational burden and labor costs while minimizing human error risks. Ideal for long-term data retention, regulatory compliance, and mission-critical data protection scenarios, ensuring long-term data reliability.
Dynamic Rebalancing
Automatically redistributes data when nodes are added or removed, maintaining redundancy consistency and preventing storage hotspots.
Intelligent Data Redistribution Mechanism
Ensures balanced cluster resource utilization for optimal performance and capacity efficiency
⚠ Before Adding a Node: Uneven capacity usage across 3 nodes
⚠ Uneven capacity usage — Node 3 is nearing full capacity and may become a performance bottleneck
Rebalancing in Progress: Data automatically migrating to the new node
✔︎ Rebalancing Complete: Balanced capacity across 4 nodes, optimized performance
✔︎ Even capacity distribution (62–66%) prevents hotspots and maintains optimal performance
After adding Node 4, the system automatically migrates part of the data from Nodes 1–3 to the new node, balancing capacity usage across all four nodes (approximately 62–66%) and preventing overload on any single node.
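The effect of adding a node can be estimated with simple arithmetic: the same amount of data is spread over more raw capacity. In the sketch below the per-node figures before expansion (80, 82, and 94 TB used on 100 TB nodes) are made-up values chosen for illustration; only the resulting range of roughly 62–66% comes from the text. Real placement is not perfectly even, so individual nodes typically land near the average rather than exactly on it.

```python
def utilization_after_adding(used_tb, capacity_tb, new_capacity_tb):
    """Average cluster utilization once one new, empty node has joined."""
    total_used = sum(used_tb)
    total_capacity = sum(capacity_tb) + new_capacity_tb
    return total_used / total_capacity


def data_moved_to_new_node(used_tb, capacity_tb, new_capacity_tb):
    """Data (in TB) the new node holds when it reaches the average utilization."""
    target = utilization_after_adding(used_tb, capacity_tb, new_capacity_tb)
    return target * new_capacity_tb


if __name__ == "__main__":
    used = [80.0, 82.0, 94.0]        # hypothetical usage; Node 3 is nearly full
    capacity = [100.0, 100.0, 100.0]  # hypothetical equal-sized nodes
    target = utilization_after_adding(used, capacity, 100.0)
    moved = data_moved_to_new_node(used, capacity, 100.0)
    print(f"average utilization after rebalancing: {target:.0%}")
    print(f"data migrated to Node 4: {moved:.0f} TB")
```

With these assumed figures, 256 TB spread over 400 TB of raw capacity gives an average of 64%, inside the 62–66% band shown for the rebalanced cluster.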
Automatic Integration of New Nodes
When a new node joins the cluster, the system automatically migrates part of the data to balance storage utilization.
Data Protection During Node Removal
Before a node is removed, data is automatically migrated to other nodes to ensure no data loss and maintain the configured protection level.
Hotspot Prevention
Automatically detects uneven load and redistributes data to prevent hotspots. Supports data disk auto metadata migration to optimize data and metadata placement during operation.
I/O Performance Priority
Provides Client I/O First and Recovery I/O First scheduling modes to safeguard critical service performance during rebalancing or data recovery operations.
Supports flexible enterprise scalability, allowing nodes to be added gradually as business grows without service disruption. Maintains long-term performance stability, prevents capacity imbalance from degrading performance, and reduces expansion and operational complexity.
Monitoring & Alerting
Enhances operational responsiveness and collaboration through deep hardware monitoring, flexible alerting rules, and broad integration capabilities.
Hardware Monitoring & Diagnostics
Comprehensive Hardware Status Visibility
Monitors system fans, temperatures, and power module status in real time.
Hardware LED & Drive Locate
Quickly identify failed drives with visual hardware indicators.
S.M.A.R.T. Health Monitoring
Continuously tracks disk health to provide early warnings of potential risks.
Alert Notifications & Monitoring Integration
Prometheus + Alertmanager
Supports real-time notifications via Email, SNMP Traps, and Microsoft Teams.
SNMP & Third-Party Monitoring Platforms
Integrates with existing monitoring systems (such as PRTG Network Monitor).
QNAP Service War Room
Centralized visibility of cluster health, alerts, and events, enabling vendor-supported remote monitoring and proactive notification.
Cluster Scale & Node Models
A cluster can be established with as few as 3 nodes and expanded up to 96 nodes. Four node models are available, covering entry-level, large-capacity, high-density, and high-performance workload requirements.
QSN-3000
Entry-level Scale-out Node
6 Cores / 12 Threads
6 × 2.5" SATA
2 × 2.5GbE BASE-T
QSN-3050
High-capacity Node
8 Cores / 16 Threads
6 × 2.5" SATA
2 × 2.5GbE BASE-T
QSN-7530
High-performance Dense Node
12 Cores / 24 Threads
2 × 2.5GbE BASE-T