Reliability and High Availability of Backup Software Systems
This article examines how backup software ensures enterprise data reliability through media redundancy, server failover, load balancing, and both cold and high‑availability solutions for the management server, highlighting technologies such as GridStor, dual‑array clustering, and deduplication.
Backup software exists to improve the reliability and redundancy of critical enterprise data, so the reliability and availability of the backup software itself directly affect how well that data is protected. This article looks at the reliability and availability of the backup system at each layer: the backup media, the media server, and the management server.
Backup Media Reliability
Backup media typically rely on path redundancy (for example, multipath SAN storage), volume mirroring, snapshots, and remote replication for reliability. Because the media already hold redundant copies of the data, recovery from a disk failure needs only one available replica. Remote replication usually covers failure of the whole media system, at the cost of extra storage.
Backup media (NAS or SAN devices) can be shared by multiple MediaAgents (MAs), which improves resource utilization, but physical faults and hardware errors cannot be ruled out. If a storage device fails and no further protection is in place, data is lost. Simpana's GridStor technology strengthens media redundancy and reliability.
Media Server Reliability
Simpana provides GridStor at the media-server layer. It supports failover and load balancing within an MA cluster, which improves the availability of data access.
When an MA or its storage media fail, backup jobs switch to an available MA and media. GridStor also supports switching across operating systems and storage types; for example, a Windows backup job can fail over automatically to a Linux MA without user intervention, and the system still locates the required data at restore time.
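The failover and load-balancing behavior described above can be pictured with a minimal sketch. This is not GridStor's actual selection logic: the `MediaAgent` fields and the "fewest active jobs" rule are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MediaAgent:
    # All fields are hypothetical; they are not Simpana configuration names.
    name: str
    os: str
    healthy: bool
    active_jobs: int


def pick_media_agent(agents, preferred=None):
    """Return the preferred MA if it is healthy, otherwise the least-loaded healthy MA."""
    healthy = [a for a in agents if a.healthy]
    if not healthy:
        raise RuntimeError("no healthy MediaAgent available")
    for agent in healthy:
        if agent.name == preferred:
            return agent
    # Assumed load-balancing rule: fewest active jobs wins.
    return min(healthy, key=lambda a: a.active_jobs)
```

In this sketch the operating system of the MA plays no part in the choice, which mirrors the cross-OS failover described above: a job whose preferred Windows MA is down is simply handed to whichever healthy MA remains.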
Parallel deduplication across MAs enables load balancing and failover while maintaining performance, and the system shares a deduplication fingerprint library. Currently, two MAs can be combined for deduplication, following the workflow below:
First, the client selects an MA (DataMover) and sends its data there. That MA generates a fingerprint for the data and uses an internal algorithm to choose the partition in which to look the fingerprint up. If the chosen partition resides on another MA, the lookup is performed over the network on that MA.
If the lookup finds existing data, only a reference is updated in the selected partition and the metadata pointers are adjusted. If the data is new, the fingerprint is inserted into the partition and the client-selected MA writes the data to the appropriate storage media. Throughout, the deduplication fingerprint library remains shared across the MAs.
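The workflow above can be sketched in a few lines. The SHA-256 fingerprint and the modulo-based partition choice are assumptions standing in for Simpana's internal algorithm, which the article does not describe; `Partition`, `choose_partition`, and `backup_block` are hypothetical names.

```python
import hashlib


class Partition:
    """One partition of the shared fingerprint library, hosted on one MA."""

    def __init__(self, owner_ma):
        self.owner_ma = owner_ma
        self.ref_counts = {}  # fingerprint -> number of references


def choose_partition(fingerprint, partitions):
    # Assumed placement rule: leading hex digits modulo the partition count.
    return partitions[int(fingerprint[:8], 16) % len(partitions)]


def backup_block(data, selected_ma, partitions, storage):
    """Deduplicate one block received by the client-selected MA."""
    fingerprint = hashlib.sha256(data).hexdigest()
    partition = choose_partition(fingerprint, partitions)
    # If another MA hosts the partition, the lookup crosses the network.
    remote_lookup = partition.owner_ma != selected_ma
    if fingerprint in partition.ref_counts:
        # Known data: only the reference count changes, nothing is written.
        partition.ref_counts[fingerprint] += 1
        is_new = False
    else:
        # New data: record the fingerprint, then the selected MA stores the block.
        partition.ref_counts[fingerprint] = 1
        storage.setdefault(selected_ma, {})[fingerprint] = data
        is_new = True
    return {"fingerprint": fingerprint, "new": is_new, "remote_lookup": remote_lookup}
```

Because both MAs consult the same set of partitions, a block first backed up through one MA is recognized as a duplicate when it later arrives through the other, which is what lets failover and load balancing coexist with deduplication.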
Backup Management Server Reliability
Within the backup software system, the CommServe management server is the core component of the Simpana platform, containing critical configuration, security settings, user information, licenses, and Tier‑1 indexes. Loss of this data makes system reconstruction extremely difficult.
Cold Backup Solution
Simpana supports a CommServe DR option: when the primary CommServe fails, a DR server can start backup task management. This is a cold backup approach; data is not automatically synchronized and must be manually restored to the DR CommServe.
Typically, a CommServe is deployed at both primary and standby sites with identical IP addresses and hostnames; the standby is usually powered off. The standby site provides a file‑share space where the primary’s catalog is periodically backed up.
When the primary CommServe fails, the standby CommServe is started and the latest catalog backup set is imported; the standby then takes over task-management services.
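Since the catalog is only backed up periodically, the standby comes up with the state of the last backup set taken before the failure. A minimal sketch of that choice follows; the function name and the idea of representing backup sets by their timestamps are assumptions for illustration, not part of the Simpana DR tooling.

```python
from datetime import datetime, timedelta


def pick_restore_set(backup_times, failure_time):
    """Return the latest catalog backup taken before the failure and its age.

    The age is the window of catalog changes the cold standby will not have.
    """
    usable = [t for t in backup_times if t <= failure_time]
    if not usable:
        raise ValueError("no catalog backup predates the failure")
    latest = max(usable)
    return latest, failure_time - latest
```

The returned gap is a reminder of the cost of the cold approach: the less often the catalog is copied to the standby site's file share, the more job history has to be reconciled after the switch.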
High‑Availability Solution
The DR scheme is a cold standby, so recovery is manual and slow. For automated failover and higher availability, CommServe can be installed in a cluster environment such as Microsoft MSCS, which is supported on Windows Server 2012 (where it is named Windows Server Failover Clustering). Simpana includes an embedded SQL Server database for storing indexes and fingerprints.
CommServe cluster deployment can follow two patterns: dual-array active-active and single-array. The dual-array approach relies on the arrays' active-active feature, writing CommServe data synchronously to active-active volumes on both SAN arrays (the “Master Server” in the diagram is the CommServe).
In the single‑array model, cluster software guarantees exclusive access and data consistency; separate volumes are created on the SAN and mapped to the primary and standby CommServe. If the primary fails, the cluster switches the workload to the standby.
Dual-array active-active deployment protects against failure of both the CommServe server and the CommServe database storage (the SAN array), whereas single-array deployment protects only against server failure. In cross-site deployments, MS SQL cluster log latency must meet strict requirements (see Microsoft documentation): Microsoft recommends log latency below 1 ms for optimal performance, although below 5 ms is commonly accepted in practice. When the fiber distance exceeds 30 km, DWDM equipment is required.
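To put the distance figure in perspective, the sketch below estimates the round-trip propagation delay that the fiber alone contributes and compares a measured log latency against the two thresholds quoted above. The refractive index is an assumed typical value for single-mode fiber, and the function names are hypothetical; real log latency also includes switch, array, and disk time, so this is a lower bound only.

```python
C_VACUUM_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.468   # assumed typical value for single-mode fiber


def fiber_rtt_ms(distance_km):
    """Round-trip propagation delay over the fiber, in milliseconds."""
    one_way_s = distance_km * FIBER_REFRACTIVE_INDEX / C_VACUUM_KM_PER_S
    return 2 * one_way_s * 1000


def classify_log_latency(latency_ms):
    """Compare a measured log latency with the thresholds quoted in the text."""
    if latency_ms < 1:
        return "within the recommended 1 ms"
    if latency_ms < 5:
        return "within the 5 ms used in practice"
    return "above 5 ms"
```

Under these assumptions, 30 km of fiber adds roughly 0.3 ms of round-trip delay before any equipment latency is counted, which already consumes a noticeable share of a 1 ms budget.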
Architects' Tech Alliance