What is Azure Native Qumulo Scalable File Service for disaster recovery?

Azure Native Qumulo (ANQ) Scalable File Service provides high-performance, exabyte-scale unstructured-data cloud storage for disaster recovery. This article describes the options for deploying an Azure-based disaster recovery solution with Azure Native Qumulo Scalable File Service.

Architecture

Azure Native Qumulo for disaster recovery can be deployed in one or more Azure availability zones depending on the primary site configuration and the level of recoverability required.

In all versions of this solution, your Azure resources are deployed into your own Azure tenant, and the ANQ service instance is deployed in Qumulo’s Azure tenant in the same regions. Access to the ANQ service instance and its data is enabled through a delegated subnet in your Azure tenant, which uses virtual network (VNet) injection to connect to the ANQ service instance.

Note

Qumulo has no access to any of your data on any ANQ instance.

Data services are replicated from the primary-site Qumulo instance to the ANQ service instance in one of two ways:

Using Qumulo continuous replication, in which all changes on the primary file system are immediately replicated to the ANQ instance, overwriting older versions of the data.
Using snapshots with replication, which maintains multiple versions of changed files to enable more granular data recovery after data loss or corruption.
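
The practical difference between the two modes is how many versions of a file can be recovered from the secondary. The following is a minimal sketch of that difference only. The `SecondaryTarget` class, its `keep_snapshots` parameter, and the file path are hypothetical names invented for illustration; they are not part of any Qumulo or Azure API, and real snapshot policies are configured on the cluster rather than in client code.

```python
from collections import deque


class SecondaryTarget:
    """Toy model of a replication target (hypothetical, not a Qumulo API)."""

    def __init__(self, keep_snapshots=0):
        # keep_snapshots=0 models continuous replication: no history is kept.
        # A positive value models a snapshot policy retaining that many snapshots.
        self.live = {}  # path -> current file contents
        self.snapshots = deque(maxlen=keep_snapshots) if keep_snapshots else None

    def apply(self, path, data):
        """Apply one replicated change to the target."""
        if self.snapshots is not None:
            # Preserve the pre-change state before it is overwritten.
            self.snapshots.append(dict(self.live))
        self.live[path] = data

    def versions(self, path):
        """Return every recoverable version of a file, oldest first."""
        older = [snap[path] for snap in (self.snapshots or []) if path in snap]
        current = [self.live[path]] if path in self.live else []
        return older + current


continuous = SecondaryTarget()
snapshotted = SecondaryTarget(keep_snapshots=3)
for contents in (b"v1", b"v2", b"v3"):
    continuous.apply("/reports/q1.txt", contents)
    snapshotted.apply("/reports/q1.txt", contents)

print(continuous.versions("/reports/q1.txt"))
print(snapshotted.versions("/reports/q1.txt"))
```

In this model the continuous target can return only the latest contents, while the snapshotted target can also return the two earlier versions, which is what makes recovery from corruption or accidental deletion possible.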

If you have a primary-site outage, critical client systems and workflows can use the ANQ service instance as the new primary storage platform. The service natively supports all unstructured-data protocols (SMB, NFS, NFSv4.1, and S3), so clients can connect just as they did to the primary-site storage.

Solution architecture

The ANQ solution can be deployed in three ways:

On-premises or other cloud
Between Azure regions
On-premises or other cloud (multi-region)

ANQ disaster recovery - on-premises or other cloud

In this setup, ANQ for disaster recovery is deployed into a single Azure region, with data replicating from the primary Qumulo storage instance to the ANQ service through your own Azure VPN Gateway or ExpressRoute connection.

ANQ disaster recovery - between Azure regions

In this scenario, two separate Azure regions are each configured as a hot standby/failover site for the other. If you have a service failure in Azure Region A, critical workflows and data are recovered in Azure Region B.

Qumulo replication is configured for both ANQ service instances, each of which serves as the secondary storage target for the other.

ANQ disaster recovery - on-premises or other cloud (multi-region)

In this scenario, the primary Qumulo storage is either on-premises or hosted on another cloud provider. Data on the primary Qumulo cluster is replicated to two separate ANQ service instances in two Azure regions. If you have a primary site failure or region-wide outage on Azure, you have more options for recovering critical services.

Solution workflow

Here's the basic workflow for ANQ for disaster recovery:

Users and workflows access the primary storage solution using standard unstructured-data protocols: SMB, NFS, NFSv4.1, and S3.
Users and workflows add, modify, or delete files on the primary storage instance as part of the normal course of business.
The primary Qumulo storage instance identifies the specific 4K blocks in the file system that were changed and replicates only those blocks to the ANQ instance designated as the secondary storage.
If a continuous replication strategy is used, then any older versions of the changed data on the secondary storage instance are overwritten during the replication process.
If snapshots with replication are used, then a snapshot is taken on the secondary cluster to preserve older versions of the data, with the number of versions determined by the applicable snapshot policy on the secondary cluster.
If a service interruption at the primary site is widespread enough, or lasts long enough, to warrant a failover event, then the ANQ instance that serves as the secondary storage target becomes the primary storage instance. Replication is stopped, and the read-only datasets on the secondary ANQ service instance are enabled for full read and write operations.
Affected users and workflows are redirected to the ANQ instance as the primary storage target, and service resumes.
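
The replication and failover steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Qumulo's implementation: the block size is assumed to be 4,096 bytes, SHA-256 digests stand in for the file system's own change tracking, a single byte string stands in for the file system, and `changed_blocks`, `Secondary`, `apply_delta`, and `fail_over` are hypothetical names.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: one "4K block" is 4,096 bytes


def split_blocks(data):
    """Split a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(old, new):
    """Return {block_index: block} for every block of `new` that differs from `old`."""
    old_digests = [hashlib.sha256(b).digest() for b in split_blocks(old)]
    delta = {}
    for i, block in enumerate(split_blocks(new)):
        if i >= len(old_digests) or hashlib.sha256(block).digest() != old_digests[i]:
            delta[i] = block
    return delta


class Secondary:
    """Toy replication target (hypothetical): read-only until failover."""

    def __init__(self, data=b""):
        self.data = data
        self.read_only = True

    def apply_delta(self, delta, new_length):
        """Overwrite or append only the replicated blocks, then trim to size."""
        blocks = split_blocks(self.data)
        for i, block in sorted(delta.items()):
            if i < len(blocks):
                blocks[i] = block
            else:
                blocks.append(block)
        self.data = b"".join(blocks)[:new_length]

    def fail_over(self):
        """Stop acting as a replication target and accept client writes."""
        self.read_only = False

    def write(self, data):
        if self.read_only:
            raise PermissionError("replication target is read-only")
        self.data = data


# Normal operation: only the modified middle block crosses the wire.
primary_old = b"a" * 10000
primary_new = b"a" * 4096 + b"b" * 4096 + b"a" * 1808
target = Secondary(primary_old)
delta = changed_blocks(primary_old, primary_new)
target.apply_delta(delta, len(primary_new))

# Primary-site outage: promote the secondary, then redirect clients to it.
target.fail_over()
target.write(primary_new + b"written after failover")
```

Here a 10,000-byte dataset spans three blocks, and a change confined to the second block produces a delta containing that block alone. Calling `write` before `fail_over` raises `PermissionError`, mirroring the read-only state of the secondary datasets while replication is active.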

Thursday, 30 May 2024