Understanding Azure storage redundancy offerings

Art_Khlobystin · ‎Jun 15 2020

Before we go deeper into the storage redundancy space it'd be helpful to better understand the building blocks of Azure global infrastructure as well as a few terms commonly used in high availability and disaster recovery in general.

Data residency boundary.PNG

Geography - a discrete market, typically containing 2+ regions, that preserves data residency and compliance boundaries.
Azure region - a set of datacenters deployed within a latency-defined perimeter and connected through a dedicated regional low-latency network.
Region pair - each Azure region is paired with another region within the same geography.
Availability zone - a physically separate location within an Azure region. Each AZ is made up of 1+ datacenters with independent power, cooling, and networking.
Availability - defined by Gartner as the assurance that an enterprise’s IT infrastructure has suitable recoverability and protection from system failures, natural disasters or malicious attacks. High availability refers to a system that stays operational without interruption for long periods of time by using redundant or fault-tolerant components; it is typically measured as a percentage of uptime.
Recovery point objective (RPO) - the amount of data which can be lost while bringing the system back online after a critical failure, i.e. the point in time to which the data can be recovered.
Recovery time objective (RTO) - the amount of time that it takes to get the system back online after a critical failure, i.e. how long you can sustain a service interruption before you absolutely need to be back online.
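
To make the "measured as a percentage" part of the availability definition concrete, here is a minimal sketch that converts an availability percentage into the downtime it allows. The function name and the one-year default period are assumptions made for illustration; actual SLAs may be evaluated over a different period.

```python
def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = 365 * 24) -> float:
    """Return the maximum downtime, in hours, that an availability
    percentage permits over the given period (default: one year)."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare a few common availability levels.
    for pct in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% availability allows about {hours:.2f} hours of downtime per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the gap between 99% and 99.9% matters more than it looks.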

With these in mind let's take a closer look at what Azure storage redundancy options have to offer.

Locally redundant storage (LRS)

Data is synchronously replicated 3 times within a single storage cluster, in a single data center in a region = can only sustain node failure within the storage cluster.
Provides at least 11 9s of durability and 99.9% of availability (reads & writes) for hot tier and 99% for cool.
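
A durability figure quoted as "N 9s" can be hard to picture. The sketch below is a simplified model, not an official formula: it assumes the figure is a per-object durability over one year and that object losses are independent, and the function names are hypothetical.

```python
from decimal import Decimal


def durability_from_nines(nines: int) -> Decimal:
    """Convert a count of nines into a durability fraction, e.g. 3 -> 0.999."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return Decimal(1) - Decimal(10) ** -nines


def expected_annual_losses(object_count: int, nines: int) -> Decimal:
    """Expected number of objects lost per year under the simplified
    assumption that each object is lost independently with
    probability 10^-nines."""
    return Decimal(object_count) * (Decimal(10) ** -nines)


if __name__ == "__main__":
    print("11 nines as a fraction:", durability_from_nines(11))
    print("Expected yearly losses for one billion objects:",
          expected_annual_losses(10**9, 11))
```

Decimal is used instead of float so that figures with many nines are not rounded away.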

Zone-redundant storage (ZRS)

Data is synchronously replicated 3 times across 3 availability zones in a region = can sustain node failure within the storage cluster or entire datacenter or availability zone going down.
Provides at least 12 9s of durability and 99.9% of availability (reads & writes) for hot tier and 99% for cool.

Geo-redundant storage (GRS)

Data is synchronously replicated 3 times within a single storage cluster in the primary region, then asynchronously replicated to the secondary paired region (3 more copies) = can sustain node failure within the storage cluster, entire datacenter or availability zone going down or a region-wide outage (DC/zone/region failure would require account failover to restore read and write availability - https://aka.ms/accountfailover).
Provides at least 16 9s of durability and 99.9% of availability (reads & writes) for hot tier and 99% for cool + 99.99% on reads for RA-GRS (read-access to the secondary endpoint).
Typically has an RPO of less than 15 minutes (no SLA).
Read access to the secondary is available if the primary region is down with RA-GRS.
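
With read access enabled, the secondary is reached through its own endpoint, which uses the account name with a "-secondary" suffix. A minimal sketch for the Blob service follows; the account name is hypothetical and the public-cloud domain suffix is an assumption, so other clouds would need a different suffix.

```python
def blob_endpoints(account_name: str,
                   dns_suffix: str = "blob.core.windows.net") -> dict[str, str]:
    """Build the primary and secondary Blob endpoints for an account.

    The default dns_suffix is the public Azure cloud suffix (assumption);
    the secondary endpoint only serves reads and only exists when
    read access to the secondary is enabled.
    """
    return {
        "primary": f"https://{account_name}.{dns_suffix}",
        "secondary": f"https://{account_name}-secondary.{dns_suffix}",
    }


if __name__ == "__main__":
    # "mystorageaccount" is a placeholder, not a real account.
    for role, url in blob_endpoints("mystorageaccount").items():
        print(f"{role}: {url}")
```

An application that wants to keep serving reads during a primary outage would retry its read requests against the secondary URL.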

Geo-zone-redundant storage (GZRS)

Data is synchronously replicated 3 times across 3 availability zones in the primary region, then asynchronously replicated to the secondary paired region (3 more copies) = can sustain node failure within the storage cluster, entire datacenter or availability zone going down or a region-wide outage (only region failure would require account failover to restore read and write availability - https://aka.ms/accountfailover).
Provides at least 16 9s of durability and 99.9% of availability (reads & writes) for hot tier and 99% for cool + 99.99% on reads for RA-GZRS (read-access to the secondary endpoint).
Typically has an RPO of less than 15 minutes (no SLA).
Read access to the secondary is available if the primary region is down with RA-GZRS.
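
The four options differ mainly in the largest failure they can sustain and in when an account failover is needed. The sketch below simply encodes the descriptions above as a lookup; the enum, dictionaries and function names are hypothetical and are not part of any Azure SDK.

```python
from enum import IntEnum


class FailureScope(IntEnum):
    """Failure scopes named in the text, ordered from smallest to largest."""
    NODE = 1    # node failure within the storage cluster
    ZONE = 2    # entire datacenter or availability zone going down
    REGION = 3  # region-wide outage


# Largest failure scope each option can sustain, per the descriptions above.
MAX_SUSTAINED = {
    "LRS": FailureScope.NODE,
    "ZRS": FailureScope.ZONE,
    "GRS": FailureScope.REGION,
    "GZRS": FailureScope.REGION,
}

# Smallest failure scope at which an account failover is needed to restore
# read and write availability (only the geo-replicated options fail over).
FAILOVER_FROM = {
    "GRS": FailureScope.ZONE,
    "GZRS": FailureScope.REGION,
}


def options_sustaining(scope: FailureScope) -> list[str]:
    """Return the redundancy options that can sustain the given failure."""
    return [name for name, limit in MAX_SUSTAINED.items() if limit >= scope]


def needs_failover(option: str, scope: FailureScope) -> bool:
    """Tell whether recovering from the failure requires an account failover."""
    if MAX_SUSTAINED[option] < scope:
        raise ValueError(f"{option} cannot sustain a {scope.name} failure")
    threshold = FAILOVER_FROM.get(option)
    return threshold is not None and scope >= threshold


if __name__ == "__main__":
    print(options_sustaining(FailureScope.ZONE))
    print(needs_failover("GRS", FailureScope.ZONE))
    print(needs_failover("GZRS", FailureScope.ZONE))
```

The last two calls highlight the practical difference between GRS and GZRS: both sustain a zone outage, but only GRS needs a failover to recover from it.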

Account failover

Account failover.PNG

Allows you to initiate the failover at the account level in case of an ongoing or upcoming disaster (certain restrictions apply; see the account failover considerations).
Generally available in all public regions.
Failover is disruptive and converts the account to LRS.
Typically has an RTO of less than 1 hour (no SLA).

Failover timeline.PNG

Be aware of potential data loss! Always check LastSyncTime before executing the failover.
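
The LastSyncTime check can be sketched as a small calculation. The sketch assumes that writes made at or before LastSyncTime have reached the secondary and that later writes may be lost on failover. Fetching the value from the storage account is left out, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def potential_data_loss_window(last_sync_time: datetime,
                               failover_time: Optional[datetime] = None) -> timedelta:
    """Return the window of writes that may be lost if failover starts
    at failover_time (defaults to the current UTC time)."""
    if failover_time is None:
        failover_time = datetime.now(timezone.utc)
    if last_sync_time.tzinfo is None or failover_time.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    # A sync time in the future would be a clock skew artifact; clamp to zero.
    return max(failover_time - last_sync_time, timedelta(0))


def writes_are_replicated(last_write_time: datetime,
                          last_sync_time: datetime) -> bool:
    """True when the most recent write is covered by the last sync."""
    return last_write_time <= last_sync_time


if __name__ == "__main__":
    # Illustrative timestamps only, not real measurements.
    last_sync = datetime(2020, 6, 15, 12, 0, tzinfo=timezone.utc)
    planned_failover = datetime(2020, 6, 15, 12, 10, tzinfo=timezone.utc)
    last_write = datetime(2020, 6, 15, 12, 5, tzinfo=timezone.utc)
    print("Window at risk:", potential_data_loss_window(last_sync, planned_failover))
    print("Latest write replicated:", writes_are_replicated(last_write, last_sync))
```

If the latest write is newer than LastSyncTime, it would be worth pausing writes and waiting for replication to catch up, or accepting the loss explicitly, before starting the failover.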

Hope this helps you get a good grasp of durability and availability options for your storage needs! For more details please refer to our documentation.

We'd love to hear from you - please reach out to us via email at azurestoragefeedback@microsoft.com and/or post to the Azure storage feedback forum.
