Wednesday, 6 July 2022

AWS Amazon DocumentDB Theory

Amazon DocumentDB

While offering a MongoDB-compatible API, DocumentDB does not run MongoDB software. Instead, “Amazon DocumentDB emulates the responses that a client expects from a MongoDB server by implementing the Apache 2.0 open source MongoDB 3.6 API” on top of an undisclosed storage engine. Several published details suggest that it is built on the Aurora storage subsystem, which is also used by Aurora MySQL and Aurora PostgreSQL. The following features and limitations are common to DocumentDB and Aurora:
  • both replicate six copies of data across three AWS Availability Zones
  • both have a cluster volume size limit of 64 TB
  • both do not allow null characters (‘\0’) in strings
  • identifiers are limited to 63 characters in both
  • both persist a write-ahead log when writing
  • neither needs to write full buffer pages when syncing.

High Availability
Fig. 1: DocumentDB availability
An Amazon DocumentDB cluster consists of two components:
  • Cluster volume: cluster has exactly one cluster volume, which can store up to 64 TB of data.
  • Instances: provide the processing power for the database, writing data to, and reading data from, the cluster storage volume. An Amazon DocumentDB cluster can have 0–16 instances:
    • Primary instance: supports read and write operations and performs all data modifications to the cluster volume. Each Amazon DocumentDB cluster has one primary instance.
    • Replica instance: supports only read operations. An Amazon DocumentDB cluster can have up to 15 replicas in addition to the primary instance.
Fig. 2: Deployment scenario

If the primary instance fails, an Amazon DocumentDB replica is promoted to the primary instance. There is a brief interruption during which read and write requests made to the primary instance fail with an exception. Amazon estimates this interruption is less than 120 seconds.
You can customise the order in which replicas are promoted to primary after a failure by assigning each replica a priority. It is strongly recommended that replicas use the same instance class as the primary. It is also important to create replicas in two or more different Availability Zones, so that your datastore can survive a zone failure.
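The promotion rule above can be modelled in a few lines. This is only a sketch of the selection logic, not service code: the `Replica` class, the instance names and the Availability Zone names are hypothetical, and it assumes that a lower tier number means "promote first". The tie-break (list order) is an arbitrary choice made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    name: str                # hypothetical instance name
    promotion_tier: int      # assumption: lower number = promoted first
    availability_zone: str   # hypothetical AZ name
    healthy: bool = True


def pick_failover_target(replicas: List[Replica]) -> Optional[Replica]:
    """Return the healthy replica with the lowest tier number, or None."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    # min() keeps the first of equal elements, so ties fall back to list order.
    # That tie-break belongs to this sketch, not to the service.
    return min(candidates, key=lambda r: r.promotion_tier)


def survives_zone_failure(replicas: List[Replica], primary_az: str) -> bool:
    """True if at least one healthy replica sits outside the primary's AZ."""
    return any(r.healthy and r.availability_zone != primary_az for r in replicas)


replicas = [
    Replica("replica-a", promotion_tier=1, availability_zone="us-east-1a"),
    Replica("replica-b", promotion_tier=0, availability_zone="us-east-1b", healthy=False),
    Replica("replica-c", promotion_tier=2, availability_zone="us-east-1c"),
]

target = pick_failover_target(replicas)
print(target.name if target else "no failover target")
print(survives_zone_failure(replicas, primary_az="us-east-1a"))
```

In this example the tier-0 replica is unhealthy, so the tier-1 replica is selected instead.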

Scalability & Replication

By placing replica instances in separate Availability Zones, it is possible to scale reads and increase cluster availability.

Compute and storage scale independently. Reads scale by deploying additional replicas, and storage scales up to 64 TB: DocumentDB automatically adds capacity in 10 GB increments as the volume fills.
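As a rough illustration of the 10 GB growth steps, the arithmetic can be sketched as follows. The function name is made up for this example, and treating 64 TB as 64 × 1024 GB is an assumption; the sketch only shows the rounding, not how AWS meters storage.

```python
import math

INCREMENT_GB = 10
MAX_VOLUME_GB = 64 * 1024  # assumption: 64 TB read as 64 * 1024 GB


def allocated_storage_gb(used_gb: float) -> int:
    """Round used storage up to the next 10 GB step, starting at 10 GB."""
    if used_gb < 0:
        raise ValueError("used_gb must not be negative")
    if used_gb > MAX_VOLUME_GB:
        raise ValueError("data does not fit in a single cluster volume")
    steps = max(1, math.ceil(used_gb / INCREMENT_GB))
    # The last step is clipped so the result never exceeds the volume limit.
    return min(steps * INCREMENT_GB, MAX_VOLUME_GB)


for used in (0, 10, 10.5, 95):
    print(used, "GB used ->", allocated_storage_gb(used), "GB allocated")
```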

DocumentDB can also fail over automatically to a read replica in the event of a failure, typically in less than 30 seconds. Currently Amazon DocumentDB doesn’t support any kind of multi-region setup.

Amazon DocumentDB does not rely on replicating data to multiple instances to achieve durability: the data is durable whether the cluster contains a single instance or 15 instances.
All writes are processed by the primary instance that executes a durable write to the cluster volume. It then replicates the state of that write (not the data) to each active replica. Writes to an Amazon DocumentDB cluster are atomic within a single document.

Consistency

Reads from Amazon DocumentDB replicas are eventually consistent with minimal replica lag (AWS says usually less than 100 milliseconds) after the primary instance writes the data:

  • reads from an Amazon DocumentDB cluster’s primary instance have read-after-write consistency
  • reads from a read replica have eventual consistency

It is possible to modify the read consistency level by specifying the read preference for the request or connection (it supports all MongoDB read preferences):

  • primary: reads are always routed to the primary instance
  • primaryPreferred: routes reads to the primary instance under normal operation; during a failover, a replica is used
  • secondary: reads are only routed to a replica, never the primary instance
  • secondaryPreferred: reads are routed to a read replica when one or more replicas are active. If there are no active replica instances in a cluster, the read request is routed to the primary instance
  • nearest: read preference routes reads based solely on the measured latency between the client and all instances in the Amazon DocumentDB cluster
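The five read preferences can be summarised as a small routing function. This is a simplified model written for this post, not driver code: `route_read` and the instance names are hypothetical, and where several replicas qualify the sketch simply takes the one with the lowest measured latency, whereas real drivers may choose differently.

```python
from typing import Dict, List, Optional


def route_read(preference: str,
               primary: Optional[str],
               replicas: List[str],
               latency_ms: Dict[str, float]) -> Optional[str]:
    """Return the instance that would serve a read, or None if none qualifies.

    primary    -- name of the primary, or None while it is unavailable
    replicas   -- names of the currently active replicas
    latency_ms -- measured client latency per instance name
    """
    def closest(names: List[str]) -> Optional[str]:
        return min(names, key=lambda n: latency_ms[n]) if names else None

    if preference == "primary":
        return primary
    if preference == "primaryPreferred":
        return primary if primary is not None else closest(replicas)
    if preference == "secondary":
        return closest(replicas)
    if preference == "secondaryPreferred":
        return closest(replicas) if replicas else primary
    if preference == "nearest":
        everyone = list(replicas) + ([primary] if primary is not None else [])
        return closest(everyone)
    raise ValueError(f"unknown read preference: {preference}")


latency = {"primary": 4.0, "replica-1": 2.5, "replica-2": 7.0}
both = ["replica-1", "replica-2"]

print(route_read("primary", "primary", both, latency))
print(route_read("primaryPreferred", None, both, latency))
print(route_read("secondaryPreferred", "primary", [], latency))
print(route_read("nearest", "primary", both, latency))
```

Note that `secondary` returns nothing when no replica is active, which is the practical difference from `secondaryPreferred`.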

Operations

It is possible to create an Amazon DocumentDB cluster using an AWS CloudFormation stack.

Amazon DocumentDB is a fully managed solution that provides the following features:

  • auto scaling storage (up to 64 TB in 10 GB increments)
  • simple compute resource scaling (resources allocated to an instance can be modified by changing instance class)
  • built-in monitoring, fault detection, and failover
  • daily snapshots
  • An Amazon DocumentDB cluster decouples storage and compute.
  • A cluster consists of a cluster volume and instances:
    • Cluster volume refers to the storage layer that spans multiple Availability Zones. Each Availability Zone has a copy of the cluster data.
    • Instances refers to the compute layer. It provides the processing power needed for the database to write data to, and read data from, the cluster volume. 
  • Amazon DocumentDB Endpoints
    • Cluster endpoint
      • Connects to cluster’s current primary instance.
      • Can be used for both read and write operations.
    • Reader endpoint
      • Connects to one of the available replicas of the cluster.
      • Use for read operations only.
      • If the cluster has more than one replica, the reader endpoint distributes connections across the replicas.
    • Instance endpoint
      • Connects to a specific instance in the cluster.
      • Use for specialized workloads that will only affect specific replica instances.
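To make the endpoint choice concrete, here is a minimal sketch that builds MongoDB connection strings for the cluster and reader endpoints using only the standard library. The host names, user and password are hypothetical placeholders, `docdb_uri` is a helper invented for this example, and the option set (`tls`, `replicaSet`, `readPreference`, `retryWrites`) is an assumption about what a deployment would want rather than a required configuration.

```python
from typing import Optional
from urllib.parse import quote_plus, urlencode

# Hypothetical endpoints; real ones are shown in the console for each cluster.
CLUSTER_ENDPOINT = "sample-cluster.cluster-abcdefghijkl.us-east-1.docdb.amazonaws.com"
READER_ENDPOINT = "sample-cluster.cluster-ro-abcdefghijkl.us-east-1.docdb.amazonaws.com"


def docdb_uri(host: str, user: str, password: str,
              read_preference: str = "primary",
              replica_set: Optional[str] = None,
              port: int = 27017) -> str:
    """Build a mongodb:// URI; credentials are percent-encoded."""
    options = {"tls": "true"}
    if replica_set is not None:
        options["replicaSet"] = replica_set
    options["readPreference"] = read_preference
    options["retryWrites"] = "false"  # assumption: retryable writes are turned off
    return (f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/?{urlencode(options)}")


# Reads and writes go through the cluster endpoint (current primary).
write_uri = docdb_uri(CLUSTER_ENDPOINT, "app", "p@ss", replica_set="rs0")
# Read-only traffic goes through the reader endpoint.
read_uri = docdb_uri(READER_ENDPOINT, "app", "p@ss",
                     read_preference="secondaryPreferred")

print(write_uri)
print(read_uri)
```

The resulting strings can be handed to any MongoDB-compatible driver; an instance endpoint would be passed as `host` in the same way.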

Performance

  • AWS states that DocumentDB serves millions of requests per second with millisecond latency and delivers twice the throughput of MongoDB.

Scaling

  • The minimum storage is 10 GB. Amazon DocumentDB storage scales automatically up to 64 TB in 10 GB increments without affecting performance.
  • Compute can be scaled by modifying the instance class of each instance in the cluster.
  • You can create up to 15 Amazon DocumentDB replicas in the cluster.
  • The replication lag is usually less than 100 milliseconds after the primary instance has written an update.

Reliability

  • The cluster volume provides durability by maintaining six copies of all data across three Availability Zones.
  • Amazon DocumentDB uses asynchronous replication to propagate changes made on the primary instance to all read replicas.
  • In most cases, DocumentDB restarts in less than a minute after a database crash.
  • DocumentDB replicas can act as failover targets with no data loss.
  • Supports automatic failover.
  • Supports promotion priority within a cluster. When the primary instance fails, Amazon DocumentDB promotes the replica in the highest priority tier to primary.
  • To increase the cluster’s availability, create replicas in multiple Availability Zones. Amazon DocumentDB automatically considers these replicas when selecting a failover target after an instance failure.

Backup And Restore

 

                    Cluster Volume                                          Local Storage
STORED DATA TYPE    Persistent data                                         Temporary data
SCALABILITY         Automatically scales out when more space is required    Limited to the DB Instance class

  • Automated backups are always enabled.
  • Supports point-in-time restore to any moment within the backup retention period, up to roughly the last 5 minutes.
  • You can restore from a cluster snapshot.
  • Supports sharing of encrypted manual snapshots.
  • Supports cross-region snapshot copying.
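The point-in-time bullet can be read as "the latest restorable time trails the present by about 5 minutes". Under that reading, a pre-flight check for a restore target might look like the sketch below. The function names and the `retention_days` parameter are hypothetical, and the assumption that the earliest restorable time is exactly "now minus the retention period" is a simplification.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

RESTORE_LAG = timedelta(minutes=5)  # from the "5 minutes" figure in the list above


def restorable_window(now: datetime, retention_days: int) -> Tuple[datetime, datetime]:
    """Return (earliest, latest) restorable timestamps under the sketch's assumptions."""
    return now - timedelta(days=retention_days), now - RESTORE_LAG


def can_restore_to(target: datetime, now: datetime, retention_days: int) -> bool:
    """True if target falls inside the restorable window."""
    earliest, latest = restorable_window(now, retention_days)
    return earliest <= target <= latest


now = datetime(2022, 7, 6, 12, 0, tzinfo=timezone.utc)
print(can_restore_to(now - timedelta(minutes=2), now, retention_days=7))
print(can_restore_to(now - timedelta(hours=1), now, retention_days=7))
print(can_restore_to(now - timedelta(days=8), now, retention_days=7))
```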

Security

  • You can authenticate a connection to a DocumentDB database through standard MongoDB tools with Salted Challenge Response Authentication Mechanism (SCRAM).
  • You can authenticate and authorize the use of DocumentDB management APIs through the use of IAM users, roles, and policies.
  • Data in transit is encrypted using Transport Layer Security (TLS).
  • Data at rest is encrypted using keys you manage through AWS KMS.
  • Amazon DocumentDB supports role-based access control (RBAC) with built-in roles to enforce the principle of least privilege.

Pricing

  • You are billed based on four categories
    • On-demand instances
      • Pricing per second with a 10-minute minimum
    • Database I/O
      • Pricing per million I/Os
    • Database Storage
      • Pricing per GB/month
    • Backup Storage
      • Pricing per GB/month
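The four billing categories combine into a simple estimate. The sketch below only illustrates the arithmetic, including the 10-minute minimum on per-second instance billing; every rate in it is a made-up placeholder, not an AWS price, and the `Rates` class and function names are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_BILLED_SECONDS = 10 * 60  # the 10-minute minimum from the list above


@dataclass(frozen=True)
class Rates:
    """Placeholder rates; substitute the published prices for your region."""
    instance_per_hour: float
    io_per_million: float
    storage_per_gb_month: float
    backup_per_gb_month: float


def instance_cost(seconds_run: int, rate_per_hour: float) -> float:
    """Per-second billing with a 10-minute floor for any instance that ran."""
    if seconds_run <= 0:
        return 0.0
    billed = max(seconds_run, MIN_BILLED_SECONDS)
    return billed * rate_per_hour / 3600


def monthly_estimate(rates: Rates, instance_seconds: List[int],
                     io_requests: int, storage_gb: float,
                     backup_gb: float) -> Dict[str, float]:
    """Break the bill down by the four categories."""
    return {
        "instances": sum(instance_cost(s, rates.instance_per_hour)
                         for s in instance_seconds),
        "io": io_requests / 1_000_000 * rates.io_per_million,
        "storage": storage_gb * rates.storage_per_gb_month,
        "backup": backup_gb * rates.backup_per_gb_month,
    }


example_rates = Rates(instance_per_hour=3.6, io_per_million=0.2,
                      storage_per_gb_month=0.1, backup_per_gb_month=0.02)
bill = monthly_estimate(example_rates, instance_seconds=[60, 7200],
                        io_requests=2_500_000, storage_gb=50, backup_gb=100)
for category, amount in bill.items():
    print(f"{category}: {amount:.2f}")
print(f"total: {sum(bill.values()):.2f}")
```

An instance that ran for only 60 seconds is still charged for 600 seconds here, which is the effect of the minimum.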
