Tuesday, 5 July 2022

Amazon Aurora : Theory

Amazon Aurora

 

  • A fully managed relational database engine that’s compatible with MySQL and PostgreSQL.
  • With some workloads, Aurora can deliver up to five times the throughput of MySQL and up to three times the throughput of PostgreSQL.
  • Aurora includes a high-performance storage subsystem. The underlying storage grows automatically as needed, up to 128 terabytes (TB). The minimum storage is 10 GB.

Amazon RDS, released in 2009, offered great promise for developers using MySQL. For those running and managing instances within the AWS cloud, its database availability and consistency support have been highly beneficial features. Today RDS also supports Oracle, Microsoft SQL Server, PostgreSQL and MariaDB. Then came Aurora, a proprietary database service created by AWS that provides higher levels of performance and scalability. It joined the relational database portfolio in 2014 at AWS re:Invent. According to AWS SVP Andy Jassy, Aurora is as capable as “…proprietary database engines at one tenth of the cost. Compatible with MySQL, Aurora aims to be an enterprise-class database solution.”

Aurora Features
According to AWS, Aurora is not only cheaper to run than other large-scale commercial databases, but also much faster than the popular open source database, MySQL. The service improves on MySQL's scalability by provisioning storage automatically as you go, which is a major advantage in a world where databases are still a main cause of performance bottlenecks.
Scalability: Go Big Anytime
According to Amazon, Aurora is up to five times faster than a native MySQL deployment, making it ideal for large amounts of data and environments with high performance requirements. You can start with 10 GB of provisioned storage, and as you reach the capacity limit it will automatically increase in 10 GB increments, scaling all the way up to the size of a very large database with tens of TBs. A DB cluster architecture can support an “active/active” configuration, where it is possible to have more than one writer. Although this architecture allows for higher levels of scalability, it also produces challenges in terms of coordination and synchronization. The more classic architecture, and the one Aurora uses, is what we call “passive/active”, where only one entity at a time can write to the storage. You can scale out the Aurora DB cluster with additional readers (i.e. Aurora Replicas, up to 15 per cluster) as required, which scales read performance. In terms of writing, however, Aurora is limited to just one machine (i.e. the primary instance), and in that sense it is similar to RDS, as both require the provisioning of a specific instance for that purpose. You can always scale up your instance size to try to keep up with write demand.
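
The storage growth described above comes down to simple arithmetic. The sketch below is only an illustration of that arithmetic, not how Aurora is implemented: the function name and constants are hypothetical, the 10 GB start and 10 GB step come from this section, and the 128 TB ceiling comes from the overview bullets at the top of this post.

```python
import math

GB_PER_TB = 1024
MIN_STORAGE_GB = 10                 # starting allocation described in the text
INCREMENT_GB = 10                   # growth step described in the text
MAX_STORAGE_GB = 128 * GB_PER_TB    # 128 TB ceiling from the overview bullets


def allocated_storage_gb(data_gb: float) -> int:
    """Return the storage that would be allocated for `data_gb` of data.

    Assumption: storage grows in whole 10 GB steps and never drops
    below the 10 GB minimum.
    """
    if data_gb < 0:
        raise ValueError("data size cannot be negative")
    if data_gb > MAX_STORAGE_GB:
        raise ValueError("data exceeds the maximum cluster volume size")
    steps = max(1, math.ceil(data_gb / INCREMENT_GB))
    # Cap at the ceiling, which is not an exact multiple of the step.
    return max(MIN_STORAGE_GB, min(steps * INCREMENT_GB, MAX_STORAGE_GB))
```

For example, 95 GB of data would sit in a 100 GB allocation, and adding 6 GB more would trigger growth to 110 GB.
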
Fault Tolerant: Go Ahead And…Fail!
In terms of architecture, as we already mentioned, Aurora uses the classic DB cluster architecture which is typically used in large, multiple database environments. A key principle is its single central storage for the database. Because this storage is different from AWS EBS disks, it can scale dynamically. AWS has developed a special storage backend for Aurora, which is probably stored in S3 (although we cannot be entirely sure), and which enables durability and replication across Availability Zones (AZs). In comparison, a traditional SAN-based datacenter stores all of the databases on disks in a large storage array, and the logical disks can be attached to different servers.
“An Aurora DB cluster is fault tolerant by design. The cluster volume spans multiple Availability Zones in a single region, and each Availability Zone contains a copy of the cluster volume data. This functionality means that your DB cluster can tolerate the failure of an Availability Zone without any loss of data and only a brief interruption of service.”
As mentioned, in an Aurora cluster there is a single writer instance and multiple readers that read from the same storage. If an error occurs and the writer fails or crashes, a simple automatic failover process takes one of the readers and assigns it a new role as the writer. As mentioned in the AWS documentation, because all instances are attached to the same storage within the same network, no data needs to be copied to another location during recovery, which keeps downtime short and the cluster highly available. In addition, a database with a lot of reads and queries benefits from having many readers, since queries can be executed concurrently on different machines.
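
The failover behaviour described above can be modelled with a toy example. This is a deliberately simplified sketch, not Aurora's actual promotion logic: the `Cluster` class and instance names are hypothetical, and it simply promotes the first reader in the list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Toy model: one writer and several readers sharing one storage volume."""
    writer: str
    readers: List[str] = field(default_factory=list)

    def fail_writer(self) -> str:
        """Simulate a writer crash: promote a reader and return the new writer."""
        if not self.readers:
            raise RuntimeError("no reader available to promote")
        # Simplification: promote the first reader. No data copy is modelled
        # because every instance is attached to the same shared storage.
        self.writer = self.readers.pop(0)
        return self.writer
```

With `Cluster("instance-a", ["instance-b", "instance-c"])`, a call to `fail_writer()` leaves `instance-b` as the writer and `instance-c` as the only remaining reader.
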
Amazon AWS Aurora Vs. RDS
Regular RDS deploys what we call a “DB instance”, a DB server that needs to be provisioned in advance by specifying the instance type and size of storage. Snapshots can be used to migrate to a larger instance, although this process doesn’t provide seamless autoscaling. You can have a Multi-AZ deployment, but since RDS needs to perform DB-level replication, it is less efficient than the Aurora cluster option. This limitation is one of the key reasons why Aurora is more efficient and scalable than RDS, and often the preferable option. Any workload with a lot of queries (BI, for example) is a good use case for Amazon Aurora, since many queries over multiple data sources are performed in parallel. In such cases, you can use multiple readers to relieve read bottlenecks.
Latest Aurora Updates – New Backtrack & GovCloud
We’ve all been in situations in which we wished there was an ‘Undo’ button to fix something we accidentally broke. Amazon Aurora now has this feature, called Backtrack, which allows you to go back to a certain point in time without restoring data from a backup. It can be enabled for all newly deployed MySQL-compatible Aurora database clusters and for MySQL-compatible clusters restored from a backup. Amazon also recently announced that customers who use GovCloud to back up sensitive data and to meet compliance needs can now launch an Aurora instance within the GovCloud region.
Automation of your Backup and Recovery
In terms of functionality, Aurora is formally part of the AWS Relational Database Service (RDS). Aurora supports almost all backup functionality available with RDS, such as point-in-time recovery and automatic backups. It also supports manual snapshots; however, the snapshot mechanism operates slightly differently on Aurora. Instead of a regular disk snapshot, as with RDS, the snapshot is taken of the backend storage. While not a huge difference by any means, you will notice that a few extra steps are needed to recover a fully operating cluster from a snapshot. When an Aurora DB cluster is created from a snapshot, only the backend database is created, meaning that additional operations are required to recover the readers and writer. You therefore have a multiple-step process, rather than the single-step process that is possible with RDS, so it is recommended to automate your Aurora recovery processes. If you carry out this process through the console or via an automation tool that already provides this functionality, such as Cloud Protection Manager (CPM), you don’t need to worry about this issue, as recovery is just a click away.
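
The multiple-step recovery described above could be automated along the following lines. This is a minimal sketch under stated assumptions: `rds` is expected to be a boto3 RDS client (or any object with the same two methods), the function name and the instance naming scheme are hypothetical, and waiting for resources to become available and error handling are left out.

```python
def restore_cluster_from_snapshot(rds, snapshot_id, cluster_id, engine,
                                  instance_class, reader_count=1):
    """Restore an Aurora cluster from a snapshot, then add its instances.

    Returns the identifiers of the DB instances that were requested.
    """
    # Step 1: restore the cluster itself; no DB instances exist yet.
    rds.restore_db_cluster_from_snapshot(
        DBClusterIdentifier=cluster_id,
        SnapshotIdentifier=snapshot_id,
        Engine=engine,
    )
    # Step 2: create the instances (one writer plus the readers) separately.
    # The "<cluster>-instance-<n>" naming is an illustrative choice.
    instance_ids = [
        f"{cluster_id}-instance-{n}" for n in range(1, reader_count + 2)
    ]
    for instance_id in instance_ids:
        rds.create_db_instance(
            DBInstanceIdentifier=instance_id,
            DBClusterIdentifier=cluster_id,
            Engine=engine,
            DBInstanceClass=instance_class,
        )
    return instance_ids
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real account, for example with `boto3.client("rds")`.
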
One Final Note
When migrating data to the cloud, there is always the vendor lock-in consideration. Even though Aurora claims to be 100% compatible with MySQL, there are no guarantees that it will stay this way forever. Enterprises on Amazon that are looking to move their Oracle databases, for example, and wish to leverage the benefits of a managed Database-as-a-Service (DBaaS), may find that Aurora is a valuable solution for them. AWS provides a variety of migration tools to help implement the switchover.
Cloud Protection Manager (CPM) now supports Disaster Recovery for Amazon Aurora
The good news is that you can start protecting your cloud deployment properly, with full cross-region and cross-account disaster recovery now available for Amazon Aurora clusters. We’re extremely excited about supporting Amazon Aurora because a full backup and recovery that might traditionally take about 2 hours can now be done in about 2 minutes. Start your free trial today to implement an automated, robust, scalable, enterprise-class cloud backup and recovery solution.

DB Cluster Configurations

    • Aurora supports two types of instance classes
      • Memory Optimized
      • Burstable Performance
    • Aurora Serverless is an on-demand, autoscaling configuration for Amazon Aurora (supports both MySQL and PostgreSQL). An Aurora Serverless DB cluster automatically starts up, shuts down, and scales up or down capacity based on your application’s needs.
      • A non-Serverless DB cluster for Aurora is called a provisioned DB cluster.
      • Instead of provisioning and managing database servers, you specify Aurora Capacity Units (ACUs). Each ACU is a combination of processing and memory capacity.
      • You can choose to pause your Aurora Serverless DB cluster after a given amount of time with no activity. The DB cluster automatically resumes and services connections when new requests arrive.
      • Aurora Serverless does not support fast failover, but it supports automatic multi-AZ failover.
      • The cluster volume for an Aurora Serverless cluster is always encrypted. You can choose the encryption key, but not turn off encryption.
      • You can set the following specific values:
        • Minimum Aurora capacity unit – Aurora Serverless can reduce capacity down to this capacity unit.
        • Maximum Aurora capacity unit – Aurora Serverless can increase capacity up to this capacity unit.
        • Pause after inactivity – The amount of time with no database traffic after which the cluster scales to zero processing capacity.
      • You pay by the second and only when the database is in use. 
      • You can share snapshots of Aurora Serverless DB clusters with other AWS accounts or publicly. You also have the ability to copy Aurora Serverless DB cluster snapshots across AWS regions.
    • Limitations of Aurora Serverless
      • Aurora Serverless supports specific MySQL and PostgreSQL versions only.
      • The port number for connections must be:
        • 3306 for Aurora MySQL
        • 5432 for Aurora PostgreSQL
      • You can’t give an Aurora Serverless DB cluster a public IP address. You can access an Aurora Serverless DB cluster only from within a virtual private cloud (VPC) based on the Amazon VPC service.
      • Each Aurora Serverless DB cluster requires two AWS PrivateLink endpoints. If you reach the limit for PrivateLink endpoints within your VPC, you can’t create any more Aurora Serverless clusters in that VPC.
      • A DB subnet group used by Aurora Serverless can’t have more than one subnet in the same Availability Zone.
      • Changes to a subnet group used by an Aurora Serverless DB cluster are not applied to the cluster.
      • Aurora Serverless doesn’t support the following features:
        • Loading data from an Amazon S3 bucket
        • Saving data to an Amazon S3 bucket
        • Invoking an AWS Lambda function with an Aurora MySQL native function
        • Aurora Replicas
        • Backtrack
        • Multi-master clusters
        • Database cloning
        • IAM database authentication
        • Restoring a snapshot from a MySQL DB instance
        • Amazon RDS Performance Insights
    • When you reboot the primary instance of an Aurora DB cluster, RDS also automatically restarts all of the Aurora Replicas in that DB cluster. No failover occurs when you reboot either the primary instance or an Aurora Replica.
    • Deletion protection is enabled by default when you create a production DB cluster using the AWS Management Console. However, deletion protection is disabled by default if you create a cluster using the AWS CLI or API.
      • For Aurora MySQL, you can’t delete a DB instance in a DB cluster if both of the following conditions are true:
        • The DB cluster is a Read Replica of another Aurora DB cluster.
        • The DB instance is the only instance in the DB cluster.
  • Aurora Multi Master
    • The feature is available on Aurora MySQL 5.6 
    • Allows you to create multiple read-write instances of your Aurora database across multiple Availability Zones, which enables uptime-sensitive applications to achieve continuous write availability through instance failure. 
    • In the event of instance or Availability Zone failures, Aurora Multi-Master enables the Aurora database to maintain read and write availability with zero application downtime. There is no need for database failovers to resume write operations.
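
The Aurora Serverless settings listed earlier in this section (minimum ACU, maximum ACU, and pause after inactivity) interact in a simple way that can be sketched in code. This is only an illustration of the rules as described, not the service's real scaling algorithm: the class, the function, and the idea of expressing demand directly in ACUs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerlessScaling:
    """Hypothetical container for the three settings described in the text."""
    min_acu: int
    max_acu: int
    pause_after_seconds: int

    def __post_init__(self):
        if not 0 < self.min_acu <= self.max_acu:
            raise ValueError("need 0 < min_acu <= max_acu")
        if self.pause_after_seconds <= 0:
            raise ValueError("pause_after_seconds must be positive")


def target_capacity(cfg: ServerlessScaling, demand_acu: int,
                    idle_seconds: int) -> int:
    """Return the capacity to run at: 0 when paused, else demand within bounds."""
    if idle_seconds >= cfg.pause_after_seconds:
        return 0  # paused: scaled to zero processing capacity
    # Otherwise clamp the demand to the configured [min, max] range.
    return max(cfg.min_acu, min(cfg.max_acu, demand_acu))
```

With a configuration of 2 to 16 ACUs and a 300-second pause, a demand of 40 ACUs is capped at 16, a demand of 1 is raised to 2, and 300 idle seconds result in a capacity of 0.
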

Tags

    • You can use Amazon RDS tags to add metadata to your RDS resources.
    • Tags can be used with IAM policies to manage access and to control what actions can be applied to the RDS resources.
    • Tags can be used to track costs by grouping expenses for similarly tagged resources.
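
As a rough illustration of tracking costs by tag, the sketch below groups expenses for similarly tagged resources. The record layout (`tags`, `monthly_cost`) and the function name are hypothetical; real billing data would come from AWS cost reports rather than a hand-built list.

```python
from collections import defaultdict


def cost_by_tag(resources, tag_key):
    """Sum monthly cost per value of `tag_key`; untagged resources are grouped."""
    totals = defaultdict(float)
    for resource in resources:
        value = resource.get("tags", {}).get(tag_key, "untagged")
        totals[value] += resource["monthly_cost"]
    return dict(totals)


# Example data, invented for illustration only.
example_resources = [
    {"id": "cluster-a", "tags": {"team": "bi"}, "monthly_cost": 100.0},
    {"id": "cluster-b", "tags": {"team": "bi"}, "monthly_cost": 50.0},
    {"id": "cluster-c", "tags": {}, "monthly_cost": 25.0},
]
```

Calling `cost_by_tag(example_resources, "team")` groups the first two clusters under `bi` and the third under `untagged`.
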

Monitoring

    • Subscribe to Amazon RDS events to be notified when changes occur with a DB instance, DB cluster, DB cluster snapshot, DB parameter group, or DB security group.
    • Database log files
    • RDS Enhanced Monitoring — Look at metrics in real time for the operating system.
    • RDS Performance Insights monitors your Amazon RDS DB instance load so that you can analyze and troubleshoot your database performance.
    • Use CloudWatch Metrics, Alarms and Logs

Security

    • Use IAM to control access.
    • Use a VPC security group to control which devices and EC2 instances can open connections to the endpoint and port of the DB instances in an Aurora DB cluster in a VPC.
    • You can make endpoint and port connections using Transport Layer Security (TLS) / Secure Sockets Layer (SSL). In addition, firewall rules can control whether devices running at your company can open connections to a DB instance.
    • Use RDS encryption to secure your RDS instances and snapshots at rest.
    • You can authenticate to your DB cluster using AWS IAM database authentication. IAM database authentication works with Aurora MySQL and Aurora PostgreSQL. With this authentication method, you don’t need to use a password when you connect to a DB cluster. Instead, you use an authentication token, which is a unique string of characters that Amazon Aurora generates on request.
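
The IAM database authentication flow described above can be sketched as follows. This assumes `rds` is a boto3 RDS client, whose `generate_db_auth_token` method returns the token. The helper name and the shape of the returned dictionary are hypothetical, and the database driver that would consume these parameters is not shown.

```python
def iam_connection_params(rds, host, port, user):
    """Build connection parameters that use an IAM token instead of a password."""
    # The token is requested on demand and passed where a password would go.
    token = rds.generate_db_auth_token(
        DBHostname=host,
        Port=port,
        DBUsername=user,
    )
    return {"host": host, "port": port, "user": user, "password": token}
```

Because the token is generated on request, a sketch like this would be called each time a new connection is opened rather than once at startup.
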
  • Aurora for MySQL
    • Performance Enhancements
      • Push-Button Compute Scaling
      • Storage Auto-Scaling
      • Low-Latency Read Replicas
      • Serverless Configuration
      • Custom Database Endpoints
      • Fast insert accelerates parallel inserts sorted by primary key.
      • Aurora MySQL parallel query is an optimization that parallelizes some of the I/O and computation involved in processing data-intensive queries.
      • You can use the high-performance Advanced Auditing feature in Aurora MySQL to audit database activity. To do so, you enable the collection of audit logs by setting several DB cluster parameters.
    • Scaling
      • Instance scaling – scale your Aurora DB cluster by modifying the DB instance class for each DB instance in the DB cluster.
      • Read scaling – as your read traffic increases, you can create additional Aurora Replicas and connect to them directly to distribute the read load for your DB cluster.
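
Read scaling by connecting to Aurora Replicas directly means the application has to spread its reads across them. One simple way to do that is round-robin selection, sketched below. The `ReadRouter` class and the endpoint names are hypothetical, and this is just one possible distribution strategy.

```python
import itertools


class ReadRouter:
    """Hand out reader endpoints in round-robin order."""

    def __init__(self, reader_endpoints):
        endpoints = list(reader_endpoints)
        if not endpoints:
            raise ValueError("at least one reader endpoint is required")
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self):
        """Return the endpoint the next read query should be sent to."""
        return next(self._cycle)
```

A router built with `ReadRouter(["replica-1", "replica-2"])` alternates between the two replicas on successive calls to `next_endpoint()`.
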

Feature                                          | Amazon Aurora Replicas      | MySQL Replicas
-------------------------------------------------|-----------------------------|---------------------------------------
Number of replicas                               | Up to 15                    | Up to 5
Replication type                                 | Asynchronous (milliseconds) | Asynchronous (seconds)
Performance impact on primary                    | Low                         | High
Act as failover target                           | Yes (no data loss)          | Yes (potentially minutes of data loss)
Automated failover                               | Yes                         | No
Support for user-defined replication delay       | No                          | Yes
Support for different data or schema vs. primary | No                          | Yes

  • Aurora for PostgreSQL
    • Performance Enhancements
      • Push-button Compute Scaling
      • Storage Auto-Scaling
      • Low-Latency Read Replicas
      • Custom Database Endpoints
    • Scaling
      • Instance scaling
      • Read scaling
    • Amazon Aurora PostgreSQL now supports logical replication. With logical replication, you can replicate data changes from your Aurora PostgreSQL database to other databases using native PostgreSQL replication slots, or data replication tools such as the AWS Database Migration Service.
    • Rebooting the primary instance of an Amazon Aurora DB cluster also automatically reboots the Aurora Replicas for that DB cluster, in order to re-establish an entry point that guarantees read/write consistency across the DB cluster.
    • You can import data (supported by the PostgreSQL COPY command) stored in an Amazon S3 bucket into a PostgreSQL table.
