Sunday, 20 March 2022

AWS Migration Strategies – The 6 R’s

 

  • The 6 R’s
    • Rehost (“lift and shift”)
      • Move applications to AWS without changes. In large-scale, legacy migrations, organizations are looking to move quickly to meet business objectives.
      • Applications may become easier to re-architect once they are already running in the cloud. This happens because the hard part, which is migrating the application, data, and traffic, has already been accomplished.
    • Replatform (“lift, tinker and shift”)
      • You are making a few cloud optimizations in order to achieve some tangible benefit without changing the core architecture of the application.
      • Replatforming Tools
        1. Amazon Relational Database Service (RDS) for relational databases – AWS manages the database for you, so you can focus on your application and your data rather than on database administration
        2. AWS Elastic Beanstalk – a fully managed platform where you can simply deploy your code, and AWS will handle scaling, load balancing, monitoring, database and compute provisioning for you
    • Repurchase (“drop and shop”)
      • Move from perpetual licenses to a software-as-a-service model. For workloads that can easily be upgraded to newer versions, this strategy might allow a feature set upgrade and smoother implementation.
      • For example, you will discontinue using your local VPN solution and instead purchase commercial VPN software from the AWS Marketplace, such as OpenVPN for AWS.
    • Refactor / Re-architect
      • Re-imagine how the application is architected and developed using cloud-native features.
      • Typically, this is driven by a strong business need to add features, scale, or performance that would otherwise be difficult to achieve in the application’s existing environment.
      • This migration strategy is often the most expensive solution.
    • Retire
      • Identify IT assets that are no longer useful and can be turned off. This will help boost your business case and direct your attention towards maintaining the resources that are widely used.
    • Retain
      • Retain portions of your IT portfolio if there are some applications that are not ready to be migrated and will produce more benefits when kept on-premises, or you are not ready to prioritize an application that was recently upgraded and then make changes to it again.

AWS Cloud Migration

  • The following strategies are arranged in increasing order of complexity: the more complex the strategy, the more time and cost the migration requires, but also the greater the opportunity for optimization.
    • Retire (simplest) < Retain < Rehost < Repurchase < Replatform < Re-architect/Refactor (most complex)
  • Consider a phased approach to migrating applications, prioritizing business functionality in the first phase, rather than attempting to do it all in one step.
  • General Migration Tools
    • AWS Migration Hub – provides a single location to track the progress of application migrations across multiple AWS and partner solutions. Using Migration Hub allows you to choose the AWS and partner migration tools that best fit your needs, while providing visibility into the status of migrations across your portfolio of applications.
    • AWS Application Discovery Service – collects and presents configuration, usage, and behavior data from your servers to help you plan your migration.
    • AWS Server Migration Service (SMS) – an agentless service for migrating thousands of on-premises workloads to AWS.
    • AWS Database Migration Service (DMS) – helps you migrate databases to AWS. The source database remains fully operational during the migration.
    • AWS Snowball – a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS.
    • AWS Snowmobile – an exabyte-scale data transfer service used to move extremely large amounts of data to AWS.
    • AWS Direct Connect – lets you establish a dedicated network connection line between your network and one of the AWS Direct Connect locations.
    • Amazon Kinesis Firehose – a fully managed service for loading streaming data into AWS.
    • AWS Marketplace – where you can purchase different types of software and licenses offered by AWS Partners and other AWS Users.

AWS DataSync

 

  • An online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect. 
  • DataSync can copy data between:
    • Network File System (NFS) or Server Message Block (SMB) file servers, 
    • Amazon Simple Storage Service (Amazon S3) buckets, 
    • Amazon Elastic File System (Amazon EFS) file systems, 
    • Amazon FSx for Windows File Server file systems

How It Works


    1. Deploy an agent – Deploy a DataSync agent and associate it with your AWS account via the Management Console or API. The agent will be used to access your NFS server or SMB file share to read data from it or write data to it.
    2. Create a data transfer task – Create a task by specifying the location of your data source and destination, and any options you want to use to configure the transfer, such as the desired task schedule.
    3. Start the transfer – Start the task and monitor data movement in the console or with Amazon CloudWatch.
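These three steps map directly onto the DataSync API. The following is a minimal boto3 sketch of the flow, assuming a hypothetical activation key, NFS server, S3 bucket, and IAM role; it is illustrative only, not a complete production setup.

    import boto3

    datasync = boto3.client("datasync", region_name="us-east-1")

    # 1. Register (activate) the agent VM with your AWS account.
    #    The activation key is obtained from the agent's local console.
    agent = datasync.create_agent(
        ActivationKey="EXAMPLE-ACTIVATION-KEY",  # hypothetical
        AgentName="onprem-agent-1",
    )

    # 2. Define the source (on-premises NFS) and destination (S3) locations,
    #    then create the transfer task with options and an optional schedule.
    src = datasync.create_location_nfs(
        ServerHostname="nfs.example.internal",   # hypothetical NFS server
        Subdirectory="/export/share",
        OnPremConfig={"AgentArns": [agent["AgentArn"]]},
    )
    dst = datasync.create_location_s3(
        S3BucketArn="arn:aws:s3:::my-migration-bucket",  # hypothetical bucket
        Subdirectory="/incoming",
        S3Config={"BucketAccessRoleArn": "arn:aws:iam::123456789012:role/DataSyncS3Role"},
    )
    task = datasync.create_task(
        SourceLocationArn=src["LocationArn"],
        DestinationLocationArn=dst["LocationArn"],
        Name="nfs-to-s3",
        Options={"VerifyMode": "POINT_IN_TIME_CONSISTENT"},
        Schedule={"ScheduleExpression": "cron(0 3 * * ? *)"},  # daily at 03:00 UTC
    )

    # 3. Start the transfer and keep the execution ARN for monitoring.
    execution = datasync.start_task_execution(TaskArn=task["TaskArn"])
    print(execution["TaskExecutionArn"])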

Concepts

    • Agent – A virtual machine used to read data from or write data to an on-premises location.
    • Location – Any source or destination location used in the data transfer.
    • Task – A task includes two locations (source and destination) and the configuration of how to transfer the data from one location to the other. Configuration settings can include options such as how to treat metadata, deleted files, and permissions.
    • Task execution – An individual run of a task, which includes options such as start time, end time, bytes written, and status. A task execution has five transition phases and two terminal statuses, as shown in the following diagram. If the VerifyMode option is not enabled, a terminal status occurs after the TRANSFERRING phase. Otherwise, it occurs after the VERIFYING phase.

[Diagram: DataSync task execution transition phases and terminal statuses]

Features

    • The service employs an AWS-designed transfer protocol—decoupled from storage protocol—to speed data movement. The protocol performs optimizations on how, when, and what data is sent over the network. 
    • A single DataSync agent is capable of saturating a 10 Gbps network link.
    • DataSync auto-scales cloud resources to support higher-volume transfers, and makes it easy to add agents on-premises.
    • All of your data is encrypted in transit with TLS. DataSync supports using default encryption for S3 buckets using Amazon S3-Managed Encryption Keys (SSE-S3), and Amazon EFS file system encryption of data at rest.
    • DataSync supports storing data directly into S3 Standard, S3 Intelligent-Tiering, S3 Standard-Infrequent Access (S3 Standard-IA), S3 One Zone-Infrequent Access (S3 One Zone-IA), Amazon S3 Glacier, and S3 Glacier Deep Archive.
    • You can use AWS DataSync to copy files into EFS and configure EFS Lifecycle Management to migrate files that have not been accessed for a set period of time to the Infrequent Access (IA) storage class.
    • DataSync ensures that your data arrives intact by performing integrity checks both in transit and at rest. 
    • You can specify an exclude filter, an include filter, or both, to limit which files, folders, or objects get transferred each time a task runs.
    • Task scheduling enables you to run a task periodically to detect and copy changes from your source storage system to the destination (see the sketch after this list).
    • DataSync supports VPC endpoints (powered by AWS PrivateLink) in order to move files directly into your Amazon VPC.
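To illustrate the filter, scheduling, and monitoring features described in the list above, here is a small boto3 sketch; the task ARN is a hypothetical placeholder.

    import boto3

    datasync = boto3.client("datasync")
    task_arn = "arn:aws:datasync:us-east-1:123456789012:task/task-0123456789abcdef0"  # hypothetical

    # Exclude temporary files from every run and repeat the task hourly.
    datasync.update_task(
        TaskArn=task_arn,
        Excludes=[{"FilterType": "SIMPLE_PATTERN", "Value": "*.tmp|*.bak"}],
        Schedule={"ScheduleExpression": "rate(1 hour)"},
    )

    # For a one-off run, an include filter can narrow the transfer further.
    execution = datasync.start_task_execution(
        TaskArn=task_arn,
        Includes=[{"FilterType": "SIMPLE_PATTERN", "Value": "/reports"}],
    )

    # Task executions can be monitored through the API as well as CloudWatch.
    status = datasync.describe_task_execution(TaskExecutionArn=execution["TaskExecutionArn"])
    print(status["Status"], status.get("BytesTransferred"))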

Use Cases

    • Data migration to Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server.
    • Data processing for hybrid workloads. If you have on-premises systems generating or using data that needs to move into or out of AWS for processing, you can use DataSync to accelerate and schedule the transfers.
    • If you have large amounts of cold data stored in expensive on-premises storage systems, you can move this data directly to durable and secure long-term storage such as Amazon S3 Glacier or Amazon S3 Glacier Deep Archive.
    • If you have large Network Attached Storage (NAS) systems with important files that need to be protected, you can replicate them into S3 using DataSync.
  • DataSync Agent
    • Agents need to be activated first using an activation key entered in the AWS console, before you can start using them. You must activate your agent in the same region where your S3 or EFS source/destination resides.
    • You run DataSync on-premises as a virtual machine (VM).
    • DataSync provides an Amazon Machine Image (AMI) that contains the DataSync VM image when running in an EC2 instance.
    • The agent VM requires access to some endpoints to communicate with AWS. You must configure your firewall settings to allow these connections.
    • You can have more than one DataSync Agent running.
  • AWS DataSync vs AWS CLI tools
    • AWS DataSync fully automates and accelerates moving large active datasets to AWS, up to 10 times faster than command line tools.
    • DataSync uses a purpose-built network protocol and scale-out architecture to transfer data.
    • DataSync fully automates the data transfer. It comes with retry and network resiliency mechanisms, network optimizations, built-in task scheduling, and CloudWatch monitoring that provides granular visibility into the transfer process. 
    • DataSync performs data integrity verification both during the transfer and at the end of the transfer.
    • DataSync provides end-to-end security, and integrates directly with AWS storage services.
  • AWS DataSync vs Snowball/Snowball Edge
    • AWS DataSync is ideal for online data transfers. AWS Snowball/Snowball Edge is suitable for offline data transfers, for customers who are bandwidth constrained, or for transferring data from remote, disconnected, or austere environments.
  • AWS DataSync vs AWS Storage Gateway File Gateway
    • Use AWS DataSync to migrate existing data to Amazon S3, and then use the File Gateway to retain access to the migrated data and for ongoing updates from your on-premises file-based applications.
  • AWS DataSync vs Amazon S3 Transfer Acceleration
    • If your applications are already integrated with the Amazon S3 API, and you want higher throughput for transferring large files to S3, you can use S3 Transfer Acceleration. If not, you may use AWS DataSync.
  • AWS DataSync vs AWS Transfer for SFTP
    • If you currently use SFTP to exchange data with third parties, you may use AWS Transfer for SFTP to transfer that data directly.
    • If you want an accelerated and automated data transfer between NFS servers, SMB file shares, Amazon S3, Amazon EFS, and Amazon FSx for Windows File Server, you can use AWS DataSync.

Pricing

    • You pay for the amount of data that you copy. Your costs are based on a flat per-gigabyte fee for the use of network acceleration technology, managed cloud infrastructure, data validation, and automation capabilities in DataSync. 
    • You are charged standard request, storage, and data transfer rates to read from and write to AWS services, such as Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, and AWS Key Management Service (KMS).
    • When copying data from AWS to an on-premises storage system, you pay for AWS Data Transfer at your standard rate. You are also charged standard rates for Amazon CloudWatch Logs, Amazon CloudWatch Events, and Amazon CloudWatch Metrics.
    • You will be billed by AWS PrivateLink for interface VPC endpoints that you create to manage and control the traffic between your agent(s) and the DataSync service over AWS PrivateLink.

Limits

    • Maximum number of tasks you can create per account per AWS Region – 100
    • Maximum number of files per task – 50 million (for tasks that transfer more than 20 million files, allocate a minimum of 64 GB of RAM to the agent VM)
    • Maximum throughput per task – 10 Gbps

AWS Database Migration Service

 

  • AWS Database Migration Service supports homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle or Microsoft SQL Server to Amazon Aurora.
  • You can use Database Migration Service for one-time data migration into RDS and EC2-based databases.
  • You can also continuously replicate your data with high availability (enable multi-AZ) and consolidate databases into a petabyte-scale data warehouse by streaming data to Amazon Redshift and Amazon S3.
  • Continuous replication can be done from your data center to the databases in AWS or the reverse.
  • Replication between on-premises to on-premises databases is not supported.
  • The service provides an end-to-end view of the data replication process, including diagnostic and performance data for each point in the replication pipeline.
  • Supports transaction commit date partitioning in CDC Mode when you select Amazon S3 as a target. You can write data from a single source table to a time-hierarchy folder structure in Amazon S3.
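As a rough illustration of the points above, the sketch below creates and starts a full-load-plus-CDC replication task with boto3. The replication instance and the source/target endpoint ARNs are hypothetical and assumed to already exist.

    import json
    import boto3

    dms = boto3.client("dms", region_name="us-east-1")

    # Include every table in every schema (adjust the selection rules as needed).
    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }

    task = dms.create_replication_task(
        ReplicationTaskIdentifier="oracle-to-aurora",
        SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",    # hypothetical
        TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",    # hypothetical
        ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",  # hypothetical
        MigrationType="full-load-and-cdc",  # one-time copy followed by continuous replication
        TableMappings=json.dumps(table_mappings),
    )

    dms.start_replication_task(
        ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
        StartReplicationTaskType="start-replication",
    )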
Supported Sources

  • Oracle
  • Microsoft SQL Server (non-Web and Express editions)
  • MySQL
  • MariaDB
  • PostgreSQL
  • MongoDB
  • SAP Adaptive Server Enterprise (ASE)
  • IBM Db2 for Linux, UNIX, and Windows
  • Azure SQL Database
  • Amazon DocumentDB
  • Amazon S3

Supported Targets

  • Oracle
  • Microsoft SQL Server (non-Web and Express editions)
  • MySQL
  • MariaDB
  • PostgreSQL
  • SAP Adaptive Server Enterprise (ASE)
  • Amazon Aurora with MySQL or PostgreSQL compatibility
  • Amazon Aurora Serverless
  • Amazon Redshift
  • Amazon S3
  • Amazon DynamoDB
  • Amazon Elasticsearch Service
  • Amazon Kinesis Data Streams
  • Amazon DocumentDB (with MongoDB compatibility)
  • Amazon Neptune
  • Apache Kafka

AWS Schema Conversion Tool (SCT)

    • The AWS Schema Conversion Tool makes heterogeneous database migrations predictable by automatically converting the source database schema and a majority of the database code objects, including views, stored procedures, and functions, to a format compatible with the target database.
    • SCT can also scan your application source code for embedded SQL statements and convert them as part of a database schema conversion project.
    • Supported migrations

      • Microsoft SQL Server → Amazon Aurora with MySQL or PostgreSQL compatibility, MariaDB, Microsoft SQL Server, MySQL, PostgreSQL
      • MySQL → Aurora PostgreSQL, MySQL, PostgreSQL (you can migrate schema and data from MySQL to an Aurora MySQL DB cluster without using AWS SCT)
      • Oracle Database → Aurora MySQL or PostgreSQL, MariaDB, MySQL, Oracle, PostgreSQL
      • PostgreSQL → Aurora MySQL, MySQL, PostgreSQL, Aurora PostgreSQL
      • IBM Db2 LUW → Aurora MySQL, MariaDB, MySQL, PostgreSQL, Aurora PostgreSQL
      • Sybase ASE → Aurora MySQL, Aurora PostgreSQL, MySQL, PostgreSQL
      • Oracle Data Warehouse, Microsoft SQL Server, Teradata, IBM Netezza, Greenplum, HPE Vertica → Amazon Redshift
      • Apache Cassandra → Amazon DynamoDB

 

Basic Schema Copy

    • To quickly migrate a database schema to your target instance you can rely on the Basic Schema Copy feature of AWS Database Migration Service.
    • Basic Schema Copy will automatically create tables and primary keys in the target instance if the target does not already contain tables with the same names.
    • It will not migrate secondary indexes, foreign keys or stored procedures. When you need to use a more customizable schema migration process, use AWS SCT.

Pricing

    • You only pay for the compute resources used during the migration process and any additional log storage. Each database migration instance includes storage sufficient for swap space, replication logs, and data cache for most replications. Inbound data transfer is free.

Amazon Quantum Ledger Database (QLDB)

 

  • Fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log ‎owned by a central trusted authority.
  • Used to track all application data changes, and maintain a complete and verifiable history of changes over time
  • Amazon QLDB is serverless – there is no capacity to provision and no read/write limits to set.
  • QLDB transactions are ACID (atomicity, consistency, isolation, and durability) compliant.
  • Amazon QLDB uses PartiQL as its query language.
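As a small, hedged illustration of PartiQL in practice, the sketch below uses the pyqldb driver against a hypothetical, already-created ledger; the ledger and table names are assumptions.

    from pyqldb.driver.qldb_driver import QldbDriver

    # Connect to an existing ledger (hypothetical name).
    driver = QldbDriver(ledger_name="vehicle-registration")

    # DDL and DML are both expressed in PartiQL and run inside ACID transactions.
    driver.execute_lambda(lambda txn: txn.execute_statement("CREATE TABLE Vehicles"))

    driver.execute_lambda(lambda txn: txn.execute_statement(
        "INSERT INTO Vehicles ?", {"VIN": "1N4AL11D75C109151", "Owner": "Alice"}
    ))

    # Query the current state of the table.
    result = driver.execute_lambda(lambda txn: txn.execute_statement(
        "SELECT * FROM Vehicles WHERE VIN = ?", "1N4AL11D75C109151"
    ))
    for doc in result:
        print(doc)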

How it Works


Common Use Cases

  • Finance
    • Banks can use Amazon QLDB to easily store an accurate and complete record of all financial transactions, instead of building a custom ledger with complex auditing functionality.
  • Insurance
    • Insurance companies can use Amazon QLDB to track the entire history of claim transactions. Whenever a conflict arises, Amazon QLDB can cryptographically verify the integrity of the claims data.

Components Of QLDB

  • Ledger
    • Consists of tables and a journal that keeps the complete, immutable history of changes to the tables.
  • Tables 
    • Contains a collection of document revisions.
  • Journal
    • An immutable transaction log where transactions are appended as a sequence of blocks that are cryptographically chained together, providing secure verification and immutability of the history of changes to your ledger data.
    • Only the history of changes to your data cannot be altered; the data itself can still be updated.
  • Current State
    • The current state is similar to a traditional database where you can view and query the latest data.
  • History
    • The history is a table where you can view and query the history of all the data and every change ever made to the data.

Performance

  • Amazon QLDB can execute two to three times as many transactions as ledgers in common blockchain frameworks.

Scalability

  • Amazon QLDB automatically scales based on the workloads of your application.

Reliability

  • Multiple copies of your QLDB ledger are replicated across Availability Zones in a Region, so QLDB continues to operate even if an Availability Zone fails.
  • Ensures redundancy within a region.
  • Also ensures full recovery when an availability zone goes down.

Backup and Restore

  • QLDB has no dedicated backup and restore feature, but you can export the contents of your QLDB journal to S3 as a backup plan, for example:
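A minimal boto3 sketch of such a journal export, with the ledger name, bucket, and IAM role as hypothetical placeholders:

    from datetime import datetime, timezone
    import boto3

    qldb = boto3.client("qldb")

    # Export one day's worth of journal blocks to S3 (all names are placeholders).
    export = qldb.export_journal_to_s3(
        Name="vehicle-registration",
        InclusiveStartTime=datetime(2022, 3, 19, tzinfo=timezone.utc),
        ExclusiveEndTime=datetime(2022, 3, 20, tzinfo=timezone.utc),
        RoleArn="arn:aws:iam::123456789012:role/QldbExportRole",
        S3ExportConfiguration={
            "Bucket": "my-qldb-exports",
            "Prefix": "vehicle-registration/",
            "EncryptionConfiguration": {"ObjectEncryptionType": "SSE_S3"},
        },
    )
    print(export["ExportId"])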

Security

  • Amazon QLDB uses the SHA-256 hash function to create a secure representation of your data's change history, called a digest. The digest serves as proof of your data's change history, enabling you to go back to a point in time and verify the validity and integrity of your data changes (see the sketch after this list).
  • All data in transit and at rest are encrypted by default.
  • Uses AWS-owned keys for encryption of data.
  • Requests are authenticated by attaching a signature to the HTTP request; the signature is then verified using your AWS credentials.
  • Integrated with AWS PrivateLink.
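For example, requesting a digest with boto3 (the ledger name is a placeholder). The digest and its tip address can be stored and later used with the verification APIs to prove that a revision has not changed.

    import boto3

    qldb = boto3.client("qldb")

    # Request the current digest of the ledger's journal.
    response = qldb.get_digest(Name="vehicle-registration")

    digest = response["Digest"]                    # SHA-256 hash covering the journal
    tip = response["DigestTipAddress"]["IonText"]  # block address the digest covers up to
    print(digest, tip)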

Pricing

  • You are billed based on five categories
    • Write I/Os
      • Pricing per 1 million requests
    • Read I/Os
      • Pricing per 1 million requests
    • Journal Storage Rate
      • Pricing per GB-month
    • Indexed Storage Rate
      • Pricing per GB-month
    • Data Transfer OUT From Amazon QLDB To Internet
      • You are charged based on the amount of data transferred per month. The rate varies for different regions.

Limitations

  • Amazon QLDB does not have a built-in backup and restore feature, but you can export your data from QLDB to S3.
  • Does not support point-in-time restore.
  • Does not support cross-region replication.
  • Does not support customer managed keys (CMKs); data is encrypted with AWS owned keys only.

Amazon Neptune

 

  • Amazon Neptune is a fully managed graph database service used for building applications that work with highly connected datasets.
  • Optimized for storing billions of relationships between pieces of information.
  • Provides millisecond latency when querying the graph.
  • Neptune supports graph query languages like Apache TinkerPop Gremlin and W3C’s SPARQL.
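For example, a Gremlin traversal against a Neptune cluster using the gremlinpython driver; the endpoint below is a hypothetical placeholder.

    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.process.graph_traversal import __
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    # Hypothetical cluster endpoint; Neptune serves Gremlin over WebSockets on port 8182.
    endpoint = "wss://my-neptune.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/gremlin"

    conn = DriverRemoteConnection(endpoint, "g")
    g = traversal().withRemote(conn)

    # Create two vertices and a relationship between them.
    g.addV("person").property("name", "alice").next()
    g.addV("person").property("name", "bob").next()
    g.V().has("person", "name", "alice").addE("follows").to(
        __.V().has("person", "name", "bob")
    ).next()

    # Traverse the relationship.
    print(g.V().has("person", "name", "alice").out("follows").values("name").toList())

    conn.close()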

How it works


Common Use Cases

  • Social Networking
    • Amazon Neptune can easily process users' interactions like comments, follows, and likes in a social network application through highly interactive queries.
  • Recommendation Engines
    • You can use Amazon Neptune to build applications for suggesting personalized and relevant products based on relationships between information such as customer’s interest and purchase history.
  • Knowledge Graphs
    • With the help of Amazon Neptune, you can create a knowledge graph for search engines that will enable users to quickly discover new information. 
  • Identity Graphs
    • You can use Amazon Neptune as a graph database to easily link and update user profile data for ad-targeting, personalization, and analytics. 

Performance

  • Supports up to 15 read replicas and hundreds of thousands of queries per second.
  • Amazon Neptune uses query optimization for both SPARQL queries and Gremlin traversals.

Reliability

  • Database volume is replicated six ways across three availability zones.
  • Amazon Neptune can withstand the loss of up to two copies of data without affecting write availability, and up to three copies without affecting read availability.
  • Amazon Neptune’s storage is self-healing. Data blocks are continuously scanned for errors and replaced automatically.
  • Amazon Neptune uses asynchronous replication to update the changes made to the primary instance to all of Neptune’s read replicas.
  • Replicas can act as a failover target with no data loss.
  • Supports automatic failover.
  • Supports promotion priority within a cluster. Amazon Neptune will promote the replica with the highest priority tier to primary when the primary instance fails.

 

Cluster Volume vs Local Storage

  • Cluster Volume
    • Stored data type: Persistent data
    • Scalability: Automatically scales out when more space is required
  • Local Storage
    • Stored data type: Temporary data
    • Scalability: Limited to the DB instance class

Backup And Restore

  • Automated backups are always enabled.
  • Supports Point-In-Time restoration, which can be up to 5 minutes in the past.
  • Supports sharing of encrypted manual snapshots.
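A hedged boto3 sketch of a point-in-time restore; restoring creates a new cluster, after which at least one DB instance must be added before it can serve queries (identifiers are hypothetical).

    import boto3

    neptune = boto3.client("neptune")

    # Restore a new cluster from the source cluster's latest restorable time.
    neptune.restore_db_cluster_to_point_in_time(
        DBClusterIdentifier="my-neptune-restored",   # new cluster (hypothetical)
        SourceDBClusterIdentifier="my-neptune",      # existing cluster (hypothetical)
        UseLatestRestorableTime=True,
    )

    # The restore creates only the cluster; add an instance so it can serve queries.
    neptune.create_db_instance(
        DBInstanceIdentifier="my-neptune-restored-1",
        DBInstanceClass="db.r5.large",
        Engine="neptune",
        DBClusterIdentifier="my-neptune-restored",
    )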

Security

  • Amazon Neptune supports AWS Key Management Service (KMS) encryption at rest.
  • It also supports HTTPS connections. Neptune requires SSL/TLS client connections and enforces a minimum of TLS v1.2 in all AWS Regions where Neptune is available.
  • To encrypt an existing Neptune instance, you should create a new instance with encryption enabled and migrate your data into it.
  • You can create custom endpoints for Amazon Neptune to access your workload. Custom endpoints allow you to distribute your workload across a designated set of instances within a Neptune cluster.
  • Offers database deletion protection.

Pricing

  • You are billed based on the DB instance hours, I/O requests, storage, and Data transfer.
  • Storage and I/O are billed in per GB-month and per million request increments, respectively.

Monitoring

  • Visualize your graph using the Neptune Workbench.
  • You can receive event notifications on your Amazon Neptune DB clusters, DB instances, DB cluster snapshots, parameter groups, or security groups through Amazon SNS.

Limitations

  • It does not support cross-region replicas.
  • Encryption of an existing Neptune instance is not supported.
  • Sharing of automatic DB snapshots with other accounts is not allowed. A workaround is to copy the automatic snapshot to a manual snapshot, and then share the manual snapshot with the other account.

Amazon DocumentDB

 

  • Fully managed document database service designed to be fast, scalable, and highly available.
  • Data is stored in JSON-like documents.
  • Compatible with MongoDB.
  • Flexible schema and indexing.
  • Commonly used for content management, user profiles, and real-time big data.

How it Works



 

  • An Amazon DocumentDB cluster decouples storage and compute.
  • A cluster consists of Cluster volume and Instances
    • Cluster volume refers to the storage layer that spans multiple Availability Zones. Each Availability Zone has a copy of the cluster data.
    • Instances refers to the compute layer. It provides the processing power needed for the database to write data to, and read data from, the cluster volume. 
  • Amazon DocumentDB Endpoints
    • Cluster endpoint
      • Connects to cluster’s current primary instance.
      • Can be used for both read and write operations.
    • Reader endpoint
      • Connects to one of the available replicas of the cluster.
      • Use for read operations only.
      • If the cluster has more than one replica, the reader endpoint distributes each connection request among the cluster's replicas.
    • Instance endpoint
      • Connects to a specific instance in the cluster.
      • Use for specialized workloads that will only affect specific replica instances.
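Because Amazon DocumentDB is MongoDB-compatible, standard MongoDB drivers can connect to these endpoints. A minimal pymongo sketch, with the endpoint, credentials, and CA bundle path as hypothetical placeholders:

    from pymongo import MongoClient

    # The cluster endpoint routes writes to the current primary; with
    # readPreference=secondaryPreferred, reads go to replicas when available.
    cluster_endpoint = "my-docdb.cluster-abc123.us-east-1.docdb.amazonaws.com"  # hypothetical

    client = MongoClient(
        f"mongodb://myuser:mypassword@{cluster_endpoint}:27017/"
        "?tls=true&tlsCAFile=rds-combined-ca-bundle.pem"
        "&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
    )

    db = client["appdb"]
    db["profiles"].insert_one({"user_id": 1, "name": "alice"})
    print(db["profiles"].find_one({"user_id": 1}))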

Performance

  • Provides millions of requests per second with millisecond latency and has twice the throughput of MongoDB.

Scaling

  • The minimum storage is 10 GB. Amazon DocumentDB storage automatically scales up to 64 TB in 10 GB increments without affecting performance.
  • The Amazon DocumentDB cluster can be scaled by modifying the instance class for each instance in the cluster.
  • You can create up to 15 Amazon DocumentDB replicas in the cluster.
  • The replication lag is usually less than 100 milliseconds after the primary instance has written an update.
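The compute layer is scaled per instance, as noted above. A hedged boto3 sketch (cluster and instance identifiers, and instance classes, are hypothetical):

    import boto3

    docdb = boto3.client("docdb")

    # Scale up an existing instance by changing its instance class.
    docdb.modify_db_instance(
        DBInstanceIdentifier="my-docdb-instance-1",  # hypothetical
        DBInstanceClass="db.r5.xlarge",
        ApplyImmediately=True,
    )

    # Scale reads by adding another replica to the cluster (up to 15 replicas).
    docdb.create_db_instance(
        DBInstanceIdentifier="my-docdb-instance-2",
        DBInstanceClass="db.r5.large",
        Engine="docdb",
        DBClusterIdentifier="my-docdb-cluster",      # hypothetical
    )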

Reliability

  • The cluster volume provides durability by maintaining six copies of all data across three Availability Zones.
  • Amazon DocumentDB uses asynchronous replication to update the changes made to the primary instance to all of DocumentDB’s read replicas.
  • In most cases, DocumentDB's restart time is less than a minute after a database crash.
  • DocumentDB replicas can act as a failover target with no data loss.
  • Supports automatic failover.
  • Supports promotion priority within a cluster. Amazon DocumentDB will promote the replica with the highest priority tier to primary when the primary instance fails.
  • To increase the cluster's availability, create replicas in multiple Availability Zones. Amazon DocumentDB will automatically include the replicas when selecting a failover target in the event of an instance failure.

Backup And Restore

 

Cluster Volume vs Local Storage

  • Cluster Volume
    • Stored data type: Persistent data
    • Scalability: Automatically scales out when more space is required
  • Local Storage
    • Stored data type: Temporary data
    • Scalability: Limited to the DB instance class

  • Automated backups are always enabled.
  • Supports Point-In-Time restoration, which can be up to 5 minutes in the past.
  • You can restore from a cluster snapshot.
  • Supports sharing of encrypted manual snapshots.
  • Supports cross-region snapshot copying.
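For example, taking a manual cluster snapshot and copying it to another Region with boto3; identifiers, account ID, and Regions are hypothetical, and encrypted snapshots would additionally need a KMS key in the destination Region.

    import boto3

    # Take a manual snapshot of the cluster in its home Region.
    docdb_east = boto3.client("docdb", region_name="us-east-1")
    docdb_east.create_db_cluster_snapshot(
        DBClusterSnapshotIdentifier="my-docdb-snap-2022-03-20",
        DBClusterIdentifier="my-docdb-cluster",  # hypothetical
    )

    # Copy the snapshot into a second Region for cross-region protection.
    docdb_west = boto3.client("docdb", region_name="us-west-2")
    docdb_west.copy_db_cluster_snapshot(
        SourceDBClusterSnapshotIdentifier=(
            "arn:aws:rds:us-east-1:123456789012:cluster-snapshot:my-docdb-snap-2022-03-20"
        ),
        TargetDBClusterSnapshotIdentifier="my-docdb-snap-2022-03-20-copy",
        SourceRegion="us-east-1",  # boto3 uses this to generate a pre-signed URL when one is needed
    )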

Security

  • You can authenticate a connection to a DocumentDB database through standard MongoDB tools with Salted Challenge Response Authentication Mechanism (SCRAM).
  • You can authenticate and authorize the use of DocumentDB management APIs through the use of IAM users, roles, and policies.
  • Data in transit is encrypted using Transport Layer Security (TLS).
  • Data at rest is encrypted using keys you manage through AWS KMS.
  • Amazon DocumentDB supports role-based access control (RBAC) with built-in roles to enforce the principle of least privileged access.

Pricing

  • You are billed based on four categories
    • On-demand instances
      • Pricing per second with a 10-minute minimum
    • Database I/O
      • Pricing per million I/Os
    • Database Storage
      • Pricing per GB-month
    • Backup Storage
      • Pricing per GB-month