Monday, 18 July 2022

AWS Database Migration Service : Theory

 

What is AWS Database Migration Service?

AWS Database Migration Service (DMS) is a managed and automated service that provides a quick and secure way to migrate databases from on-premises databases, DB instances, or databases running on EC2 instances to the cloud. It helps you modernize, migrate, and manage your environments in the AWS cloud. Amazon provides a wide spectrum of target databases to work with, such as Amazon RDS, Aurora, DynamoDB, ElastiCache, and Redshift.


[Image: AWS DMS source and target databases]

DMS can be used to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores into the cloud. AWS DMS supports both homogeneous (e.g., Oracle to Oracle) and heterogeneous (e.g., Oracle to Amazon Aurora) database migrations. During migration, the source database remains operational, thereby minimizing downtime. The entire migration process can be controlled from the AWS Management Console.


AWS DMS Benefits

AWS Database Migration Service has various benefits over traditional migration methods such as:

[Image: AWS DMS benefits]

  • Minimal downtime – DMS continuously replicates the changes to your source database during migration while keeping your source database operational. This allows you to switch over at any time without shutting down either database.
  • Supports Widely Used Databases – AWS Database Migration Service can migrate your data to and from most of the widely used commercial and open-source databases.
  • Fast & Easy Setup – A migration task can be set up within a few minutes in the AWS Management Console.
  • Low cost – DMS is a free migration service for migrations to Aurora, Redshift, DynamoDB, or DocumentDB. For other databases, you pay based on the amount of log storage and the compute resources needed for the transfer.
  • Reliability – DMS is a self-healing service and will automatically restart if an interruption occurs. DMS also provides the option of setting up Multi-AZ (Availability Zone) replication for disaster recovery.


How does AWS DMS work?

AWS Database Migration Service (DMS) is a managed and automated migration service that allows you to migrate your data from one database to another. The process starts by connecting DMS to two endpoints: the source and target endpoints. The only requirement for using AWS DMS is that at least one of your endpoints must be on an AWS service.

AWS database migration starts by connecting to your source database; the service then reads the data, formats it for the target database, and loads it into the target database. The migration first goes through a full load phase, in which the existing source data is moved to the target. During the full load, any changes made to the tables being loaded are cached on the replication server; these are the cached changes. Once the full load completes, AWS DMS immediately begins applying the cached changes to keep the source and target databases in sync with each other.
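
To make the full-load-then-CDC behaviour a little more concrete, here is a minimal monitoring sketch using boto3. The task ARN and region are placeholders, and the exact fields returned can vary, so treat this as an illustration rather than a definitive implementation.

import boto3

# Hypothetical task ARN, for illustration only
TASK_ARN = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLETASK"

dms = boto3.client("dms", region_name="us-east-1")

# Look up the replication task and report how far the full load has progressed
response = dms.describe_replication_tasks(
    Filters=[{"Name": "replication-task-arn", "Values": [TASK_ARN]}]
)
task = response["ReplicationTasks"][0]
stats = task.get("ReplicationTaskStats", {})

print("Status:", task["Status"])                                  # e.g. "running"
print("Full load progress:", stats.get("FullLoadProgressPercent"), "%")
print("Tables loaded:", stats.get("TablesLoaded"))
# Once the full load finishes, DMS keeps applying cached/ongoing changes (CDC)
# until you stop the task and switch your application over.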

[Image: AWS DMS migration components]

Components of AWS Database Migration Service

AWS DMS migration consists of three components that you should know about before starting with migration:

  • Replication instance
  • Source & Target Endpoints
  • Replication tasks

[Image: Components of AWS Database Migration Service]

Replication Instance

A replication instance is simply a managed Amazon Elastic Compute Cloud (EC2) instance that is used to host one or more replication tasks. The above image shows a replication instance running several associated replication tasks.

Endpoints

DMS uses an endpoint to connect to your source and target databases. When you create an endpoint, you provide the following information:

  • Endpoint type
  • Engine type
  • Server name
  • Port number
  • Encryption protocols
  • Credentials

You can create an endpoint using the AWS DMS console, where you can also test the endpoint connection. The test verifies that a database exists at the given server name and port, and that the supplied credentials can be used to connect to it.
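
As a rough illustration of supplying the same information programmatically, the sketch below creates a source endpoint and tests it with boto3; every identifier, hostname, credential, and ARN here is a made-up placeholder.

import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Create a source endpoint (all values below are placeholders)
endpoint = dms.create_endpoint(
    EndpointIdentifier="oracle-source-endpoint",
    EndpointType="source",
    EngineName="oracle",
    ServerName="onprem-db.example.com",
    Port=1521,
    DatabaseName="ORCL",
    Username="dms_user",
    Password="change-me",
    SslMode="none",
)["Endpoint"]

# Ask DMS to test the connection from a replication instance to this endpoint
dms.test_connection(
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:EXAMPLE",
    EndpointArn=endpoint["EndpointArn"],
)
# The result arrives asynchronously; describe_connections reports success or failure.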

Replication Tasks

A replication task is used to move data from the source endpoint to the target endpoint; this is where you specify which tables and schemas are moved between your source and target databases, and when. Creating a replication task is the last step before you start a migration.

When you create a replication task, you need to specify which replication instance to use, your target and source endpoints, and your migration type option.

AWS Schema Conversion Tool

For a homogeneous migration, DMS attempts to create the target schema. However, this is not always possible. In those cases, you can use tools such as MySQL Workbench, Oracle SQL Developer, or pgAdmin III.

For heterogeneous migrations, AWS Database Migration Service cannot perform schema conversion. In these cases, you can use the AWS Schema Conversion Tool (SCT). SCT automatically converts the source schema to a format compatible with your target database: it can generate and create the entire target schema, including tables, indexes, views, and so on.

Use Cases

DMS supports migration to Amazon RDS, Aurora, Redshift, DynamoDB, and DocumentDB. There are several use cases for AWS DMS; some of them are listed below:

1. Homogeneous Database Migration

Homogeneous Database Migration is when the source and target databases are the same or compatible with each other, such as Oracle to Amazon RDS for Oracle, MySQL to Amazon Aurora, MySQL to Amazon RDS for MySQL, or Microsoft SQL Server to Amazon RDS for SQL Server. Since the schema structure and data types of the source and target databases are compatible, migration is a one-step process; there is no need for schema conversion.

[Image: Homogeneous database migration]

2. Heterogeneous Database Migration

Heterogeneous Database Migration is when the source and target database engines are different from each other, such as Oracle to Amazon Aurora, Oracle to PostgreSQL, or Microsoft SQL Server to MySQL migrations. In this case, the schema structures and data types of the source and target databases differ, so a schema and code transformation is required before migration, which makes it a two-step process.

[Image: Heterogeneous database migration]

Migrating an On-Premises Oracle Database to Amazon Aurora MySQL

In this section, we will look at the step-by-step process for migrating an on-premises Oracle database (the source endpoint) to an Amazon Aurora with MySQL compatibility (the target endpoint) using AWS Database Migration Service (AWS DMS).

Before starting, you must have an AWS account; if you don’t know how to create one, read our blog on How To Create AWS Free Tier Account.

Step 1: Configure Your Source Oracle Database

1. Run the following command to enable supplemental logging at the database level for AWS DMS:

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

2. If you are using a dedicated migration account rather than an administrative one, make sure it has been granted the minimal privileges required by AWS DMS (the full list is in the AWS DMS documentation for Oracle sources).

Step 2: Configure Your Target Aurora MySQL Database

If you want to create a temporary dms_user with the minimal privileges required for migration, follow the steps in the official AWS documentation.

Step 3: Create a Replication Instance

1. Sign in to the AWS Management Console, open the AWS DMS console, and choose Replication instances.

2. Click on Create replication instance.

[Image: AWS DMS console – Replication instances]

3. On the Create replication instance page, enter the required details. Once done, click Create.

[Image: Create replication instance page]
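
If you prefer scripting this step, here is a minimal, hypothetical boto3 equivalent; the instance identifier, class, and storage size are arbitrary example values, so check the currently available DMS instance classes before relying on them.

import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Provision a small replication instance for the migration (example values)
instance = dms.create_replication_instance(
    ReplicationInstanceIdentifier="oracle-to-aurora-instance",
    ReplicationInstanceClass="dms.t3.medium",   # pick a class sized for your workload
    AllocatedStorage=50,                        # GB of storage for logs and cached changes
    MultiAZ=False,
    PubliclyAccessible=True,
)["ReplicationInstance"]

print("Replication instance ARN:", instance["ReplicationInstanceArn"])
# The instance takes a few minutes to become "available" before it can run tasks.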

Step 4: Create Oracle Source Endpoint

1. Go to the AWS DMS console, choose Endpoints. Select Create Endpoint.

[Image: AWS DMS console – Endpoints]

2. On the Create database endpoint page, enter the required details and create the source endpoint.

[Image: Create endpoint page]

Step 5: Create Aurora MySQL Target Endpoint

Create the target endpoint in the same way you created the source endpoint: on the create endpoint page, select Target endpoint, choose Aurora MySQL as the target engine, enter the connection details for the Aurora database, and create the endpoint.

Step 6: Create a Migration Task

1. Go to the AWS DMS console, choose Database migration tasks, and click Create task.

[Image: AWS DMS console – Database migration tasks]

2. On the Create Task page, select the replication instance, source, and target endpoints that we created in previous steps and enter the other required details.

[Image: Create database migration task page]
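
For completeness, a hedged boto3 sketch of this final step is shown below. The ARNs are placeholders for the replication instance and endpoints created in the previous steps, and the table-mapping rule simply includes every table from a hypothetical HR schema.

import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Include all tables from an example "HR" schema
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-hr-schema",
        "object-locator": {"schema-name": "HR", "table-name": "%"},
        "rule-action": "include",
    }]
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-aurora-task",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:EXAMPLE",
    MigrationType="full-load-and-cdc",          # full load, then ongoing replication
    TableMappings=json.dumps(table_mappings),
)["ReplicationTask"]

# Start the task once its status becomes "ready"
dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)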

Conclusion

I hope that by now you have a better understanding of AWS Database Migration Service: its benefits, its components, and how it works. This should help you overcome the complex challenges of migrating databases to AWS.

Thursday, 7 July 2022

AWS Redshift : Theory

What is Redshift?

  • Redshift is a fast, powerful, fully managed, petabyte-scale data warehouse service in the cloud.
  • Customers can start using Redshift for just $0.25 per hour with no commitments or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year.

OLAP

OLAP (Online Analytical Processing) is the type of workload that Redshift is designed for.

OLAP transaction example:

Suppose we want to calculate the net profit for EMEA and the Pacific for the Digital Radio product. This requires pulling a large number of records. The following records are needed to calculate the net profit:

  • Sum of radios sold in EMEA.
  • Sum of radios sold in the Pacific.
  • Unit cost of a radio in each region.
  • Sales price of each radio.
  • Sales price minus unit cost.

Complex queries are required to fetch the records listed above, which is why data warehousing databases use a different type of architecture, both from a database perspective and at the infrastructure layer.

Redshift Configuration

A Redshift data warehouse is a collection of computing resources known as nodes, and these nodes are organized in a group known as a cluster. Each cluster runs a Redshift engine and contains one or more databases.

Redshift consists of two types of node configurations:

  • Single node: A single node stores up to 160 GB.
  • Multi-node: A multi-node cluster consists of more than one node and has two kinds of nodes:
    • Leader node: Manages client connections and receives queries. The leader node receives queries from client applications, parses them, and develops execution plans. It coordinates the parallel execution of these plans across the compute nodes, combines the intermediate results from all the nodes, and then returns the final result to the client application.
    • Compute node: Executes the execution plans and sends the intermediate results to the leader node for aggregation before they are returned to the client application. A cluster can have up to 128 compute nodes.

When you launch a Redshift cluster, it starts with a single node of 160 GB. When you want to grow, you can add additional nodes to take advantage of parallel processing: the leader node manages client connections and the compute nodes, while the data is stored on the compute nodes, which execute the queries.
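
To make the OLAP example above a little more concrete, here is a minimal sketch that runs an aggregate query through the Redshift Data API using boto3. The cluster name, database, user, and the sales table and its columns are all hypothetical.

import time
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

# Hypothetical aggregate: net profit per region for the "Digital Radio" product
sql = """
    SELECT region,
           SUM(sales_price - unit_cost) AS net_profit
    FROM   radio_sales
    WHERE  product = 'Digital Radio'
      AND  region IN ('EMEA', 'Pacific')
    GROUP  BY region;
"""

stmt = rsd.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=sql,
)

# The Data API is asynchronous: poll until the statement finishes
while rsd.describe_statement(Id=stmt["Id"])["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)

for row in rsd.get_statement_result(Id=stmt["Id"])["Records"]:
    print(row)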

Why Redshift is 10 times faster

Redshift is 10 times faster because of the following reasons:

  • Columnar Data Storage
    Instead of storing data as a series of rows, Amazon Redshift organizes the data by column. Row-based systems are ideal for transaction processing, while column-based systems are ideal for data warehousing and analytics, where queries often involve aggregates performed over large data sets. Since only the columns involved in the queries are processed, and columnar data is stored sequentially on the storage media, column-based systems require fewer I/Os, thus improving query performance.
  • Advanced Compression
    Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk. Amazon Redshift employs multiple compression techniques and can often achieve significant compression relative to traditional relational data stores.
    Amazon Redshift does not require indexes or materialized views, so it uses less space than traditional relational database systems. When loading data into an empty table, Amazon Redshift samples your data automatically and selects the most appropriate compression technique.
  • Massively Parallel Processing
    Amazon Redshift automatically distributes the data and query load across all nodes. Amazon Redshift also makes it easy to add new nodes to your data warehouse, which allows you to maintain fast query performance as your data warehouse grows.

Redshift features

Features of Redshift are given below:

  • Easy to setup, deploy and manage
    • Automated Provisioning
      Redshift is simple to set up and operate. You can deploy a new data warehouse with just a few clicks in the AWS Console, and Redshift automatically provisions the infrastructure for you. Administrative tasks such as backups and replication are automated, so you can focus on your data, not on administration.
    • Automated backups
      Redshift automatically backs up your data to S3. You can also replicate the snapshots to S3 in another region for disaster recovery.
  • Cost-effective
    • No upfront costs, pay as you go
      Amazon Redshift is a cost-effective data warehouse service because you pay only for what you use.
      Its cost starts at $0.25 per hour with no commitment and no upfront costs, and it can scale out to $250 per terabyte per year.
      Amazon Redshift offers On-Demand pricing with no upfront costs, and it also offers Reserved Instance pricing that saves up to 75% over a 1-3 year term.
    • Choose your node type.
      You can choose either of the two nodes to optimize the Redshift.
      • Dense compute node
        Dense compute nodes create high-performance data warehouses by using fast CPUs, large amounts of RAM, and solid-state disks.
      • Dense storage node
        If you want to reduce cost, you can use dense storage nodes. They create a cost-effective data warehouse by using larger hard disk drives.
  • Scale quickly to meet your needs.
    • Petabyte-scale data warehousing
      Amazon Redshift lets you scale the number of nodes up or down as your needs change. With just a few clicks in the AWS Console or a single API call, you can easily change the number of nodes in a data warehouse (see the sketch after this list).
    • Exabyte-scale data lake analytics
      This feature (Redshift Spectrum) allows you to run queries against exabytes of data in Amazon S3. Amazon S3 is a secure and cost-effective way to store unlimited data in an open format.
    • Limitless concurrency
      This feature means that multiple queries can access the same data in Amazon S3. It allows you to run queries across multiple nodes regardless of the complexity of a query or the amount of data.
  • Query your data lake
    Amazon Redshift can query the Amazon S3 data lake without loading the data. This provides flexibility: you can store frequently accessed data in Redshift and unstructured or infrequently accessed data in Amazon S3.
  • Secure
    With a couple of parameter settings, you can configure Redshift to use SSL to secure data in transit. You can also enable encryption so that all data written to disk is encrypted.
  • Faster performance
    Amazon Redshift provides columnar data storage, compression, and parallel processing to reduce the amount of I/O needed to perform queries. This improves query performance.
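
As referenced in the scaling bullet above, here is a minimal, hypothetical boto3 sketch of creating a small cluster and later resizing it with a single API call; the identifier, node type, and credentials are placeholder values.

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Provision a small two-node cluster (example values only)
redshift.create_cluster(
    ClusterIdentifier="example-cluster",
    NodeType="dc2.large",
    ClusterType="multi-node",
    NumberOfNodes=2,
    DBName="dev",
    MasterUsername="awsuser",
    MasterUserPassword="Change-me-1",
)

# Later, grow the cluster to four nodes with a single API call
redshift.resize_cluster(
    ClusterIdentifier="example-cluster",
    NumberOfNodes=4,
)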

AWS QLDB : Theory

Amazon QLDB :

Amazon QLDB is a fully managed ledger database. It offers the key features of a blockchain ledger database, including immutability, transparency, and a cryptographically verifiable transaction log. However, QLDB is owned by a central trusted authority, so in a sense it has almost all the features of a distributed ledger technology with a centralized approach.

Also, you can’t directly compare Amazon QLDB with blockchain, as the two have some fundamental differences; QLDB was launched alongside Amazon Managed Blockchain.

Amazon QLDB Use-Cases :

In this section, we will take a look at some Amazon QLDB use cases, which help give a complete picture of what QLDB has to offer.

Manufacturing :

Manufacturing companies can take full advantage of what Amazon QLDB has to offer. In manufacturing, it is important for companies to make sure that the data they record accurately reflects what happens across their supply chain. With QLDB, they can record every transaction and its history. As we’re already seeing blockchain being used in manufacturing, QLDB will only make things more efficient.

This means that each individual batch will be properly documented. In the end, they will be able to trace the parts if something goes wrong during the distribution life cycle of a product.

QLDB Customers and Partners :

At the time of writing, QLDB has attracted strong partners and customers, some of which include the following:

  • Digital Asset
  • Accenture
  • Asano
  • Realm
  • Wipro
  • Zillant
  • Splunk
  • Klarna
How it Works :

[Image: How Amazon QLDB works]

Common Use Cases

  • Finance
    • Banks can use Amazon QLDB to easily store an accurate and complete record of all financial transactions, instead of building a custom ledger with complex auditing functionality.
  • Insurance
    • Insurance companies can use Amazon QLDB to track the entire history of claim transactions. Whenever a conflict arises, Amazon QLDB can cryptographically verify the integrity of the claims data.

Components Of QLDB :

  • Ledger :
    • Consists of tables and a journal that keeps the complete, immutable history of changes to the tables.
  • Tables :
    • Contains a collection of document revisions.
  • Journal :
    • An immutable transaction log where transactions are appended as a sequence of blocks that are cryptographically chained together, providing secure verification and an immutable history of changes to your ledger data.
    • Note that only the history of changes is immutable; the data itself can still be updated, with every change recorded in the journal.
  • Current State :
    • The current state is similar to a traditional database, where you can view and query the latest data.
  • History :
    • The history is where you can view and query every revision of your data and every change ever made to it (see the sketch after this list).
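
To illustrate the difference between the current state and the history, here is a minimal sketch using the pyqldb driver. It assumes pyqldb is installed, and the ledger name, Vehicle table, and VIN value are hypothetical examples.

from pyqldb.driver.qldb_driver import QldbDriver

# Connect to a hypothetical ledger named "vehicle-registration"
driver = QldbDriver(ledger_name="vehicle-registration")

# Query the current state: only the latest revision of each document
current = driver.execute_lambda(
    lambda executor: executor.execute_statement(
        "SELECT * FROM Vehicle WHERE VIN = ?", "1N4AL11D75C109151"
    )
)
for doc in current:
    print("current:", doc)

# Query the history: every revision the document has ever had
revisions = driver.execute_lambda(
    lambda executor: executor.execute_statement(
        "SELECT * FROM history(Vehicle) AS h WHERE h.data.VIN = ?", "1N4AL11D75C109151"
    )
)
for rev in revisions:
    print("revision:", rev)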

Performance :

  • Amazon QLDB can execute 2-3x as many transactions as ledgers in common blockchain frameworks.

Scalability :

  • Amazon QLDB automatically scales based on the workloads of your application.

Reliability :

  • Multiple copies of the QLDB ledger are replicated across Availability Zones in a Region, so you can continue to operate QLDB even if an Availability Zone fails.
  • Ensures redundancy within a region.
  • Also ensures full recovery when an availability zone goes down.

Backup and Restore :

  • You can export the contents of your QLDB journal to S3 as a backup plan (a minimal sketch follows below).
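
A minimal sketch of such an export with boto3 is shown below; the ledger name, bucket, time window, and IAM role ARN are all placeholders, and the role must allow QLDB to write to the bucket.

from datetime import datetime, timezone
import boto3

qldb = boto3.client("qldb", region_name="us-east-1")

# Export one day of journal blocks to S3 as a backup (placeholder values)
export = qldb.export_journal_to_s3(
    Name="vehicle-registration",
    InclusiveStartTime=datetime(2022, 7, 1, tzinfo=timezone.utc),
    ExclusiveEndTime=datetime(2022, 7, 2, tzinfo=timezone.utc),
    RoleArn="arn:aws:iam::123456789012:role/qldb-s3-export-role",
    S3ExportConfiguration={
        "Bucket": "example-qldb-exports",
        "Prefix": "vehicle-registration/",
        "EncryptionConfiguration": {"ObjectEncryptionType": "SSE_S3"},
    },
)
print("Export ID:", export["ExportId"])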

Security :

  • Amazon QLDB uses the SHA-256 hash function to generate a secure representation of your data’s change history, called a digest. The digest serves as proof of your data’s change history, enabling you to go back to a point in time and verify the validity and integrity of your data changes (see the sketch after this list).
  • All data in transit and at rest are encrypted by default.
  • Uses AWS-owned keys for encryption of data.
  • The authentication is done by attaching a signature to the HTTP requests. The signature is then verified using the AWS credentials.
  • Integrated with AWS Private Link.
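
For example, requesting the current digest for a ledger is a single boto3 call (the ledger name is a placeholder); the digest and its tip address can then be used with QLDB’s verification APIs to prove that a document revision has not been tampered with.

import boto3

qldb = boto3.client("qldb", region_name="us-east-1")

# Fetch the latest digest for a hypothetical ledger
response = qldb.get_digest(Name="vehicle-registration")

print("Digest (bytes):", response["Digest"])
print("Digest tip address:", response["DigestTipAddress"]["IonText"])
# Keep the digest somewhere safe; later you can verify any revision against it.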

Pricing :

  • You are billed based on five categories:
    • Write I/Os
      • Pricing per 1 million requests
    • Read I/Os
      • Pricing per 1 million requests
    • Journal Storage Rate
      • Pricing per GB-month
    • Indexed Storage Rate
      • Pricing per GB-month
    • Data Transfer OUT From Amazon QLDB To Internet
      • You are charged based on the amount of data transferred per month. The rate varies for different regions.

Limitations :

  • Amazon QLDB does not have a dedicated backup and restore feature, but you can export your data from QLDB to S3.
  • Does not support a point-in-time restore feature.
  • Does not support cross-region replication.
  • Does not support the use of customer managed keys (CMKs).

Amazon Neptune : Theory

 

  • Amazon Neptune is a fully managed graph database service used for building applications that work with highly connected datasets.
  • Optimized for storing billions of relationships between pieces of information.
  • Provides millisecond latency when querying the graph.
  • Neptune supports graph query languages such as Apache TinkerPop Gremlin and W3C’s SPARQL (a short Gremlin sketch follows below).
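
As a minimal illustration of querying Neptune with Gremlin from Python, the sketch below assumes the gremlinpython package is installed; the Neptune cluster endpoint, vertex labels, and property values are placeholders.

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to a hypothetical Neptune cluster endpoint over WebSockets
connection = DriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin", "g")
g = traversal().withRemote(connection)

# Add two people and a "follows" relationship between them
alice = g.addV("person").property("name", "alice").next()
bob = g.addV("person").property("name", "bob").next()
g.V(alice).addE("follows").to(__.V(bob)).next()

# Who does alice follow?
names = g.V().has("person", "name", "alice").out("follows").values("name").toList()
print(names)  # expected: ['bob']

connection.close()
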
How it works

[Image: How Amazon Neptune works]

Common Use Cases

  • Social Networking
    • Amazon Neptune can easily process users’ interactions, such as comments, follows, and likes, in a social networking application through highly interactive queries.
  • Recommendation Engines
    • You can use Amazon Neptune to build applications that suggest personalized and relevant products based on relationships in the data, such as customers’ interests and purchase history.
  • Knowledge Graphs
    • With the help of Amazon Neptune, you can create a knowledge graph for search engines that will enable users to quickly discover new information. 
  • Identity Graphs
    • You can use Amazon Neptune as a graph database to easily link and update user profile data for ad-targeting, personalization, and analytics. 

Performance

  • Supports up to 15 read replicas and hundreds of thousands of queries per second.
  • Amazon Neptune uses query optimization for both SPARQL queries and Gremlin traversals.

Reliability

  • The database volume is replicated six ways across three Availability Zones.
  • Amazon Neptune can withstand the loss of up to two copies of data without affecting write availability, and up to three copies without affecting read availability.
  • Amazon Neptune’s storage is self-healing. Data blocks are continuously scanned for errors and replaced automatically.
  • Amazon Neptune uses asynchronous replication to propagate changes made on the primary instance to all of Neptune’s read replicas.
  • Replicas can act as a failover target with no data loss.
  • Supports automatic failover.
  • Supports promotion priorities within a cluster: when the primary instance fails, Amazon Neptune promotes the replica with the highest priority tier to primary.

 

Cluster Volume vs. Local Storage

  • Stored data type
    • Cluster volume: Persistent data
    • Local storage: Temporary data
  • Scalability
    • Cluster volume: Automatically scales out when more space is required
    • Local storage: Limited to the DB instance class

Backup And Restore

  • Automated backups are always enabled.
  • Supports point-in-time restore; the latest restorable time is typically within the last 5 minutes.
  • Supports sharing of encrypted manual snapshots.

Security

  • Amazon Neptune supports AWS Key Management Service ( KMS ) encryption at rest.
  • It also supports HTTPS client connections; Neptune enforces a minimum of TLS v1.2 for SSL client connections in all AWS Regions where Neptune is available.
  • To encrypt an existing Neptune instance, you should create a new instance with encryption enabled and migrate your data into it.
  • You can create custom endpoints for Amazon Neptune to access your workload. Custom endpoints allow you to distribute your workload across a designated set of instances within a Neptune cluster.
  • Offers database deletion protection.

Pricing

  • You are billed based on DB instance hours, I/O requests, storage, and data transfer.
  • The storage rate and I/O rate are billed in per GB-month increments and per million request increments, respectively.

Monitoring

  • Visualize your graph using the Neptune Workbench.
  • You can receive event notifications on your Amazon Neptune DB clusters, DB instances, DB cluster snapshots, parameter groups, or security groups through Amazon SNS.

Limitations

  • It does not support cross-region replicas.
  • Encryption of an existing Neptune instance is not supported.
  • Sharing of automatic DB snapshots with other accounts is not allowed. A workaround is to copy the automatic snapshot to a manual snapshot and then share the manual snapshot with the other account.