Sunday, 15 March 2020

Azure SQL Stretch Database & SQL Data Warehouse

SQL Stretch Database

It migrates our cold data transparently and securely to the Microsoft Azure cloud. Stretch Database divides the data into two types: hot data, which is frequently accessed, and cold data, which is infrequently accessed. We can also define policies or criteria for what counts as hot data and cold data.
For example - if we have a sales order table, all the open and in-progress sales orders can be hot data, and all the closed sales orders can be cold data. The cold data is transparently migrated to the Azure SQL Stretch Database. However, this doesn't mean we need to change our application so that queries for closed sales orders are sent separately to the Stretch Database.
We can use the same queries in our application to fetch the data; based on the location of the data, the query is automatically routed to the Stretch Database when needed.
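To illustrate that transparency, here is a minimal Python sketch using pyodbc; the server, database, and SalesOrder table are hypothetical placeholders for a table that has been enabled for Stretch Database.

```python
# Minimal sketch: querying a stretched table with pyodbc. The server,
# database, and table names are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=onprem-sql;DATABASE=SalesDb;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# The same query works whether the matching rows are still on-premises
# (hot) or have been migrated to Azure (cold); SQL Server routes the
# request to the Stretch Database transparently.
cursor.execute(
    "SELECT OrderId, Status FROM dbo.SalesOrder WHERE Status = ?", "Closed"
)
for row in cursor.fetchall():
    print(row.OrderId, row.Status)
```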

Advantages of SQL Stretch Database

  • It provides cost-effective availability for cold data that benefits from the low cost of Azure rather than scaling expensive on-premises storage.
  • It doesn't require changes to existing queries or applications. The location of the data is transparent to the application.
  • It reduces the on-premises maintenance and storage for our data. Backups for our on-premises data run faster and finish within the maintenance window. Backups for the cloud portion of our data run automatically.
  • It keeps our data secure even during migration. It provides encryption for our data in motion. Row-level security and other advanced SQL Server security features also work with Stretch Database to protect our data.

SQL Data Warehouse

Microsoft SQL Data Warehouse on Azure is a cloud-based, scale-out database capable of processing massive volumes of data, both relational and non-relational. SQL Data Warehouse is based on a massively parallel processing (MPP) architecture.
In this architecture, requests are received by the control node, optimized, and passed on to the compute nodes to do their work in parallel. SQL Data Warehouse stores the data in Premium locally redundant storage, which is linked to the compute nodes for query execution.

Components of SQL Data Warehouse

Data Warehouse units: The allocation of resources to our SQL Data Warehouse is measured in Data Warehouse Units (DWUs). A DWU is a bundled measure of the underlying resources, such as CPU, memory, and IOPS, that are allocated to our SQL Data Warehouse.
Data Warehouse units provide a measure of three precise metrics that are highly correlated with data warehouse workload performance.
  • Scan/Aggregation: Scan/aggregation represents a standard data warehousing query: it scans a large number of rows and then performs a complex aggregation. It is an I/O- and CPU-intensive operation.
  • Load: This metric measures the ability to ingest data into the service. This metric is designed to stress the network and CPU aspects of the service.
  • Create Table As Select (CTAS): CTAS measures the ability to copy a table. It involves reading data from storage, distributing it across the nodes of the appliance, and writing it to storage again. It is a CPU-, IO-, and network-intensive operation (see the sketch after this list).
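As a rough illustration of CTAS, here is a minimal Python/pyodbc sketch that runs a hypothetical CTAS statement against a SQL Data Warehouse; the connection details, table names, and distribution column are placeholders.

```python
# Minimal sketch: running a CTAS statement over pyodbc. Connection details
# and table names are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=mydw.database.windows.net;DATABASE=SalesDW;"
    "UID=admin_user;PWD=<password>;"
)
cursor = conn.cursor()

# CTAS reads from storage, redistributes the rows across the compute
# nodes (here: hash-distributed on OrderId), and writes them back.
cursor.execute("""
    CREATE TABLE dbo.SalesOrderCopy
    WITH (DISTRIBUTION = HASH(OrderId))
    AS SELECT * FROM dbo.SalesOrder;
""")
conn.commit()
```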

Azure Data Factory


Azure Data Factory is a cloud-based data-integration service that allows us to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation. Data Factory is designed to deliver extraction, transformation, and loading (ETL) processes within the cloud. The ETL process generally involves four steps:
  1. Connect & Collect: We can use the copy activity in a data pipeline to move data from both on-premises and cloud source data stores.
  2. Transform: Once the data is present in a centralized data store in the cloud, process or transform the collected data by using compute services such as HDInsight Hadoop, Spark, Data Lake Analytics, and Machine Learning.
  3. Publish: After the raw data is refined into a business-ready, consumable form, load it into Azure SQL Data Warehouse, Azure SQL Database, Azure Cosmos DB, etc.
  4. Monitor: Azure Data Factory has built-in support for pipeline monitoring via Azure Monitor, API, PowerShell, Log Analytics, and health panels on the Azure portal.

Components of Data Factory

Data Factory is composed of the following key components. All of them work together to provide the platform on which we can compose data-driven workflows with steps to move and transform data.
  • Pipeline: A data factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a unit of work. For example - a pipeline can contain a group of activities that ingests data from an Azure blob and then runs a Hive query on an HDInsight cluster to partition the data.
  • Activity: It represents a processing step in a pipeline. For example - we might use a copy activity to copy data from one data store to another data store.
  • Datasets: These represent data structures within the data stores, which point to or reference the data we want to use in our activities as inputs or outputs.
  • Linked Services: These are much like connection strings; they define the connection information Data Factory needs to connect to external resources. A linked service can point to a data store or to a compute resource.
  • Triggers: A trigger represents the unit of processing that determines when a pipeline execution needs to be kicked off. We can schedule pipelines to run at a specific point in time, or use a trigger to start them in response to an event.
  • Control flow: It is an orchestration of pipeline activities that includes chaining activities in a sequence, branching, defining parameters at the pipeline level, and passing arguments while invoking the pipeline on demand or from a trigger. We can use control flow to sequence activities and define which parameters need to be passed to each of them. The sketch after this list shows how these components fit together in code.
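The following Python sketch ties the components together using the azure-mgmt-datafactory and azure-identity packages. The resource group, factory name, dataset names, and storage connection string are hypothetical placeholders, and exact model names can vary between SDK versions.

```python
# Minimal sketch: linked service -> datasets -> copy activity -> pipeline.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobDataset, AzureStorageLinkedService, BlobSink, BlobSource,
    CopyActivity, DatasetReference, DatasetResource, LinkedServiceReference,
    LinkedServiceResource, PipelineResource,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, df = "my-rg", "my-data-factory"

# Linked service: the connection information for an external data store.
adf.linked_services.create_or_update(rg, df, "BlobLS", LinkedServiceResource(
    properties=AzureStorageLinkedService(connection_string="<storage-conn>")))

# Datasets: references to the input and output data within that store.
ls_ref = LinkedServiceReference(type="LinkedServiceReference",
                                reference_name="BlobLS")
adf.datasets.create_or_update(rg, df, "InputDS", DatasetResource(
    properties=AzureBlobDataset(linked_service_name=ls_ref,
                                folder_path="input", file_name="data.csv")))
adf.datasets.create_or_update(rg, df, "OutputDS", DatasetResource(
    properties=AzureBlobDataset(linked_service_name=ls_ref,
                                folder_path="output")))

# Activity + pipeline: a copy step grouped into a unit of work.
copy = CopyActivity(
    name="CopyBlob",
    inputs=[DatasetReference(type="DatasetReference", reference_name="InputDS")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="OutputDS")],
    source=BlobSource(), sink=BlobSink())
adf.pipelines.create_or_update(rg, df, "CopyPipeline",
                               PipelineResource(activities=[copy]))

# Kick the pipeline off on demand (a scheduled trigger could do this instead).
run = adf.pipelines.create_run(rg, df, "CopyPipeline", parameters={})
print(run.run_id)
```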

Creating Azure Data-Factory using the Azure portal

Step 1: Click on Create a resource, search for Data Factory, and then click on Create.
Step 2: Provide a name for your data factory, select the resource group, and select the location where you want to deploy your data factory and the version.
Step 3: After filling all the details, click on create.
Azure Data Factory has its own, completely separate portal.

Azure Cosmos Database

Azure Cosmos DB is a NoSQL data store. It is different from a traditional relational database, where a table has a fixed number of columns and each row must adhere to the table's schema. In a NoSQL database, we don't define any schema for the table at all, and each item or row within the table can have different values, or even a different schema of its own.
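A minimal Python sketch with the azure-cosmos package illustrates this. The endpoint, key, database, and Orders container are hypothetical placeholders; note that two items in the same container carry completely different fields.

```python
# Minimal sketch: schema-free items in one container. The endpoint, key,
# database, and container names are hypothetical placeholders.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/",
                      credential="<account-key>")
db = client.create_database_if_not_exists("SalesDb")
orders = db.create_container_if_not_exists(
    id="Orders", partition_key=PartitionKey(path="/customerId"))

# No schema is declared up front, so each item defines its own shape.
orders.upsert_item({"id": "1", "customerId": "c1", "total": 40.0})
orders.upsert_item({"id": "2", "customerId": "c2",
                    "items": ["keyboard", "mouse"], "giftWrap": True})
```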

Advantages of Cosmos DB

  • No Schema & Index management: The Azure database engine is fully schema-agnostic. Therefore no schema and index management are required. We also don't have to worry about application downtime while migrating schemas.
  • Industry-leading comprehensive SLAs: Cosmos DB is the first and only service to offer industry-leading full 99.99% high availability, read and write latency at the 99th percentile, guaranteed throughput, and consistency.
  • Low total cost of ownership: Since Cosmos DB is a fully managed service, we no longer need to manage and operate complex multi-datacenter deployments or upgrade our database software, or pay separately for support, licensing, or operations.
  • Developing applications using NoSQL APIs: Cosmos DB implements the Cassandra, MongoDB, Gremlin, and Azure Table Storage wire protocols directly on the service.
  • Global distribution: Cosmos DB allows us to add or remove any of the Azure regions to our Cosmos account at any time, with a click of a button.

Cosmos Database Structure

Database: We can create one or more Azure Cosmos databases under our account. A database is analogous to a namespace, and it is the unit of management for a set of Azure Cosmos containers.
Cosmos account: The Azure Cosmos account is the basic unit of global distribution and high availability. To globally distribute our data and throughput across multiple Azure regions, we can add or remove Azure regions from our Azure Cosmos account at any time.
Container: An Azure Cosmos container is the unit of scalability for both provisioned throughput and storage of items. A container is horizontally partitioned and then replicated across multiple regions.

Global distribution and Partitioning

Cosmos DB works differently from a traditional relational database, where all the rows of a table sit in one physical place. In Cosmos DB, we create logical partitions within a container, so that all the items sharing one partition key form one logical partition and the items with another partition key form another. Each logical partition resides in a physical partition.
A container can hold millions of items. We divide those items by partition key into logical partitions, and each logical partition resides in a physical partition. This is how the load on the container, and the data itself, is distributed across physical partitions.
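Continuing the hypothetical Orders container from the earlier sketch, a query scoped to one partition key is served by the single physical partition hosting that logical partition, while an unscoped query fans out across partitions:

```python
# Single-partition read: served by the one physical partition that hosts
# the "c1" logical partition.
one_customer = orders.query_items(
    query="SELECT * FROM c WHERE c.customerId = 'c1'",
    partition_key="c1")

# Cross-partition query: fans out across all physical partitions.
big_orders = orders.query_items(
    query="SELECT * FROM c WHERE c.total > 10",
    enable_cross_partition_query=True)
for item in big_orders:
    print(item["id"])
```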

Types of Consistency

Azure Cosmos DB approaches data consistency as a spectrum of choices instead of two extremes. Strong consistency and eventual consistency sit at the ends, but there are many consistency choices along the spectrum.
The consistency levels are region-agnostic. The consistency level of our Azure Cosmos account is guaranteed for all read operations regardless of the region from which the reads and writes are served, the number of regions associated with our Azure Cosmos account, or whether our account is configured with a single or multiple write regions.
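In the Python SDK, the desired consistency can be requested when the client is created. A minimal sketch, with a hypothetical endpoint and key, assuming the account's default consistency permits the requested level:

```python
# Minimal sketch: asking for "Session" consistency at the client level.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/",
                      credential="<account-key>",
                      consistency_level="Session")
```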

Request Units

  • We pay for the throughput we provision, and the storage we consume on an hourly basis with Azure Cosmos DB.
  • The cost of all database operations is normalized by Azure Cosmos DB and is expressed in terms of Request Units (RUs). The cost to read a 1-KB item is 1 Request Unit (1 RU). All other database operations are similarly assigned a cost in terms of RUs.
  • The number of RUs consumed depends on the type of operation, item size, data consistency, query patterns, etc.
For the management and planning of capacity, Azure Cosmos DB ensures that the number of RUs for a given database operation over a given dataset is deterministic.
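Because the RU cost of an operation is deterministic, the SDK also reports the charge for the last request. A minimal sketch, again using the hypothetical Orders container from earlier:

```python
# Read one item and inspect the RU charge reported by the service.
orders.read_item(item="1", partition_key="c1")
charge = orders.client_connection.last_response_headers["x-ms-request-charge"]
print(f"That read cost {charge} RUs")  # roughly 1 RU for a 1-KB point read
```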

Creating Azure Cosmos DB using Azure portal

Step 1: Click on Create a resource and search for Azure Cosmos DB. After that, click on Create.
Step 2: Fill in all the details and click on Review + create to check whether any details are missing.
Step 3: Configure the network for the Azure Cosmos DB.
Step 4: Finally, click on the Create button to create your Cosmos database.
Step 5: You will see the overview window after the successful deployment of Azure Cosmos DB.

Azure SQL Managed Instance


Azure SQL Database Managed Instance is a deployment option of Azure SQL Database based on the vCore-based purchasing model.

Advantages of using Managed Instance

Easy lift and shift: Customers can lift and shift their on-premises SQL server to a Managed Instance that offers compatibility with SQL Server on-premises.
Fully managed PaaS: Azure SQL Database Managed Instance is designed for customers looking to migrate a large number of apps from on-premises, self-built, or ISV-provided environments to a fully managed PaaS cloud environment.
New Business model: Competitive, transparent, and frictionless business model
Security: Managed Instance offers compatibility with on-premises SQL Server and complete isolation of customer instances, with native VNet support.

Managed Instance security isolation

Managed Instance provides additional security isolation from other tenants in the Azure cloud.
The managed instance security isolation includes:
  • Native virtual network implementation and connectivity to our on-premises environment using Azure Express Route or VPN Gateway.
  • SQL endpoint is exposed only through a private IP address, allowing safe connectivity from private Azure or hybrid networks.
  • It is a Single-tenant environment with dedicated underlying infrastructure (compute, storage).

Structure of Managed Instance

When we create a managed instance, a virtual network is created. It has a front-end subnet, a gateway subnet, and a managed instance (MI) subnet; the nodes that we deploy as part of managed instance creation are placed into the MI subnet. Each node consists of the SQL engine and SQL management components. We can also deploy multiple nodes within the same network, and these nodes form a virtual cluster together with gateway servers.
The entire virtual cluster has two endpoints. The first endpoint is for client connections; the second endpoint is public but is used only by Microsoft to manage this environment, for example by connecting to it with automated scripts for maintenance.
For this entire environment to work correctly, it also needs to connect to Azure Storage and Azure Service Bus. So, when restricting the outbound traffic from our MI subnet, make sure to allow all the traffic related to these Microsoft services; otherwise, the environment might not work correctly.
Finally, for client applications to connect to the database, they can reside in the front-end subnet, or they can live in a peered network: when we peer a network with our MI subnet, all the web apps or virtual machines in it can connect to the database because the two networks are peered. We can also connect our on-premises applications to the database by creating either a virtual network gateway or an ExpressRoute gateway.
All the connections, whether from the web apps, the virtual machines, or on-premises applications, communicate with the database over a private connection.

Azure SQL Database Configuration

Here we'll see the key configuration features of Azure SQL Server and SQL Database. In terms of Azure SQL Database configuration, the first key thing is firewall rules at the server level.

Firewall Rules

At a logical server within Azure, we can define firewall rules. One type is IP rules, which grant access to the database based on the originating IP address of each request. The second type is virtual network rules, which are based on virtual network service endpoints.
Rules for Azure SQL databases can be defined at two levels:
  • Server-level firewall rules: These firewall rules enable clients to access our entire Azure SQL server, i.e., every database within the same logical server. These rules are stored in the master database. Server-level firewall rules can be configured by using the portal or by using Transact-SQL statements (see the sketch after this list).
  • Database-level firewall rules: These rules enable clients to access certain individual databases within the same logical server. We can create these rules for each database (including the master database), and they are stored in the individual databases.
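As a rough Transact-SQL illustration driven from Python/pyodbc, the built-in sp_set_firewall_rule procedure (run in the master database) manages server-level rules, and sp_set_database_firewall_rule manages database-level rules. The server name, credentials, and IP ranges below are hypothetical placeholders.

```python
# Minimal sketch: creating server-level and database-level firewall rules
# with the built-in stored procedures. Names and IPs are placeholders.
import pyodbc

base = ("DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver.database.windows.net;"
        "UID=admin_user;PWD=<password>;DATABASE=")

# Server-level rule: stored in master, applies to every database.
master = pyodbc.connect(base + "master", autocommit=True)
master.execute(
    "EXEC sp_set_firewall_rule @name = N'OfficeRange', "
    "@start_ip_address = '203.0.113.0', @end_ip_address = '203.0.113.255'")

# Database-level rule: stored in the individual database.
salesdb = pyodbc.connect(base + "SalesDb", autocommit=True)
salesdb.execute(
    "EXEC sp_set_database_firewall_rule @name = N'AppServer', "
    "@start_ip_address = '198.51.100.10', @end_ip_address = '198.51.100.10'")
```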

Configuring Firewall rules in Azure portal

Step 1: Go to the firewall settings in the database server that you have already created. After that, click on Add client IP.
Step 2: Now, click on Add existing virtual network and fill the required details, as shown in the figure below.
Step 3: Finally, click on Save; you will get a notification that your firewall rules have been updated.

Geo-Replication

Geo-replication is defined at the database level, not the server level. It is designed as a business continuity solution that allows the application to perform quick disaster recovery of individual databases in case of a regional disaster or a large-scale outage.
When configuring geo-replication, we specify a secondary database at a location far away from the primary location. We can have a Traffic Manager that routes traffic by default to our primary load balancer, and routing is based on the application request: if it is read-write, it is routed to the primary logical server; if it is read-only, it can be routed to a secondary server. The advantage of geo-replication is therefore that we can offload some of the read-only traffic from the primary and route it to the secondary.
The primary's performance improves because read-only queries no longer consume CPU or DTU units there; instead, we have a secondary database to which the data is continuously replicated.
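In practice, offloading reads simply means pointing read-only workloads at the secondary server. A minimal sketch with hypothetical server names and credentials:

```python
# Read-write traffic goes to the primary; reporting reads go to the
# geo-replicated secondary. Server names are hypothetical placeholders.
import pyodbc

primary = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sales-primary.database.windows.net;DATABASE=SalesDb;"
    "UID=app_user;PWD=<password>;")

read_only = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sales-secondary.database.windows.net;DATABASE=SalesDb;"
    "UID=report_user;PWD=<password>;ApplicationIntent=ReadOnly;")
```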

Configuring Geo-Replication using Azure portal

Step 1: Click on the Geo-Replication option; you will see the following window.
Step 2: Now, select the location where you want to replicate your database. You can choose multiple locations.
Step 3: Now, create a SQL server in the region where you want to replicate your data.
Step 4: Your server has been created and replicated successfully.
Step 5: You can now see where the servers are replicated.

Failover Groups

An auto-failover group is a feature of SQL Database that allows us to manage the replication and failover of a group of databases on a logical server, or of all databases in a Managed Instance, to another region.
We can initiate a failover manually, or we can delegate it to the SQL Database service based on a user-defined policy. When we use auto-failover groups with an automatic failover policy, any outage that impacts one or more of the databases in the group results in an automatic failover. This allows a read-write SQL application to transparently reconnect to the primary database when the primary changes after failover.
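The transparent reconnection works because a failover group exposes stable listener endpoints. A minimal sketch, assuming a hypothetical group named sales-fog:

```python
# The read-write listener always resolves to the current primary, and the
# read-only listener to the secondary, even after a failover. The group
# name "sales-fog" and the credentials are hypothetical placeholders.
import pyodbc

read_write = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sales-fog.database.windows.net;DATABASE=SalesDb;"
    "UID=app_user;PWD=<password>;")

read_only = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sales-fog.secondary.database.windows.net;DATABASE=SalesDb;"
    "UID=report_user;PWD=<password>;ApplicationIntent=ReadOnly;")
```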

Database backups

The SQL Database service uses SQL Server technology to create full, differential, and transaction log backups for point-in-time restore (PITR). The transaction log backups generally occur every 5-10 minutes, and differential backups typically every 12 hours, with the frequency based on the compute size and amount of database activity. Each SQL database has a default backup retention period between 7 and 35 days that depends on the purchasing model and service tier.
Long-term backup retention (LTR) leverages the full database backups that are automatically created to enable point-in-time restore. These backups are copied to different storage blobs if an LTR policy is configured. We can set an LTR policy for each SQL database and specify how frequently the backups should be copied to the long-term storage blobs.

Azure SQL Database

SQL Database is the flagship product of Microsoft in the database area. It is a general-purpose relational database that supports structures such as relational data, JSON, spatial data, and XML. The Azure platform fully manages every Azure SQL Database and guarantees no data loss and a high percentage of data availability. Azure automatically handles patching, backups, replication, failure detection, underlying hardware, software, or network failures, deploying bug fixes, failovers, database upgrades, and other maintenance tasks.
There are three ways we can deploy our SQL database:
  • Managed Instance: This is primarily targeted toward on-premises customers. If we already have a SQL Server instance in our on-premises data center and want to migrate it into Azure with minimum changes to our application and maximum compatibility, then we go for the managed instance.
  • Single database: We can deploy a single database on Azure with its own set of resources, managed via a logical server.
  • Elastic pool: We can deploy a pool of databases with a shared set of resources managed via a logical server.
We can also deploy SQL Server as infrastructure as a service. That means running SQL Server on an Azure virtual machine, but in that case, we are responsible for managing the SQL Server installation on that particular virtual machine.

Purchasing model

There are two ways we can purchase the SQL Server on Azure.
  • vCore purchasing model: The vCore-based purchasing model enables us to independently scale compute and storage resources, match on-premises performance, and optimize price. It also allows us to choose a generation of hardware and to use Azure Hybrid Benefit for SQL Server to gain cost savings. It is best for customers who value flexibility, control, and transparency.
  • DTU model: It is based on a bundled measure of compute, storage, and IO resources. Compute sizes are expressed in terms of Database Transaction Units (DTUs) for single databases and elastic Database Transaction Units (eDTUs) for elastic pools. This model is best for customers who want simple, pre-configured resource options.

Azure SQL Database service tiers

  • General Purpose/Standard model: It is based on a separation of compute and storage. This architectural model relies on the high availability and reliability of Azure Premium Storage, which transparently copies database files and guarantees zero data loss if an underlying infrastructure failure happens.
  • Business Critical/ Premium service tier model: It is based on a cluster of database engine processes. Both the SQL database engine process and underlying mdf/ldf files are placed on the same node with locally attached SSD storage providing low latency to our workload. High availability is implemented using technology similar to SQL Server Always On Availability Groups.
  • Hyperscale service tier model: It is the newest service tier in the vCore-based purchasing model. This tier is a highly scalable storage and compute performance tier that leverages the Azure architecture to scale out the storage and compute resources for an Azure SQL Database beyond the limits available for the General Purpose and Business Critical service tiers.

SQL database logical server

  • It acts as a central administrative point for multiple single or pooled databases, logins, firewall rules, auditing rules, threat detection policies, and failover groups.
  • It must exist before we can create the Azure SQL database. All databases on a server are created within the same region as the logical server.
  • The SQL Database service makes no guarantees regarding the location of the databases in relation to their logical servers and exposes no instance-level access or features.
  • An Azure database logical server is the parent resource for databases, elastic pools, and data warehouses.

Elastic pools

  • It is a simple and cost-effective solution for scaling and managing multiple databases. The databases inside an elastic pool are on a single Azure SQL Database server and share a group of resources at a fixed price (the sketch after this list shows moving a database into a pool with T-SQL).
  • We can configure resources for the pool based either on the DTU- based purchasing model or the vCore-based purchasing model.
  • The size of a pool depends on the aggregate resources needed for all databases in the pool. It determines the following options:
    • The maximum resources utilized in the pool by the databases.
    • The maximum storage bytes utilized in the pool by the databases.
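A database can be moved into an existing pool with one T-SQL statement run against the master database. A minimal Python/pyodbc sketch, with the server, database, and pool names as hypothetical placeholders:

```python
# Minimal sketch: placing a database into an elastic pool via T-SQL.
import pyodbc

master = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=master;"
    "UID=admin_user;PWD=<password>;", autocommit=True)

# After this, SalesDb shares the pool's resources at the pool's fixed price.
master.execute(
    "ALTER DATABASE SalesDb "
    "MODIFY (SERVICE_OBJECTIVE = ELASTIC_POOL (name = 'mypool'))")
```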

Creating an Azure SQL Database using Azure portal

Step 1: Click on Create a resource and search for SQL Database. Then click on Create.
Step 2: Fill in all the required details.
Step 3: Select a server or create a new one, as shown in the figure given below.
Step 4: Now, select the pricing tier by clicking on Compute + Storage, as shown in the figure below.
Step 5: After that, click on Review + Create and create the SQL database for your apps.
Step 6: Your SQL database is now created. Click on Go to resource to configure additional settings for your database.

Azure Database service


The fundamental building block available in Azure is the SQL database. Microsoft offers SQL Server and SQL Database on Azure in many ways. We can deploy a single database, or we can deploy multiple databases as part of a shared elastic pool.

Azure Database Service Architecture

Microsoft introduced the managed instance, which is targeted toward on-premises customers. So, if we have some SQL databases within our on-premises datacentre and we want to migrate them into Azure without any complex configuration or ambiguity, we can use a managed instance, because it is mainly targeted toward on-premises customers who want to lift and shift their on-premises databases into Azure with the least effort and at an optimized cost. We can also take advantage of the licensing we already have within our on-premises data center.
Microsoft is responsible for maintenance, patching, and related services. But if we want to go for an IaaS offering of SQL Server, we can deploy SQL Server on an Azure virtual machine. If the data has a dependency on the underlying platform and we want to log into the SQL Server machine itself, we can use SQL Server on a virtual machine.
We can also deploy a SQL data warehouse on the cloud. Azure offers many other database services for different types of databases, such as MySQL, MariaDB, and PostgreSQL. Once we have deployed a database into Azure, we need to migrate our data into it or replicate data into it.
Azure Database Services for Data Migration
The following services are available in Azure to migrate the data from our on-premises SQL Server into Azure.
Azure Database Migration Service: It is used to migrate the data from our existing SQL Server and databases within the on-premises data center into Azure.
Azure SQL data sync: If we want to replicate the data from our on-premises database into Azure, then we can use Azure SQL data sync.
SQL Stretch Database: It is used to migrate cold data into Azure. SQL Stretch Database is a bit different from the other database offerings; it works as a hybrid database because it divides the data into two types, hot and cold. Hot data is kept in the on-premises data center, and cold data in Azure.
Data Factory
It is used for ETL: extraction, transformation, and loading. Using Data Factory, we can extract the data even from our on-premises data center, perform some transformation, and load it into an Azure SQL database. Data Factory is an ETL tool offered on the cloud, which we can use to connect to different databases, extract the data, transform it, and load it into a destination.
Security
All the databases existing in Azure need to be secured, and we also need to accept connections only from known origins. For this purpose, all these database services come with firewall rules, where we can configure the IP addresses from which we want to allow connections. We can define those firewall rules to limit the number of connections and reduce the attack surface.
Cosmos DB
Cosmos DB is a NoSQL data store available in Azure, designed to be globally scalable and highly available with extremely low latency. Microsoft guarantees read and write latency with Cosmos DB. For example, if we have applications such as IoT or gaming where we get a lot of data from different users spread across the globe, we will go for Cosmos DB, because it is designed to be globally scalable and highly available, so our users experience low latency.
Finally, there are two remaining concerns. First, we need to secure all these services; for that purpose, we can integrate them with Azure Active Directory and manage the users from Azure Active Directory as well. Second, to monitor all these services, we can use Azure Security Center. There are individual monitoring tools too, but Azure Security Center continuously monitors all these services and provides recommendations if something is wrong.