Sunday, 3 July 2022

AWS Batch Theory

AWS Batch:

AWS Batch is a set of batch management capabilities that enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources, such as CPU- or memory-optimized instances, based on the volume and specific resource requirements of the batch jobs submitted.

With AWS Batch, there is no need to install and manage batch computing software or server clusters; instead, you focus on analyzing results and solving problems. AWS Batch plans, schedules, and executes your batch computing workloads using Amazon EC2 (including Spot Instances) and serverless compute with AWS Fargate or Fargate Spot.

Features of AWS Batch

The features of AWS Batch are:

1. Dynamic Compute Resource Provisioning and Scaling

When AWS Batch is used with Fargate or Fargate Spot, you only need to set up a few concepts: a compute environment (CE), a job queue, and a job definition. You then have a complete queue, scheduler, and compute architecture without managing a single piece of compute infrastructure.
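As a rough sketch of those pieces, here is how a Fargate compute environment and a container job definition might be created with boto3. All names, the subnet and security-group IDs, the container image, and the execution-role ARN below are placeholders, not values from this article:

    import boto3

    batch = boto3.client("batch")

    # Compute environment: Fargate, so there are no instances to manage
    batch.create_compute_environment(
        computeEnvironmentName="demo-fargate-ce",          # hypothetical name
        type="MANAGED",
        computeResources={
            "type": "FARGATE",
            "maxvCpus": 4,
            "subnets": ["subnet-0123456789abcdef0"],       # placeholder subnet
            "securityGroupIds": ["sg-0123456789abcdef0"],  # placeholder SG
        },
    )

    # Job definition: the image and role ARN are placeholders
    batch.register_job_definition(
        jobDefinitionName="demo-fargate-jobdef",
        type="container",
        platformCapabilities=["FARGATE"],
        containerProperties={
            "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
            "command": ["echo", "hello from AWS Batch"],
            "resourceRequirements": [
                {"type": "VCPU", "value": "0.25"},
                {"type": "MEMORY", "value": "512"},
            ],
            "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
            "networkConfiguration": {"assignPublicIp": "ENABLED"},
        },
    )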

2. AWS Batch with Fargate

Using Fargate resources with AWS Batch gives you a completely serverless architecture for your batch jobs. With Fargate, every job receives exactly the amount of CPU and memory it requests, so there are no wasted resources, and you do not have to wait for EC2 instance launches.

3. Priority-based Job Scheduling

One of the main features of AWS Batch is that you can set up multiple queues with different priority levels. Batch jobs wait in a queue until compute resources become available to execute them. The AWS Batch scheduler decides when, where, and how to run the jobs that have been submitted to a queue, based on each job’s resource requirements.
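A minimal sketch of priority-based queues with boto3, reusing the hypothetical compute environment and job definition from the previous sketch (higher priority values are scheduled first):

    import boto3

    batch = boto3.client("batch")

    # Two queues sharing one compute environment; higher number = higher priority
    for name, priority in [("high-priority-queue", 10), ("low-priority-queue", 1)]:
        batch.create_job_queue(
            jobQueueName=name,
            state="ENABLED",
            priority=priority,
            computeEnvironmentOrder=[
                {"order": 1, "computeEnvironment": "demo-fargate-ce"}  # from earlier sketch
            ],
        )

    # Jobs wait in the queue until the scheduler finds capacity for them
    batch.submit_job(
        jobName="nightly-report",
        jobQueue="high-priority-queue",
        jobDefinition="demo-fargate-jobdef",
    )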

4. Support for GPU Scheduling

AWS Batch supports GPU scheduling: you specify the number and type of accelerators your jobs require as job definition input variables. AWS Batch will scale up instances appropriate for your jobs based on the required number of GPUs and isolate the accelerators according to each job’s needs, so that only the appropriate containers can access them.
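A hedged sketch of a GPU job definition with boto3; the job definition name and container image are placeholders, and a job like this needs an EC2 compute environment backed by GPU instance types (it cannot run on Fargate):

    import boto3

    batch = boto3.client("batch")

    # A job definition requesting one GPU; the scheduler places it on a
    # GPU-capable instance and exposes only the assigned accelerator
    # to this job's container.
    batch.register_job_definition(
        jobDefinitionName="gpu-training-jobdef",             # hypothetical name
        type="container",
        containerProperties={
            "image": "nvidia/cuda:12.2.0-base-ubuntu22.04",  # placeholder image
            "command": ["nvidia-smi"],
            "resourceRequirements": [
                {"type": "GPU", "value": "1"},
                {"type": "VCPU", "value": "4"},
                {"type": "MEMORY", "value": "16384"},
            ],
        },
    )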

5. Support for Popular Workflow Engines

AWS Batch supports and is integrated with open-source and commercial workflow engines and languages such as Pegasus WMS, Luigi, Nextflow, Metaflow, Apache Airflow, and AWS Step Functions, enabling you to use familiar workflow languages to model your batch computing pipelines.

6. Integrated Monitoring and Logging

AWS Batch displays key operational metrics for your batch jobs in the AWS Management Console. You can view metrics related to compute capacity as well as running, pending, and completed jobs. Logs for your jobs (e.g., STDERR and STDOUT) are available in the console and are also written to Amazon CloudWatch Logs.
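For example, a job’s STDOUT/STDERR can be read back from the /aws/batch/job log group with boto3; the job ID below is a placeholder:

    import boto3

    batch = boto3.client("batch")
    logs = boto3.client("logs")

    # Look up the CloudWatch log stream for a job (job ID is hypothetical)
    job = batch.describe_jobs(jobs=["<your-job-id>"])["jobs"][0]
    stream = job["container"]["logStreamName"]

    # Batch writes container STDOUT/STDERR to the /aws/batch/job log group
    events = logs.get_log_events(
        logGroupName="/aws/batch/job",
        logStreamName=stream,
        startFromHead=True,
    )
    for event in events["events"]:
        print(event["message"])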

7. Support for Tightly Coupled HPC Workloads

AWS Batch supports multi-node parallel jobs, which enables a single job to span multiple EC2 instances. This feature lets you use AWS Batch to easily and efficiently run workloads such as large-scale, tightly coupled high-performance computing (HPC) applications or distributed GPU model training.
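A sketch of a multi-node parallel job definition with boto3 (the name and ECR image are placeholders). Node 0 acts as the main node, and the single node range applies the same container to all four nodes; multi-node parallel jobs run on EC2 compute environments:

    import boto3

    batch = boto3.client("batch")

    # A 4-node parallel job: node 0 coordinates, and the node range "0:3"
    # gives every node the same container configuration.
    batch.register_job_definition(
        jobDefinitionName="mpi-training-jobdef",   # hypothetical name
        type="multinode",
        nodeProperties={
            "numNodes": 4,
            "mainNode": 0,
            "nodeRangeProperties": [
                {
                    "targetNodes": "0:3",
                    "container": {
                        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/mpi-app:latest",
                        "resourceRequirements": [
                            {"type": "VCPU", "value": "8"},
                            {"type": "MEMORY", "value": "32768"},
                        ],
                    },
                }
            ],
        },
    )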

Comparison between AWS Batch and AWS Lambda

AWS Batch
It allows developers, scientists, and engineers to run hundreds of thousands of batch computing operations on AWS quickly and easily. Based on the volume and specific resource requirements of the batch jobs submitted, it dynamically provisions the optimal quantity and kind of compute resources (e.g., CPU or memory optimized instances).

Pros

  • Scalable
  • Containerized

Cons

  • More overhead than Lambda
  • Image management

AWS Lambda
AWS Lambda is a compute service that runs your code in response to events and automatically manages the underlying compute resources for you. You can use AWS Lambda to add custom logic to other AWS services or build your own back-end services that operate at AWS scale, performance, and security.

Pros

  • Stateless
  • No deploy, no server, great sleep
  • Easy to deploy
  • Extensive API
  • VPC Support

Cons

  • Historically could not execute Ruby or Go (native runtimes for both have since been added)
  • Can’t execute PHP without significant effort (a custom runtime is required)
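For contrast with Batch’s container-based jobs, a Lambda function is just a handler that the service invokes for you. A minimal Python example (the event shape here is hypothetical):

    import json

    # Lambda invokes this function in response to events; there are no
    # servers, containers, or queues for you to manage.
    def lambda_handler(event, context):
        # 'event' carries the trigger payload (e.g. an API Gateway request)
        name = event.get("name", "world")
        return {
            "statusCode": 200,
            "body": json.dumps({"message": f"hello, {name}"}),
        }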

Use cases of AWS Batch:

Financial Services: Post-trade Analytics

This use case automates the analysis of the day’s transaction costs, execution reporting, and market performance. It is achieved by the following steps (sketched in code after the list):

  • Firstly, send data, files, and applications to Amazon S3, which places the post-trade data in AWS object storage.
  • AWS Batch configures the resources and schedules the timing of when to run the data analytics workloads.
  • After that, run big data analytics tools for the appropriate analysis and market predictions for the next day’s trading.
  • Then, the next big step is to store the analyzed data for the long term, or even for further analysis.
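A minimal sketch of the first two steps with boto3; the bucket, file, queue, and job definition names are all hypothetical:

    import boto3

    s3 = boto3.client("s3")
    batch = boto3.client("batch")

    # 1. Send the day's post-trade data to object storage
    s3.upload_file("trades-2022-07-03.csv", "post-trade-data",
                   "raw/trades-2022-07-03.csv")

    # 2. Let AWS Batch schedule and run the analytics workload; the job
    #    definition would wrap your analytics tool and write results to S3
    batch.submit_job(
        jobName="post-trade-analytics",
        jobQueue="analytics-queue",                # hypothetical queue
        jobDefinition="post-trade-analytics-jobdef",
        containerOverrides={
            "environment": [
                {"name": "INPUT_KEY", "value": "raw/trades-2022-07-03.csv"}
            ]
        },
    )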

Life Sciences: Drug Screening for Biopharma

The main purpose of this use case is to rapidly search libraries of small molecules for drug discovery. This is done by:

  • Firstly, data and files containing the small molecules and drug targets are sent to Amazon S3.
  • AWS Batch then configures the given resources and schedules the timing of when to run the high-throughput screens.
  • After scheduling, the compound screening jobs run to completion based on your AWS Batch configuration.
  • The results are again stored for further analysis.

Digital Media: Visual Effects Rendering

The main purpose of this use case is to automate content rendering workloads and reduce the need for human intervention due to execution dependencies or resource scheduling. This is achieved by:

  • Firstly, graphic artists create a blueprint for the work to be done.
  • They schedule render jobs in the pipeline manager.
  • After that, they submit the render jobs to AWS Batch. Then, AWS Batch will prefetch content from S3 or any other location.
  • In the next step, AWS Batch launches the distributed jobs across the render farm, effectively managing the dependencies and licenses.
  • The final step is to write the output back to Amazon S3 or another output location.

Benefits of AWS Batch

  • Fully managed: AWS Batch provisions, manages, and scales your compute infrastructure for you, so there is no batch computing software or server cluster to install and operate; you can focus on analyzing results and solving problems.
  • Ready to use with AWS: It works with many services in AWS. AWS Batch can be integrated with Amazon EC2, Amazon ECS (EC2 Container Service), AWS Fargate, AWS Lambda, and AWS Step Functions.
  • Cost-optimized resource provisioning: AWS Batch dynamically provisions compute resources tailored to the volume and resource requirements of your submitted jobs, and can use Spot Instances or Fargate Spot to run jobs at a lower cost.


Cost of AWS Batch

Now, let us take a look at the pricing of AWS Batch.

There is no extra charge for AWS Batch. You only pay for the AWS resources, such as EC2 instances, AWS Lambda, and AWS Fargate, that you use to create, store, and run your jobs. You can use your Reserved Instances, Savings Plans, EC2 Spot Instances, and Fargate with AWS Batch by specifying your compute-type requirements when setting up your AWS Batch compute environments.
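For example, Spot capacity is selected when you create the compute environment. A sketch with boto3, where every ARN and ID is a placeholder and the 60% bid is just an illustration:

    import boto3

    batch = boto3.client("batch")

    # An EC2 Spot compute environment: Batch bids on spare capacity for you
    batch.create_compute_environment(
        computeEnvironmentName="spot-ce",
        type="MANAGED",
        computeResources={
            "type": "SPOT",
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "bidPercentage": 60,           # pay at most 60% of the On-Demand price
            "minvCpus": 0,
            "maxvCpus": 64,
            "instanceTypes": ["optimal"],  # let Batch pick instance families
            "subnets": ["subnet-0123456789abcdef0"],
            "securityGroupIds": ["sg-0123456789abcdef0"],
            "instanceRole": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
            "spotIamFleetRole": "arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-tagging-role",
        },
        serviceRole="arn:aws:iam::123456789012:role/service-role/AWSBatchServiceRole",
    )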

Elastic Beanstalk Practical

Step 1: On the Elastic Beanstalk console, click the Create New Application option. A dialog box appears where you can give a name and an appropriate description for your application.




Step 2: Now that the application folder has been created, click the Actions tab and select the Create Environment option. Beanstalk provides you with an option to create multiple environments for your application.




Step 3: Choose between the two Environment Tier options: choose Web Server Environment if you want your application to handle HTTP requests, or Worker Environment to handle background tasks.
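The same three steps can be scripted with boto3. A sketch assuming a hypothetical application name and solution stack (valid stack names change over time; list them with list_available_solution_stacks):

    import boto3

    eb = boto3.client("elasticbeanstalk")

    # Step 1: create the application ("folder")
    eb.create_application(
        ApplicationName="my-php-app",          # hypothetical name
        Description="Demo application for Elastic Beanstalk",
    )

    # Steps 2-3: create a Web Server Environment for it. The solution stack
    # name below is a placeholder; check eb.list_available_solution_stacks().
    eb.create_environment(
        ApplicationName="my-php-app",
        EnvironmentName="my-php-app-env",
        SolutionStackName="64bit Amazon Linux 2 v3.5.0 running PHP 8.1",
        Tier={"Name": "WebServer", "Type": "Standard"},
    )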

Elastic Beanstalk Theory

 

AWS Elastic Beanstalk Components

There are certain key concepts which you will come across frequently when you deploy an application on Beanstalk. Let us have a look at those concepts:

Application:

  • An application in Elastic Beanstalk is conceptually similar to a folder
  • An application is a collection of components including environments, versions and environment configuration

Application Version:

  • An application version refers to a specific, labeled iteration of deployable code for a web application
  • An application version points to an Amazon S3 object that contains the deployable code such as a Java WAR file

Environment:

  • An Environment within an Elastic Beanstalk application is where the current version of the application runs
  • Each environment runs only a single application version at a time, but it is possible to run the same or different versions of an application in many environments at the same time

Environment Tier:

Based on your requirements, Beanstalk offers two different Environment Tiers: Web Server Environment and Worker Environment.


  • Web Server Environment: Handles HTTP requests from clients
  • Worker Environment: Processes background tasks which are resource consuming and time intensive

Now that you know the different key concepts pertaining to Elastic Beanstalk, let’s understand the architecture of Elastic Beanstalk.

AWS Elastic Beanstalk Architecture

Before getting into AWS Elastic Beanstalk architecture, let’s answer the most frequently asked question,

What is an Elastic Beanstalk Environment?

An Environment runs the current version of the application. When you launch an Environment for your application, Beanstalk asks you to choose between two different Environment Tiers, i.e., Web Server Environment or Worker Environment. Let’s understand them one by one.


Web Server Environment

The application version installed on a Web Server Environment handles HTTP requests from the client. The components of this Environment Tier work together as follows.

Beanstalk Environment – The Environment is the heart of the application. When you launch an Environment, Beanstalk assigns various resources that are needed to run the application successfully.

Elastic Load Balancer – When the application receives multiple requests from clients, Amazon Route 53 forwards these requests to the Elastic Load Balancer, which distributes them among the EC2 instances of the Auto Scaling Group.

Auto Scaling Group – Auto Scaling Group automatically starts additional Amazon EC2 instances to accommodate increasing load on your application. If the load on your application decreases, Amazon EC2 Auto Scaling stops instances, but always leaves at least one instance running.

Host Manager – It is a software component which runs on every EC2 instance that has been assigned to your application. The host manager is responsible for various tasks such as:

  • Generating and monitoring application log files
  • Generating instance level events
  • Monitoring application server

Security Groups – A Security Group is like a firewall for your instance. Elastic Beanstalk has a default security group, which allows the client to access the application using HTTP on port 80. It also provides you with an option to define security groups for the database server as well.

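A sketch of that default rule with boto3, creating a security group and opening HTTP port 80 to all clients; the VPC ID is a placeholder:

    import boto3

    ec2 = boto3.client("ec2")

    # Create a security group and allow inbound HTTP from anywhere,
    # mirroring Beanstalk's default web-server rule
    sg = ec2.create_security_group(
        GroupName="web-server-sg",
        Description="Allow HTTP from anywhere",
        VpcId="vpc-0123456789abcdef0",   # placeholder VPC
    )
    ec2.authorize_security_group_ingress(
        GroupId=sg["GroupId"],
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 80,
            "ToPort": 80,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        }],
    )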

So that’s all about the Web Server Environment. But what if the application version installed on the Web Server Tier keeps denying requests because it has encountered time-intensive and resource-consuming tasks while handling them? Well, this is where the Worker Tier comes into the picture.

AWS EC2 Instance Theory

 

What is an EC2 Instance?

  • It is a computing resource that provides a virtual computing environment to deploy your application
  • In short, you can create a server on AWS and deploy your application on that server.

Why Use Amazon EC2?

  • No need to buy hardware; develop and deploy applications faster
  • Pay only for what you use
  • Auto scaling as per the workload.
  • Complete control of servers
  • Built-in security

EC2 Instance Types

  • General Purpose EC2 Instances
    • This type of instance is the most commonly utilised, especially for testing. There are two types of general-purpose instances: “T” and “M.”
    • “T” instances are targeted at simple jobs such as testing environments, and they have modest networking performance on the most basic options.
    • “M” instances are for general use when you don’t want a testing environment but an all-purpose instance. They offer more balanced resources compared to “T” instances.
  • Compute Optimized
    • If your application needs to process a lot of information, such as math operations, load balancing, rendering tasks, or video encoding, you need an instance that can process all that information in less time.
  • Memory Optimized
    • If your app doesn’t require much CPU but instead needs more and faster RAM, you should check out the available options in the “X1e”, “X1”, and “R” instances.
  • Accelerated Computing
    • Creating a movie and need to render the textures? Need to design with power? Or do you just want to play games over streaming? These instances provide hardware accelerators, i.e. GPUs or FPGAs (the P, G, and F families), for rendering, machine learning, and other compute-intensive graphics workloads.
  • Storage Optimized
    • This kind of instance is provisioned with a more significant amount of storage (multiple TB).
    • You get the best I/O performance, which makes these instances a great option for databases that need to write to disk regularly. There are three groups of these instances: H, I, and D.


AWS EC2 Instance Practical

Creating an EC2 instance

  1. Sign in to the AWS Management Console.
  2. Click on the EC2 service.
  3. Click on the Launch Instance button to create a new instance.



4. Choose an AMI (Amazon Machine Image). An AMI is a template used to create the EC2 instance.

5. Choose an Instance Type and then click Next. Suppose I choose t2.micro as the instance type for testing purposes.

6. The main setup page of EC2 is shown below, where we define the setup configuration.

7. Never leave the default 8 GB root volume; if you want to stay within the free tier limits, you can set a value of around 20-24 GB, because with the default size your instance may not have enough space to do many things. Then click Next.

8. Now, add the Tags and then click Next.

9. Configure the Security Group. The security group allows specific traffic to access your instance, e.g., if you want a web server you need to open port 80, and if you want SSH access you need port 22. So let’s create a new one.

10. Review the EC2 instance that you have just configured, and then click the Launch button.

11. Create a new key pair and enter a name for it. Download the key pair.

12. Click the Launch Instances button. (A boto3 sketch of these steps follows.)
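Steps 11-12 (plus the 20 GB root volume from step 7) can also be scripted with boto3. A sketch in which the key-pair name, AMI ID, and security-group ID are placeholders:

    import boto3

    ec2 = boto3.client("ec2")

    # Step 11: create a key pair and save the private key locally
    key = ec2.create_key_pair(KeyName="demo-key")     # hypothetical name
    with open("demo-key.pem", "w") as f:
        f.write(key["KeyMaterial"])

    # Step 12: launch a t2.micro with a 20 GB root volume. The AMI ID is a
    # placeholder; look one up in your region first.
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
        KeyName="demo-key",
        SecurityGroupIds=["sg-0123456789abcdef0"],    # from step 9
        BlockDeviceMappings=[{
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": 20, "VolumeType": "gp2"},
        }],
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": "demo-instance"}],
        }],
    )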


AWS S3 Theory

What is AWS S3?

Amazon Simple Storage Service (S3) is storage for the internet. It is designed for large-capacity, low-cost storage across multiple geographical regions. Amazon S3 provides developers and IT teams with Secure, Durable, and Highly Scalable object storage.

S3 is Secure because AWS provides:

  • Encryption of the data that you store. It can happen in two ways:
    • Client-Side Encryption
    • Server-Side Encryption
  • Multiple copies are maintained to enable regeneration of data in case of data corruption
  • Versioning, wherein each edit is archived for potential retrieval (see the sketch below)
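A minimal sketch of the encryption and versioning points with boto3, assuming a hypothetical bucket name:

    import boto3

    s3 = boto3.client("s3")
    bucket = "my-secure-bucket"   # hypothetical bucket name

    # Server-side encryption by default (SSE-S3, i.e. AES-256)
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}
            }]
        },
    )

    # Versioning, so every edit is archived for potential retrieval
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )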

S3 is Durable because:

  • It regularly verifies the integrity of stored data using checksums; e.g., if S3 detects any corruption in the data, it is repaired immediately with the help of replicated data.
  • Even while storing or retrieving data, it checks incoming network traffic for corrupted data packets.

S3 is Highly Scalable, since it automatically scales your storage according to your requirement and you only pay for the storage you use.

The next question which comes to our mind is,

What kind and how much of data one can store in AWS S3?

You can store virtually any kind of data, in any format, in S3 and when we talk about capacity, the volume and the number of objects that we can store in S3 are unlimited.

*An object is the fundamental entity in S3. It consists of data, a key, and metadata.

When we talk about data, it can be of two types:

  • Data which is to be accessed frequently.
  • Data which is not accessed that frequently.

Therefore, Amazon came up with 3 storage classes to provide its customers the best experience and at an affordable cost.

Let’s understand the 3 storage classes with a “health-care” use case:

1. Amazon S3 Standard for frequent data access

This is suitable for performance-sensitive use cases where latency should be kept low. E.g., in a hospital, the frequently accessed data will be the data of admitted patients, which should be retrieved quickly.

 

2. Amazon S3 Standard for infrequent data access

This is suitable for use cases where the data is long-lived and less frequently accessed, i.e., for data archival that still expects high retrieval performance. E.g., in the same hospital, the records of people who have been discharged will not be needed on a daily basis, but if they return with any complication, their discharge summary should be retrieved quickly.

3. Amazon Glacier

This is suitable for use cases where the data is to be archived and high performance is not required; it has a lower cost than the other two services. E.g., in the hospital, patients’ test reports, prescriptions, MRI, X-Ray, and scan docs older than a year will not be needed in the daily run, and even when they are required, low latency is not needed.
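The storage class is chosen per object at upload time. A sketch with boto3, using a hypothetical hospital-records bucket to mirror the three cases above:

    import boto3

    s3 = boto3.client("s3")

    # The storage class is set per object (bucket and keys are placeholders)
    s3.put_object(Bucket="hospital-records", Key="admitted/patient-123.json",
                  Body=b"...", StorageClass="STANDARD")

    s3.put_object(Bucket="hospital-records", Key="discharged/patient-456.json",
                  Body=b"...", StorageClass="STANDARD_IA")

    s3.put_object(Bucket="hospital-records", Key="archive/xray-789.dcm",
                  Body=b"...", StorageClass="GLACIER")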

Specification Snapshot: Storage Classes


How is data organized in S3?

Data in S3 is organized in the form of buckets.


  • A Bucket is a logical unit of storage in S3.
  • A Bucket contains objects which contain the data and metadata.

Before adding any data to S3, the user has to create a bucket that will be used to store objects.
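A minimal sketch with boto3; the bucket name is a placeholder (names are globally unique), the region is just an example, and outside us-east-1 a LocationConstraint is required:

    import boto3

    s3 = boto3.client("s3")

    # Create the bucket that will hold your objects
    s3.create_bucket(
        Bucket="my-unique-bucket-name-2022",   # placeholder, must be globally unique
        CreateBucketConfiguration={"LocationConstraint": "ap-south-1"},
    )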

Where is your data stored geographically?

You can choose where, i.e., in which region, your data should be stored. Choosing a region is an important decision, so it should be planned well.

These are the 4 parameters to choose the optimal region –

  • Pricing
  • User/Customer Location
  • Latency
  • Service Availability

Let’s understand this through an example:

Suppose there is a company which has to host a website for its customers in the US and India.

To provide the best experience, the company has to choose the region which best fits its requirements.



Now, looking at the above parameters, we can clearly identify that N. Virginia will be the best region for this company because of its low latency and low price. Irrespective of your location, you can select any region that suits your requirements, since you can access your S3 buckets from anywhere.

Talking about regions, let’s look at the possibility of keeping a backup in another region, or you may simply want to move your data to another region. Thankfully, this feature has been added to the AWS S3 system and is pretty easy to use.

Cross-region Replication

As the name suggests, Cross-region Replication enables users to either replicate or transfer data to another region without any hassle.

This obviously has a cost attached to it, which is discussed later in this article.
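A sketch of enabling replication with boto3; versioning must be enabled on both buckets, and the bucket names and role ARN are placeholders:

    import boto3

    s3 = boto3.client("s3")

    # Cross-region replication requires versioning on BOTH buckets and an
    # IAM role that allows S3 to replicate objects on your behalf
    for bucket in ["source-bucket", "destination-bucket"]:
        s3.put_bucket_versioning(Bucket=bucket,
                                 VersioningConfiguration={"Status": "Enabled"})

    s3.put_bucket_replication(
        Bucket="source-bucket",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
            "Rules": [{
                "ID": "replicate-everything",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},   # empty prefix = all objects
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
                "DeleteMarkerReplication": {"Status": "Disabled"},
            }],
        },
    )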


How is the data transferred?

Besides the traditional transfer practice, that is, over the internet, AWS provides two more ways to transfer data securely and at a faster rate:

  • Transfer Acceleration
  • Snowball

Transfer Acceleration enables fast, easy, and secure transfers over long distances by exploiting Amazon’s CloudFront edge technology.

CloudFront is a caching service by AWS, in which data from the client site is transferred to the nearest edge location, and from there the data is routed to your AWS S3 bucket over an optimised network path.
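A sketch of enabling and using Transfer Acceleration with boto3 (the bucket and file names are placeholders):

    import boto3
    from botocore.config import Config

    s3 = boto3.client("s3")

    # Turn on Transfer Acceleration for the bucket
    s3.put_bucket_accelerate_configuration(
        Bucket="my-big-uploads",                      # placeholder bucket
        AccelerateConfiguration={"Status": "Enabled"},
    )

    # Then route uploads through the nearest CloudFront edge location
    fast_s3 = boto3.client(
        "s3", config=Config(s3={"use_accelerate_endpoint": True})
    )
    fast_s3.upload_file("video.mp4", "my-big-uploads", "uploads/video.mp4")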


Snowball is a way of transferring your data physically. Amazon sends an appliance to your premises, onto which you can load your data. The device has an E Ink display (similar to a Kindle) that shows your shipping address when it is shipped from Amazon. When the data transfer to the Snowball is complete, the display changes the shipping address back to the AWS facility where the Snowball has to be returned.

The Snowball is ideal for customers who have large batches of data to move. The average turnaround time for a Snowball is 5-7 days; in the same time, Transfer Acceleration can transfer up to 75 TB of data over a dedicated 1 Gbps line. So, depending on the use case, a customer can decide between them.

Obviously, there will be some cost around it, so let’s look at the overall costing of S3.

Is S3 free on AWS?

Yes! As a part of the AWS Free Usage Tier, you can get started with AWS S3 for free. Upon sign-up, new AWS customers receive 5 GB of Amazon S3 Standard storage, 20,000 GET requests, 2,000 PUT requests, and 15 GB of data transfer out each month for one year.

Over this limit, there is a cost attached. Let’s understand how Amazon charges you:

How is S3 billed?

Despite having so many features, AWS S3 is affordable and flexible in its costing. It works on a pay-per-use model, meaning you only pay for what you use. Pricing varies by region; see aws.amazon.com for the current rates (the examples below use the North Virginia region).

Cross Region Replication is billed in the following way:

If you replicate 1,000 1 GB objects (1,000 GB) between regions you will incur a request charge of $0.005 (1,000 requests x $0.005 per 1,000 requests) for replicating 1,000 objects and a charge of $20 ($0.020 per GB transferred x 1,000 GB) for inter-region data transfer. After replication, the 1,000 GB will incur storage charges based on the destination region.

For Snowball, there are two variants:

  • Snowball 50 TB: $200
  • Snowball 80 TB: $250

This is the fixed service fee that they charge.

Apart from this, there are on-site charges, which exclude shipping days; the shipping days are free.

The first 10 on-site days are also free. The on-site days run from the day the Snowball reaches your premises until the day it is shipped back; the day it arrives and the day it is shipped are counted as shipping days and are therefore free.