Thursday, 15 June 2023

Introduction to AWS Elastic File System (EFS)

AWS Storage Services: AWS offers a wide range of storage services that can be provisioned depending on your project requirements and use case. AWS storage services have different provisions for highly confidential data, frequently accessed data, and infrequently accessed data. You can choose from various storage types, namely object storage, file storage, block storage, backups, and data migration options, all of which fall under the AWS Storage Services list.

AWS Elastic File System: From the aforementioned list, EFS falls under the file storage category. EFS is a file-level, fully managed storage service provided by AWS that can be accessed by multiple EC2 instances concurrently. Like AWS EBS, EFS is designed for high-throughput and low-latency applications.

Different Storage Classes in AWS EFS:

Standard storage class:

  • This is the default storage class for EFS.
  • The user is only charged for the amount of storage used.
  • This is recommended for storing frequently accessed files.

Infrequently Accessed storage class (One Zone):

  • Cheaper storage space.
  • Recommended for rarely accessed files.
  • Increased latency when reading or writing files.
  • The user is charged not only for storing files but also for read and write operations.

 

Different Performance Modes in EFS:

General-purpose:

  • Offers low latency.
  • Supports a maximum of 7000 IOPS.
  • You can monitor the IOPS your architecture uses through CloudWatch metrics and switch to Max I/O if required.

Max I/O:

  • This is recommended when EFS needs over 7,000 IOPS.
  • Theoretically, this mode has an unlimited I/O speed.

 

Different Throughput Modes in EFS:

  • Burst Mode: Allows 100 MB/s of burst throughput per TB of storage.
  • Provisioned Mode: Users can decide the maximum throughput of the EFS themselves but are charged more when throughput goes beyond the default limit (a creation-time sketch covering both performance and throughput modes follows this list).
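Both the performance mode and the throughput mode are chosen when the file system is created. The following is a minimal boto3 sketch of that call; it assumes AWS credentials are already configured, and the region, creation token, tag values, and throughput figure are illustrative placeholders rather than values from this article.

```python
import boto3

# Minimal sketch: create an EFS file system with explicit performance and
# throughput modes. Region, token, and tag values are illustrative only.
efs = boto3.client("efs", region_name="us-east-1")

response = efs.create_file_system(
    CreationToken="demo-efs-token",        # idempotency token (hypothetical value)
    PerformanceMode="generalPurpose",      # or "maxIO" for workloads needing more IOPS
    ThroughputMode="provisioned",          # or "bursting" for the default burst model
    ProvisionedThroughputInMibps=128.0,    # only relevant with ThroughputMode="provisioned"
    Encrypted=True,
    Tags=[{"Key": "Name", "Value": "demo-efs"}],
)
print("File system ID:", response["FileSystemId"])
```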

 

Connecting to EFS:

  • Create an EFS file system from the AWS console. Choose the correct VPC and a configuration that suits your use case.
  • Create one or more EC2 servers from the EC2 dashboard, as needed for your use case.
  • Allow the EC2 security group to access EFS.
  • Connect to EFS from your EC2 servers. There are primarily two methods of connecting to EFS from EC2 servers (see the sketch below):
    • Linux NFS Client: The traditional method of connecting to network file systems.
    • EFS Mount Helper: The simpler, AWS-recommended way to connect to EFS.
  • Once you have connected to AWS EFS from your EC2 instances, you will have a mount directory of any name (say EFS-Folder) that holds all the files in the EFS. Any file created in this directory can be seen or edited from any EC2 instance that has access to the EFS.
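As a rough illustration of the steps above, the boto3 sketch below creates a mount target for an existing file system; all IDs are placeholders, and the commands shown in the comments are the standard EFS mount helper and Linux NFS client invocations you would run on the instance itself.

```python
import boto3

# Minimal sketch: expose an existing EFS file system to EC2 instances in one
# subnet by creating a mount target. All IDs below are placeholders.
efs = boto3.client("efs", region_name="us-east-1")

efs.create_mount_target(
    FileSystemId="fs-12345678",            # placeholder file system ID
    SubnetId="subnet-0abc1234",            # subnet the EC2 instances live in
    SecurityGroups=["sg-0def5678"],        # must allow inbound NFS (TCP 2049) from the EC2 instances
)

# On the EC2 instance itself, the file system is then mounted with either the
# EFS mount helper (amazon-efs-utils) or the plain Linux NFS client, e.g.:
#   sudo mount -t efs -o tls fs-12345678:/ /mnt/EFS-Folder
#   sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/EFS-Folder
```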

Features of AWS EFS:

  • Storage capacity: Theoretically, EFS provides an infinite amount of storage capacity, which grows and shrinks automatically as the user adds and removes files.
  • Fully Managed: Being an AWS-managed service, EFS takes on the overhead of creating, managing, and maintaining file servers and storage.
  • Multi-EC2 Connectivity: EFS can be shared between any number of EC2 instances by using mount targets.
    • Note: A mount target is an access point for AWS EFS that is attached to EC2 instances, allowing them access to the EFS.
  • Availability: AWS EFS is region-specific; however, it can be present in multiple Availability Zones in a single region.
    • EC2 instances in different Availability Zones can connect to the EFS mount target in their own zone for quicker access.
  • EFS Lifecycle Management: Lifecycle management moves files between storage classes (see the sketch after this list). Users select a retention-period parameter (in days); any file in Standard storage that is not accessed for this period is moved to the Infrequently Accessed class for cost savings.
    • Note that the retention period of a file in Standard storage resets each time the file is accessed.
    • Files in the IA class are moved back to Standard storage once they are accessed.
    • Note that file metadata and files under 128 KB cannot be transferred to the IA storage class.
    • Lifecycle management can be turned on and off as the users see fit.
  • Durability: Multi-Availability-Zone presence accounts for the high durability of the Elastic File System.
  • Transfer: Data can be transferred from on-premises storage to EFS in the cloud using the AWS DataSync service. DataSync can also be used to transfer data between EFS file systems across regions.
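For the lifecycle management feature mentioned above, here is a minimal boto3 sketch that enables an IA transition policy; the 30-day threshold and the file system ID are illustrative assumptions.

```python
import boto3

# Minimal sketch: enable lifecycle management so files not accessed for 30 days
# move to the IA class, and move back to Standard on their first access.
efs = boto3.client("efs")

efs.put_lifecycle_configuration(
    FileSystemId="fs-12345678",   # placeholder file system ID
    LifecyclePolicies=[
        {"TransitionToIA": "AFTER_30_DAYS"},
        {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},
    ],
)
```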

A common architecture is an Elastic File System shared between two instances, each of which is also connected to its own EBS volume. The following are some use cases of EFS:

  • Multi-server architectures: In AWS, only EFS provides a shared file system, so applications that require multiple servers to share a single file system have to use EFS.
  • Big Data Analytics: Virtually infinite capacity and extremely high throughput makes EFS highly suitable for storing files for Big data analysis.
  • Reliable data file storage: EBS data is stored redundantly in a single Availability Zone, whereas EFS data is stored redundantly across multiple Availability Zones, making it more robust and reliable than EBS.
  • Media Processing: High capacity and high throughput make EFS highly favorable for processing big media files.

Limitations of AWS Elastic File System (EFS):
 

There are a few limitations to consider when using AWS Elastic File System (EFS):

  1. EFS only supports the Network File System (NFS) protocol, so it can only be mounted and accessed by devices that support NFS.
  2. EFS does not support file locking, so it is not suitable for applications that require file locking for concurrent access.
  3. EFS does not support hard links or symbolic links.
  4. EFS has a maximum file size of 47.9 TB.
  5. EFS has a maximum throughput of 1000 MB/s per file system, and a maximum of 16,000 IOPS per file system.
  6. EFS has a maximum number of files and directories that can be created within a single file system, which is determined by the size of the file system. For example, a 1 TB file system can support up to about 20 million files and directories.
  7. EFS is only available in certain regions, and data cannot be migrated between regions directly; a service such as AWS DataSync is needed to copy data to a file system in another region.

AWS Disaster Recovery Strategies

Disaster recovery is one of the main requirements when designing cloud architectures today. A disaster may be a production bug, a fault introduced by the developers, or a failure on the side of an AWS service itself. Disaster recovery is an essential part of any application. Before diving into AWS disaster recovery strategies, let’s understand some terms related to disaster recovery.

Recovery Time Objective (RTO): RTO is the maximum span of time for which a service can remain unavailable before the outage becomes damaging to the business.

Recovery Point Objective (RPO): RPO is the maximum amount of data, measured in time, that can be lost when a system goes down.

RTO-RPO timeline (image)

In the above example, the system goes down at 2 pm and is recovered to its normal state by 6 pm. The Recovery Time Objective for this situation is therefore 4 hours. Now say the system is backed up every 2 hours and the last backup was taken at 12 pm (marked by the green arrow). Since the system went down at 2 pm, the data written between 12 pm and 2 pm is lost and only the state as of 12 pm can be recovered, so the Recovery Point Objective for this scenario is 2 hours.
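The same arithmetic can be expressed as a quick sanity check in Python, using the timestamps from the example above:

```python
from datetime import datetime

# Timestamps from the example: failure at 2 pm, recovery at 6 pm, last backup at 12 pm.
last_backup = datetime(2023, 6, 15, 12, 0)
failure     = datetime(2023, 6, 15, 14, 0)
recovered   = datetime(2023, 6, 15, 18, 0)

rto = recovered - failure      # downtime the business must tolerate -> 4:00:00
rpo = failure - last_backup    # window of data that is lost         -> 2:00:00
print(f"RTO = {rto}, RPO = {rpo}")
```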

The choice of your architecture and data backup solution will depend entirely on how much RTO and RPO your application can tolerate without harming your business.

Different disaster recovery strategies

Backup and restore:

In this strategy, you take frequent snapshots of the data stored in your EBS volumes and RDS databases and store these snapshots in reliable storage such as AWS S3. You can also regularly create AMIs of your servers to preserve their state, including all installed software, software updates, and the IAM permissions associated with each server. Backup and Restore essentially uses AWS as your virtual tape library. This strategy works not only for AWS applications but also for on-premises applications: AWS Storage Gateway lets you take snapshots of your local volumes and store them in AWS S3. This is the slowest of the disaster recovery strategies and is best used in combination with other strategies. Storing backup data in AWS Glacier can further reduce the cost of this strategy.

  • RTO: High (example: 10-24 hours).
  • RPO: Depends on the frequency of the backups, which can be hourly, every 3 hours, every 6 hours, or daily.
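As a rough sketch of the Backup and Restore building blocks, the following boto3 calls snapshot an EBS volume and capture an AMI of a running server; the volume ID, instance ID, and names are placeholders.

```python
import boto3

# Minimal sketch: snapshot an EBS volume and capture an AMI of a server.
# Volume and instance IDs are placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="nightly data backup",
)

ec2.create_image(
    InstanceId="i-0123456789abcdef0",
    Name="app-server-backup-2023-06-15",   # illustrative image name
    NoReboot=True,                          # avoid stopping the running server
)
```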

Pilot Light: 

In this strategy, a minimal version of the production environment is kept running in AWS. This does not mean running the entire application scaled down (that would be warm standby), but configuring and running only the core, most critical components of the production environment. When disaster strikes, a full-scale application is brought up around this running core. Pilot Light is more costly than Backup and Restore because some minimal AWS services are running all the time. This strategy also involves provisioning infrastructure with cloud scripts, such as AWS CloudFormation templates, for an efficient and quick restoration of the system.

  • RTO: High, but lower than Backup and Restore (example: 5-10 hours).
  • RPO: Same as for Backup and Restore, i.e. it depends on the frequency of backups. Even though a minimal core environment is running, data recovery still depends on backups.
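A minimal sketch of the restoration step, assuming a pre-written CloudFormation template stored in S3; the stack name, template URL, and parameter are hypothetical.

```python
import boto3

# Minimal sketch: when disaster strikes, rebuild the full environment around
# the pilot-light core from a pre-written CloudFormation template.
cfn = boto3.client("cloudformation", region_name="us-east-1")

cfn.create_stack(
    StackName="dr-full-environment",                                   # hypothetical name
    TemplateURL="https://s3.amazonaws.com/my-dr-bucket/full-stack.yaml",  # hypothetical template
    Parameters=[{"ParameterKey": "Environment", "ParameterValue": "dr"}],
    Capabilities=["CAPABILITY_IAM"],   # required if the template creates IAM resources
)
```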

Warm Standby:

As the name suggests, the warm standby strategy involves keeping an extremely scaled-down but full-fledged, fully functional copy of your production application always running in the cloud. In case of failure or disaster, the warm standby application can be immediately scaled up to serve as the production application. EC2 servers can be kept at a minimal number and instance size and scaled up to a fully functional application using AWS Auto Scaling. Also, in case of failure, all DNS records and traffic routing tables are changed to point to the standby application rather than the production application. For quickly changing data, architects will have to replicate data back from the standby site to the primary site when the primary production environment takes over again.

  • RTO: Lower than Pilot Light (example: < 5 hours).
  • RPO: Since the last data write to the Multi-AZ replicated database.
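A minimal sketch of the scale-up step, assuming the standby runs behind an Auto Scaling group; the group name and capacities are illustrative.

```python
import boto3

# Minimal sketch: promote a warm standby by scaling its Auto Scaling group
# from a skeleton size up to production capacity.
autoscaling = boto3.client("autoscaling", region_name="us-west-2")

autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="standby-web-asg",   # hypothetical group name
    MinSize=4,
    DesiredCapacity=8,
    MaxSize=16,
)
```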

Multi-Site:

As the name suggests, the multi-site strategy involves running a fully functional copy of the production environment as a backup in the cloud. This is a one-to-one copy of your primary application, typically run in a different Availability Zone or an entirely different region for durability. It is the most expensive of all the DR options, as it roughly doubles the running cost of a single application. The cost overhead is compensated by the smallest RPO and RTO offered by the Multi-Site DR strategy. The RPO may vary from system to system according to the choice of data replication method (synchronous or asynchronous). As soon as failure strikes, the developers only have to change DNS records and routing tables to point to the secondary application.

  • RTO: Lowest of all DR strategies (example: < 1 hour).
  • RPO: Lowest of all DR strategies; the choice of data replication affects it. With a synchronously replicated database, recovery is possible up to the last data written.
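The DNS switch can be scripted as well; the following Route 53 sketch repoints an A record at the secondary site, with a placeholder hosted zone ID, record name, and IP address.

```python
import boto3

# Minimal sketch: point the application's DNS record at the secondary site.
route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789ABCDEFGHIJ",   # placeholder hosted zone
    ChangeBatch={
        "Comment": "Fail over to the multi-site standby environment",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com",
                "Type": "A",
                "TTL": 60,
                "ResourceRecords": [{"Value": "203.0.113.10"}],  # standby endpoint
            },
        }],
    },
)
```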

Cloud computing is one of the biggest assets available to developers and investors for building highly efficient, simple applications while keeping a cheaper cost structure. Backups done the traditional (non-cloud) way can be more costly, less efficient, and prone to hardware issues and manual errors. AWS offers backup strategies not only for AWS applications but also for on-premises applications, which can leverage AWS as a backup target. Cloud backups provide many benefits over a traditional backup system, such as:

  • Low costs.
  • Fully AWS managed.
  • Secure and reliable.
  • No hardware maintenance.
  • Off-site backup.
  • Easy to access and test using cloud infrastructure.

Amazon RDS – Working with Backups

This article aims to make you aware of working with backups on Amazon RDS. The prime objective of backups is to create a copy of our data that can be recovered in situations like data failure, data theft, and so on. As we all know, RDS deals with a lot of crucial data, and there are always chances of data loss. To avoid such losses, RDS provides several backup strategies that clients can use as per their requirements. Let us discuss all of them.

Automated Backups:

As the name suggests, this is the default backup strategy in RDS: from the moment a database instance is created until it is deleted, automated backups remain in action. This facility allows the user to recover data from any point in time within the retention window; automated backups track changes continuously, and users can restore to any moment they need. The backup retention period is specified by the user while creating the instance and can be altered at any time; by default it is one day. Automated backups apply only to instances in the “Available” state; other states such as “Stopped” or “Storage Full” do not support automated backups. Also, when an instance that already has automated backups running is copied within the same region, automated backups are not applied to the copy, as this would only increase the bill.

You can check whether automated backups are enabled and, if enabled, what the retention period is: just select the instance and click on it, and under “Availability & Durability” you will find the details.

Now, let us look at the steps involved in enabling automated backups for any desired DB instance.

After logging in to your account, go to the RDS management console. From the navigation pane, select “Databases”, choose the database you want to enable automated backups for, and click “Modify”.

After a while, the “Modify DB Instance” page appears; for the backup retention period, select a value other than zero (0). Choose “Continue” and then select “Apply Immediately”.
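The same change can be made programmatically. Here is a minimal boto3 sketch, with a placeholder instance identifier and an assumed backup window.

```python
import boto3

# Minimal sketch: enable automated backups by setting a non-zero retention period.
rds = boto3.client("rds", region_name="us-east-1")

rds.modify_db_instance(
    DBInstanceIdentifier="my-database-instance",   # placeholder identifier
    BackupRetentionPeriod=7,                       # days; 0 would disable automated backups
    PreferredBackupWindow="03:00-04:00",           # optional daily window (UTC)
    ApplyImmediately=True,
)
```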

Let us look at another backup strategy in Amazon RDS.

Snapshots:

Snapshots are another backup facility provided by Amazon RDS for its users. Snapshots are non-editable backups of entire database instances, not individual databases. They are not automatic, although a final snapshot can be created automatically when an instance is deleted. A snapshot does not come with a retention period and never expires. Snapshots are an efficient method for storing backups within the same region or in a different region. We can also export a snapshot’s data to Amazon S3 for storage. Snapshots come with multiple sub-operations such as creating, deleting, exporting, and so on.

To create a DB snapshot, follow this process.

After logging in to your account, go to the RDS management console. From the navigation pane, select “Databases” and choose the database you want to take a snapshot of. Then click “Actions” and, from the listed options, choose “Take Snapshot”.

In a while, you will see the “Take DB Snapshot” window. Fill in the name you wish to give the snapshot and then click “Take Snapshot”.
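For completeness, here is a minimal boto3 sketch of the same operation via the API; both identifiers are placeholders.

```python
import boto3

# Minimal sketch: take a manual DB snapshot and wait until it is available.
rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_snapshot(
    DBSnapshotIdentifier="my-database-snapshot-2023-06-15",   # placeholder snapshot name
    DBInstanceIdentifier="my-database-instance",              # placeholder instance name
)

# Optionally wait until the snapshot is available before relying on it.
waiter = rds.get_waiter("db_snapshot_available")
waiter.wait(DBSnapshotIdentifier="my-database-snapshot-2023-06-15")
```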

In this way, we can easily take a snapshot of any DB instance in RDS. The two backup strategies differ from each other in terms of their architecture. If you use a Free Tier account, make sure you delete all the services and instances before logging out of your AWS account.

What is Amazon Glacier?

AWS offers a wide range of storage services that can be provisioned depending on your project requirements and use case. AWS storage services have different provisions for highly confidential data, frequently accessed data, and infrequently accessed data. You can choose from various storage types, namely object storage, file storage, block storage, backups, and data migration options, all of which fall under the AWS Storage Services list.

AWS Glacier: From the aforementioned list, AWS Glacier is the backup and archival storage provided by AWS. It is an extremely low-cost, long-term, durable, and secure storage service that is ideal for backup and archival needs. In much of its operation AWS Glacier is similar to S3, and it interacts directly with S3 using S3 lifecycle policies. However, the main difference between AWS S3 and Glacier is the cost structure: the cost of storing the same amount of data in AWS Glacier is significantly lower than in S3. Storage costs in Glacier can be as little as about $1 per terabyte of data per month.

AWS Glacier Terminology

1. Vaults: Vaults are virtual containers that are used to store data. Vaults in AWS Glacier are similar to buckets in S3.

  • Each vault has its own specific access policies (vault lock and vault access policies), giving you more control over who has what kind of access to your data.
  • Vaults are region-specific.

2. Archives: Archives are the fundamental entities stored in vaults. Archives in AWS Glacier are similar to objects in S3. You have virtually unlimited storage capacity in AWS Glacier and can therefore store an unlimited number of archives in a vault.

3. Vault Access Policies: In addition to the basic IAM controls, AWS Glacier offers vault access policies that help managers and administrators have more granular control over their data.

  • Each vault has its own set of Vault Access Policies.
  • If either the vault access policy or the IAM controls deny a user action, the user is not authorized to perform it.

4. Vault Lock Policies: Vault lock policies are exactly like Vault access policies but once set, they cannot be changed.

  • Specific to each vault.
  • This helps you with data compliance controls. For example, your business administrators might want some highly confidential data to be accessible only to the root user of the account, no matter what. A vault lock policy for such a use case can be written for the required vaults (a policy sketch follows below).
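To make the vault and policy ideas concrete, here is a minimal boto3 sketch that creates a vault and attaches a vault access policy; the account ID placeholder, vault name, and principal ARN are assumptions for illustration.

```python
import boto3, json

# Minimal sketch: create a vault and attach a vault access policy that only
# allows a specific IAM user to upload archives. All identifiers are placeholders.
glacier = boto3.client("glacier", region_name="us-east-1")

glacier.create_vault(accountId="-", vaultName="demo-archive-vault")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowUploadsFromBackupUser",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:user/backup-operator"},
        "Action": ["glacier:UploadArchive"],
        "Resource": "arn:aws:glacier:us-east-1:123456789012:vaults/demo-archive-vault",
    }],
}

glacier.set_vault_access_policy(
    accountId="-",
    vaultName="demo-archive-vault",
    policy={"Policy": json.dumps(policy)},
)
```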

Features of AWS Glacier

  • Given the extremely cheap storage provided by AWS Glacier, it doesn’t offer as many features as AWS S3, and accessing data in AWS Glacier is an extremely slow process.
  • Just like S3, AWS Glacier can essentially store all kinds of data types and objects.
  • Durability: AWS Glacier, just like Amazon S3, claims 99.999999999% durability (11 9’s). This means the possibility of losing data stored in one of these services is about one in a billion. AWS Glacier replicates data across multiple Availability Zones to provide this high durability.
  • Data Retrieval Time: Data retrieval from AWS Glacier can take from as little as 1-5 minutes (high-cost retrieval) to 5-12 hours (cheap data retrieval).
  • AWS Glacier Console: The AWS Glacier dashboard is not as intuitive and friendly as that of AWS S3. The Glacier console can only be used to create vaults; data transfer to and from AWS Glacier must be done programmatically. This functionality is provided via:
    • AWS Glacier API
    • AWS SDKs
  • Region-specific costs: The cost of storing data in AWS Glacier varies from region to region.
  • Security: 
    • AWS Glacier automatically encrypts your data using the AES-256 algorithm and manages its keys for you.
    • Apart from normal IAM controls AWS Glacier also has resource policies (vault access policies and vault lock policies) that can be used to manage access to your Glacier vaults.
  • Infinite Storage Capacity: Virtually, AWS Glacier offers unlimited storage capacity.

Data Transfer In Glacier

1. Data Upload: 

  • Data can be uploaded to AWS Glacier by creating a vault from the Glacier console and using one of the following methods:
    • Write code that uses AWS Glacier SDK to upload data.
    • Write code that uses AWS Glacier API to upload data.
    • S3 Lifecycle policies: S3 lifecycle policies can be set to transition S3 objects to AWS Glacier after some time. This can be used to back up old and infrequently accessed data stored in S3 (see the sketch below).
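A minimal boto3 sketch of such an S3 lifecycle rule, with a placeholder bucket name, prefix, and transition age:

```python
import boto3

# Minimal sketch: an S3 lifecycle rule that moves objects under a prefix to
# the Glacier storage class after 90 days.
s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",   # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-logs-to-glacier",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }]
    },
)
```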

2. Data Transfer between regions:

AWS Glacier is a region-specific service. Data in one region can be transferred to another region from the AWS console. The cost of such a data transfer is $0.02 per GB.

3. Data Retrieval

As mentioned before, AWS Glacier is a backup and data archival service; given its low cost of storage, AWS Glacier data is not readily available for immediate consumption.

  • Data retrieval from Glacier can only be done via some sort of code, using the AWS Glacier SDK or the Glacier API.
  • Data retrieval in AWS Glacier is of three types (a retrieval sketch follows this list):
    • Expedited:
      • This mode of data retrieval is only suggested for urgent requirements of data.
      • A single expedited retrieval request can only be used to retrieve 250MB of data at max.
      • This data is then provided to you within 1-5 minutes.
      • The cost of expedited retrieval is $0.03 per GB and $0.01 per request.
    • Standard:
      • This data retrieval mode can be used for any size of data, full or partial archive.
      • This data is then provided to you within 3-5 hours.
      • The cost of standard retrieval is $0.01 per GB and $0.05 per 1,000 requests.
    • Bulk:
      • This data retrieval is suggested for mass retrieval of data (petabytes of data).
      • It is the cheapest data retrieval option offered by AWS Glacier.
      • This data is then provided to you within 5-12 hours.
      • The cost of bulk retrieval is $0.0025 per GB and $0.025 per 1,000 requests.
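Retrieval itself is an asynchronous, job-based workflow regardless of the tier chosen. The following boto3 sketch starts an archive-retrieval job and, once the job has completed, downloads its output; the vault name and archive ID are placeholders.

```python
import boto3

# Minimal sketch: retrieval is a two-step, asynchronous job. Start an
# archive-retrieval job with the desired tier, then fetch the output once the
# job completes (minutes to hours later depending on the tier).
glacier = boto3.client("glacier", region_name="us-east-1")

job = glacier.initiate_job(
    accountId="-",
    vaultName="demo-archive-vault",          # placeholder vault name
    jobParameters={
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",   # placeholder archive ID
        "Tier": "Standard",                  # or "Expedited" / "Bulk"
    },
)

# Later, once the job status is Succeeded:
output = glacier.get_job_output(
    accountId="-",
    vaultName="demo-archive-vault",
    jobId=job["jobId"],
)
with open("restored-archive.bin", "wb") as f:
    f.write(output["body"].read())
```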