Thursday, 15 June 2023

Difference Between AWS S3 and AWS EBS

Amazon S3 (Simple Storage Service) and Amazon EBS (Elastic Block Store) are two different types of storage services provided by Amazon Web Services. This article highlights the major differences between Amazon S3 and Amazon EBS.

AWS Storage Options:

Amazon S3: Amazon S3 is a simple storage service offered by Amazon that is useful for hosting website images and videos, data analytics, and more. S3 is object-level storage that distributes data objects across several machines and lets users access the storage over the internet from anywhere in the world.

Amazon EBS: Unlike Amazon S3, Amazon EBS is block-level storage offered by Amazon. Block storage stores files in multiple volumes called blocks, which act as separate hard drives, and this storage is not accessible over the internet. Use cases include business continuity, transactional and NoSQL databases, software testing, and more.

Comparison based on Characteristics:

1. Storage type

Amazon Simple Storage Service is object storage designed for storing large numbers of user files and backups, whereas Elastic Block Store is block storage for Amazon EC2 compute instances. EBS is similar to a hard drive attached to your computer or laptop; the only difference is that it is attached to virtualized instances.

2. Accessibility

The files within an S3 bucket are stored in an unstructured manner and can be retrieved over HTTP (and, historically, even BitTorrent), whereas the data stored in EBS is accessible only by the instance to which the volume is attached.

3. Availability

Both S3 and EBS offer 99.99% availability; the difference is that S3 is accessed over the internet using APIs, whereas EBS is accessed by the single instance to which it is attached.

4. Durability

Amazon S3 provides durability by redundantly storing the data across multiple Availability Zones whereas EBS provides durability by redundantly storing the data in a single Availability Zone.

5. Security, Compliance, and Audit features

Amazon S3 can prevent unauthorized access to data using its access management tools and encryption policies, whereas EBS has no comparable feature: if a user gains unauthorized access to an instance, they can easily access the attached EBS volume. S3 also has features that make it easier to comply with regulatory requirements.

6. Size of data

Simple Storage Service (S3) can store larger amounts of data than EBS. With S3, the default limit is 100 buckets per account (a soft limit that can be raised), and each bucket has unlimited data capacity. EBS, by contrast, imposes an upper limit on storage: each volume can hold up to 16 TiB for most volume types (io2 Block Express volumes go up to 64 TiB), and accounts have per-region volume quotas.

7. Usability

One major limitation of EBS (Elastic Block Store) is that not all EBS volume types can be used by multiple instances at a single time. The Multi-Attach option is available only for Provisioned IOPS SSD (io1 and io2) volume types, whereas S3 content can be read by multiple instances at the same time.
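To make this constraint concrete, here is a minimal sketch (the helper name is ours; the volume-type identifiers are real AWS ones) that checks whether a given EBS volume type supports Multi-Attach:

```python
# Sketch: which EBS volume types allow Multi-Attach, per the text above.
# Only Provisioned IOPS SSD (io1, io2) volumes qualify.

MULTI_ATTACH_TYPES = {"io1", "io2"}

def supports_multi_attach(volume_type: str) -> bool:
    """Return True if this EBS volume type can be attached to
    multiple EC2 instances at the same time."""
    return volume_type.lower() in MULTI_ATTACH_TYPES

print(supports_multi_attach("io2"))  # True
print(supports_multi_attach("gp3"))  # False
```

A general-purpose (gp2/gp3) volume therefore cannot be shared; a shared file system such as EFS or an S3 bucket is the usual workaround.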

8. Pricing

The Amazon S3 storage service follows a utility-based model and charges as per your usage, whereas with Elastic Block Store you pay for the capacity you provision.

Amazon S3 cost parameters are:

  1. Free Tier – 5 GB
  2. First 50 TB/month – $0.023/GB
  3. Next 450 TB/month – $0.022/GB
  4. Over 500 TB/month – $0.021/GB

Amazon EBS cost parameters are:

  1. Free Tier – 30 GB, plus 1 GB of snapshot storage
  2. General Purpose SSD – $0.045 per GB-month
  3. Provisioned IOPS SSD – $0.125 per GB-month
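The tiered S3 prices above can be turned into a small monthly-cost estimator. This is only a sketch using the per-GB figures quoted in this article; actual AWS pricing varies by region and changes over time:

```python
# Sketch: estimate a monthly S3 Standard storage bill from tiered pricing.
# Prices below are the illustrative figures quoted in the article.

TIERS = [
    (50_000, 0.023),        # first 50 TB (expressed in GB) at $0.023/GB
    (450_000, 0.022),       # next 450 TB at $0.022/GB
    (float("inf"), 0.021),  # everything over 500 TB at $0.021/GB
]

def s3_monthly_cost(gb: float) -> float:
    cost, remaining = 0.0, gb
    for tier_size, price in TIERS:
        used = min(remaining, tier_size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return round(cost, 2)

print(s3_monthly_cost(10_000))  # 10 TB, all in the first tier: 230.0
print(s3_monthly_cost(60_000))  # 50 TB @ $0.023 + 10 TB @ $0.022: 1370.0
```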

9. Scalability

Amazon S3 offers rapid scalability: resources can be provisioned and de-provisioned at run time. EBS has no comparable feature; storage resources must be increased or decreased manually.

10. Performance

  • Amazon EBS is faster storage and offers higher performance than S3.

11. Latency

  • Because EBS storage is attached to an EC2 instance and is accessible only via that instance in a particular AWS region, it offers lower latency than S3, which is accessed over the internet. EBS SSD volumes also offer reliable I/O performance.

12. Geographic interchangeability

Amazon EBS has the upper hand in geographic interchangeability of data: with EBS snapshots, a user can place resources and data in multiple locations.

13. Backup and restore

For backup purposes, Amazon S3 uses versioning and cross-region replication, whereas backup in EBS is supported by snapshots and automated backups.

14. Security

Both S3 and EBS support encryption of data at rest and in transit, so both offer a good level of security.

Use cases:

Amazon S3 use cases are:

  1. Data lake and big data analytics: Amazon S3 works with AWS Lake Formation to create data lakes that hold raw data in its native format, and then supports big data analytics using machine learning tools, query-in-place, and similar services to extract useful insights from the raw data.
  2. Backup and restoration: Amazon S3 when combined with other AWS offerings(EBS, EFS, etc) can provide a secure and robust backup solution.
  3. Reliable disaster recovery: S3 can provide reliable recovery of data from any type of disaster, such as a power cut, system failure, or human error.
  4. Other use cases include entertainment, media, content management purposes, etc.

Amazon EBS use cases are:

  1. Software Testing and development: Amazon EBS is connected only to a particular instance, so it is best suited for testing and development purposes.
  2. Business continuity: Amazon EBS provides a good level of business continuity, as users can run applications in different AWS regions; all they require is EBS snapshots and Amazon Machine Images.
  3. Enterprise-wide applications: EBS provides block-level storage, so it allows users to run a wide variety of applications including Microsoft Exchange, Oracle, etc.
  4. Transactional and NoSQL databases: Because EBS provides low latency, it offers an optimum level of performance for transactional and NoSQL databases. It also helps in database management.

Let us see the differences in summary form:

  • Full form: AWS S3 stands for Amazon Simple Storage Service; AWS EBS stands for Amazon Elastic Block Store.
  • Definition: S3 is an object storage service that offers scalability, data availability, and security; EBS is easy-to-use, high-performance block storage at every scale.
  • Purpose: S3 is used to store and protect any amount of data for a wide range of use cases; EBS provides scalable block storage for individual instances.
  • Use cases: S3 suits data lakes, websites, mobile applications, backup and restore, big data analytics, enterprise applications, IoT devices, archives, etc.; EBS is commonly used to run relational or NoSQL databases.
  • Management: S3 also provides management features for organizing data and configuring access.


Introduction to AWS Elastic File System (EFS)

AWS Storage Services: AWS offers a wide range of storage services that can be provisioned depending on your project requirements and use case. AWS storage services have different provisions for highly confidential data, frequently accessed data, and not-so-frequently accessed data. You can choose from various storage types, namely object storage, file storage, block storage, backups, and data migration options, all of which fall under the AWS Storage Services list.

AWS Elastic File System: From the aforementioned list, EFS falls under the file storage category. EFS is file-level, fully managed storage provided by AWS that can be accessed by multiple EC2 instances concurrently. Like AWS EBS, EFS is designed for high-throughput and low-latency applications.

Different Storage Classes in AWS EFS:

Standard storage class:

  • This is the default storage class for EFS.
  • The user is only charged for the amount of storage used.
  • This is recommended for storing frequently accessed files.

Infrequent Access storage class (One Zone-IA):

  • Cheaper storage space.
  • Recommended for rarely accessed files.
  • Increased latency when reading or writing files.
  • The user is charged not only for storing files but also for read and write operations.
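To see why this trade-off matters, here is a toy comparison. All per-GB and per-access prices below are made-up illustrations, not official AWS rates:

```python
# Toy model of the trade-off above: Standard charges only for storage,
# while Infrequent Access (IA) charges less for storage but adds a fee
# per access. All prices here are hypothetical illustrations.

STD_GB = 0.30     # hypothetical $/GB-month, Standard
IA_GB = 0.025     # hypothetical $/GB-month, Infrequent Access
IA_ACCESS = 0.01  # hypothetical $ per GB read or written in IA

def monthly_cost(gb_stored, gb_accessed, storage_class):
    if storage_class == "standard":
        return gb_stored * STD_GB
    return gb_stored * IA_GB + gb_accessed * IA_ACCESS

# A rarely read 100 GB dataset is far cheaper in IA than in Standard.
print(monthly_cost(100, 1, "ia") < monthly_cost(100, 1, "standard"))  # True
```

The crossover point depends on how often the data is touched: heavily accessed files can end up costing more in IA once the access fees dominate.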

 

Different Performance Modes in EFS:

General-purpose:

  • Offers low latency.
  • Supports a maximum of 7000 IOPS.
  • Using the CloudWatch PercentIOLimit metric, you can view how close your workload is to the I/O limit and switch to Max I/O if required.

Max I/O:

  • This is recommended when EFS needs more than 7,000 IOPS.
  • Theoretically, this mode has unlimited I/O capacity.

 

Different Throughput Modes in EFS:

  • Burst Mode: Allows 100 MB/s of burst throughput per TiB of storage.
  • Provisioned Mode: Users can set the maximum throughput of the EFS but are charged more when speeds go beyond the default limit.
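The burst rule above (100 MB/s of burst throughput per TiB stored) can be sketched as a one-line scaling function; the function name is illustrative:

```python
# Sketch of the Burst Mode rule quoted above: burst throughput scales
# linearly with the amount of data stored (100 MB/s per TiB).

def burst_throughput_mbps(storage_tib: float) -> float:
    """Approximate burst throughput ceiling for a given stored size."""
    return 100.0 * storage_tib

print(burst_throughput_mbps(2))    # 200.0 MB/s for 2 TiB stored
print(burst_throughput_mbps(0.5))  # 50.0 MB/s for 512 GiB stored
```

This is why small file systems with heavy I/O often need Provisioned Mode: their stored size alone does not earn them enough burst throughput.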

 

Connecting to EFS:

  • Create an EFS from the AWS console. Choose the correct VPC and configuration that suits your use case.

  • Create one or more EC2 servers from the EC2 dashboard as needed for your use case.

  • Allow the EC2 security group to access EFS.
  • Connect to EFS from your EC2 servers. There are primarily two methods of connecting to EFS from EC2 servers:
    • Linux NFS Client: This is the traditional method of connecting to NFS file systems.
    • EFS Mount Helper: This is the AWS-recommended and simpler way to connect to EFS.

  • Once you have connected to AWS EFS from your EC2 instances, you will have a directory (say, EFS-Folder) that holds all the files in the EFS. Any file created in this directory can be seen and edited from any EC2 instance that has access to the EFS.

Features of AWS EFS:

  • Storage capacity: Theoretically EFS provides an infinite amount of storage capacity. This capacity grows and shrinks as required by the user.
  • Fully Managed: As an AWS-managed service, EFS takes on the overhead of creating, managing, and maintaining file servers and storage.
  • Multi-EC2 Connectivity: EFS can be shared between any number of EC2 instances by using mount targets.
    • Note: A mount target is an access point for AWS EFS that is attached to EC2 instances, allowing them access to the EFS.
  • Availability: AWS EFS is region-specific; however, it can be present in multiple Availability Zones within a single region.
    • EC2 instances in different Availability Zones can connect to the EFS mount target in their own zone for quicker access.
  • EFS Lifecycle Management: Lifecycle management moves files between storage classes. Users select a retention period parameter (in days), and any file in Standard storage that is not accessed within that period is moved to the Infrequent Access (IA) class for cost savings.
    • Note that the retention period of a file in Standard storage resets each time the file is accessed.
    • Files in the IA class are moved back to Standard storage once they are accessed.
    • Note that file metadata and files under 128 KB are not transferred to the IA storage class.
    • Lifecycle management can be turned on and off as the user sees fit.
  • Durability: Multi availability zone presence accounts for the high durability of the Elastic File System.
  • Transfer: Data can be transferred from on-premise storage to EFS in the cloud using the AWS DataSync service. DataSync can also be used to transfer data between EFS file systems across regions.
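The lifecycle rules described above can be sketched as a tiny state function. This is an illustrative model of the rules as stated in this article, not the actual AWS implementation:

```python
# Sketch of EFS lifecycle management as described above: files idle longer
# than the retention period move to IA, except files under 128 KB; a file
# in IA that is accessed moves back to Standard.

RETENTION_DAYS = 30          # example retention period chosen by the user
MIN_SIZE_BYTES = 128 * 1024  # files under 128 KB stay in Standard

def next_class(current_class, size_bytes, days_since_access):
    if current_class == "standard":
        if days_since_access >= RETENTION_DAYS and size_bytes >= MIN_SIZE_BYTES:
            return "ia"
        return "standard"
    # An IA file that was just accessed returns to Standard.
    return "standard" if days_since_access == 0 else "ia"

print(next_class("standard", 1_000_000, 45))  # ia
print(next_class("standard", 50_000, 45))     # standard (under 128 KB)
print(next_class("ia", 1_000_000, 0))         # standard (just accessed)
```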

(Figure: an Elastic File System shared between two instances that are each connected to their own EBS volumes.)

The following are some use cases of EFS:

  • Multiple-server architectures: EFS provides a shared file system, so applications that require multiple servers to share a single file system are a natural fit for EFS.
  • Big Data Analytics: Virtually infinite capacity and extremely high throughput makes EFS highly suitable for storing files for Big data analysis.
  • Reliable data file storage: EBS data is stored redundantly in a single Availability Zone, whereas EFS data is stored redundantly across multiple Availability Zones, making it more robust and reliable than EBS.
  • Media Processing: High capacity and high throughput make EFS highly favorable for processing big media files.

Limitations of AWS Elastic File System (EFS):

There are a few limitations to consider when using AWS Elastic File System (EFS):

  1. EFS supports only the Network File System (NFS) protocol (v4.0 and v4.1), so it can be mounted and accessed only by clients that support NFS; Windows instances are not supported.
  2. File locking in EFS is NFSv4 advisory locking with per-file lock limits, so applications that depend on mandatory locking may not behave as expected.
  3. EFS has a maximum single-file size of 47.9 TiB.
  4. Each file system has throughput and IOPS caps; these vary by performance mode (General Purpose mode has historically been capped at around 7,000 file operations per second) and have been raised over time.
  5. EFS is not available in every region, and moving data between regions requires a tool such as AWS DataSync or EFS replication.

AWS Disaster Recovery Strategies

Disaster recovery is one of the main requirements when designing cloud architectures today. A disaster may be a production bug, a fault made by the developers, or even a failure of an AWS service itself. Disaster recovery is an essential part of any application. Before diving into AWS disaster recovery strategies, let's understand some terms related to disaster recovery.

Recovery Time Objective (RTO): RTO is the maximum span of time for which a service can remain unavailable before the outage becomes damaging to the business.

Recovery Point Objective (RPO): RPO is the maximum amount of data, measured in time, that could be lost if a system goes down.

(Figure: RTO-RPO timeline.)

In the above example, the system goes down at 2 pm and is recovered to its normal state by 6 pm. This means that the Recovery Time Objective for this situation is 4 hours. Similarly, say that the system takes a backup every 2 hours and the last backup was taken at 12 pm (marked by the green arrow). Since the system went down at 2 pm, the data written between 12 pm and 2 pm is lost, and only the data and system state from 12 pm can be recovered. This means that the Recovery Point Objective for this scenario is 2 hours.

The choice of architecture and data backup solution depends entirely on how much RTO and RPO your application can tolerate without harm to the business.
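The worked example above can be computed directly from the three timestamps:

```python
# Compute RTO and RPO for the worked example above.
from datetime import datetime

failure = datetime(2023, 6, 15, 14, 0)      # system goes down at 2 pm
recovered = datetime(2023, 6, 15, 18, 0)    # back to normal at 6 pm
last_backup = datetime(2023, 6, 15, 12, 0)  # last backup taken at 12 pm

# RTO: how long the service was down; RPO: how much data (in time) was lost.
rto_hours = (recovered - failure).total_seconds() / 3600
rpo_hours = (failure - last_backup).total_seconds() / 3600

print(f"RTO: {rto_hours:.0f} hours")  # RTO: 4 hours
print(f"RPO: {rpo_hours:.0f} hours")  # RPO: 2 hours
```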

Different disaster recovery strategies

Backup and restore:

In this strategy, you take frequent snapshots of the data stored in your EBS volumes and RDS databases and store these snapshots in a reliable storage space like AWS S3. You can also regularly create AMIs of your servers to preserve their state, including the installed software, software updates, and the IAM permissions associated with each server. Backup and Restore essentially uses AWS as a virtual tape library. This strategy works not only for AWS applications but also for on-premise applications: AWS Storage Gateway allows you to take snapshots of your local volumes and store them in AWS S3. This is the slowest of the disaster recovery strategies and is best used in combination with the others. Storing backup data in Amazon S3 Glacier can further reduce the cost of this strategy.

  • RTO- High (Example: 10-24 Hrs)
  • RPO- Depends on the frequency of the backups, which can be hourly, every 3 hours, every 6 hours, or daily.

Pilot Light: 

In this strategy, a minimal version of the production environment is kept running on AWS. This does not mean the entire application scaled down (that is warm standby); only the core, most critical components of the production environment are configured and kept running. When disaster strikes, a full-scale application is rebuilt around this running core. Pilot Light is more costly than Backup and Restore because some AWS services are running at all times. This strategy also involves provisioning infrastructure with templates such as AWS CloudFormation scripts for efficient and quick restoration of the system.

  • RTO- High, but less than Backup and Restore. Example: 5-10 hours.
  • RPO- Same as for Backup and Restore, i.e. it depends on the frequency of backups. Even though a minimal core environment is running, data recovery still depends on backups.

Warm Standby:

As the name suggests, the warm standby strategy involves always running an extremely scaled-down but full-fledged, fully functional copy of your production application in the cloud. In case of failure or disaster, the warm standby application can be immediately scaled up to serve as the production application. EC2 servers can be kept at a minimal number and instance type and scaled up to a fully functional application using AWS Auto Scaling. In case of failure, all DNS records and traffic routing tables are changed to point to the standby application rather than the production application. For rapidly changing data, architects will have to replicate data back from the standby site to the primary site when the primary production environment takes over again.

  • RTO: Lower than Pilot Light. Example: < 5 hours.
  • RPO: Since the last data write replicated to the standby (e.g., a Multi-AZ database replica).

Multi-Site:

As the name suggests, the multi-site strategy involves running a fully functional copy of the production environment as a backup in the cloud. This is a one-to-one copy of your primary application, typically run in a different Availability Zone or an entirely different region for durability. This is the most expensive of all the DR options, as it doubles the running cost of a single application, but the overhead is compensated by the smallest RPO and RTO of any DR strategy. RPO timings vary from system to system according to the choice of data replication method (synchronous or asynchronous). As soon as failure strikes, the developers only have to change DNS records and routing tables to point to the secondary application.

  • RTO: Lowest of all DR strategies. Example: < 1 hour.
  • RPO: Lowest of all DR strategies; the choice of data replication method affects RPO. With synchronous replication, only the last in-flight write can be lost.
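Putting the four strategies together, a team could pick the cheapest option that still meets a target RTO. The hour figures below are the illustrative examples from the sections above, not AWS guarantees, and the helper is a sketch:

```python
# Sketch: choose the cheapest DR strategy whose worst-case example RTO
# (from the sections above) meets a target. Figures are illustrative.

# (strategy, worst-case example RTO in hours), ordered cheapest first
STRATEGIES = [
    ("backup-and-restore", 24),
    ("pilot-light", 10),
    ("warm-standby", 5),
    ("multi-site", 1),
]

def choose_strategy(target_rto_hours: float) -> str:
    for name, rto in STRATEGIES:
        if rto <= target_rto_hours:
            return name
    return "multi-site"  # tightest option available

print(choose_strategy(12))   # pilot-light
print(choose_strategy(0.5))  # multi-site
```

The same shape of decision applies to RPO: tighter recovery targets push you toward the more expensive, always-running strategies.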

Cloud computing is one of the biggest assets available to developers and investors for building highly efficient, simple applications with a cheaper cost structure. Backups done in the traditional (non-cloud) way can be more costly and inefficient, and are prone to hardware issues and manual errors. AWS offers backup strategies not only for AWS applications but also for on-premise applications, which can leverage AWS for backup. Cloud backups provide many benefits over traditional backup systems, such as:

  • Low Costs
  • Fully AWS managed.
  • Secure and reliable.
  • No hardware maintenance.
  • Off-Site backup
  • Easy to access and test using Cloud Infrastructure.

Amazon RDS – Working with Backups

This article aims to make you aware of working with backups on Amazon RDS. The prime objective of backups is to create a copy of our data that can be recovered in situations like data failure, data theft, and so on. RDS deals with a lot of crucial data, and there is always a chance of data loss. To avoid such losses, RDS has incorporated several backup strategies for clients as per their requirements. Let us discuss all of them.

Automated Backups :

As the name suggests, this is the default backup strategy in RDS: from the time you create a database instance until the time it is deleted, automated backups remain in action. This facility allows the user to restore data from any point in time within the retention period. The backup retention period is specified by the user when creating the instance and can be altered later; by default it is one day when set through the API or CLI (seven days when the instance is created through the console). Automated backups apply only to instances in the "Available" state; other states such as "Stopped" or "Storage Full" do not support automated backups. Also, when an instance that already has automated backups running is copied within the same region, automated backups are not applied to the copy, as this would only increase the bill.

You can check whether automated backup is enabled and, if so, what the retention period is: select the instance, click on it, and under "Availability & Durability" you will find the details. Here is the image attached for reference.

Now, let us look at the steps involved in “Enabling” automated backups for any desired DB instance.

After logging into your account, go to the RDS management console. From the navigation pane, select "Databases", choose the database you want to enable automated backups for, and click "Modify". Here is the image to refer to in case of any confusion.

After a while, the "Modify DB Instance" page appears; for the backup retention period, select a value other than zero (0). Choose to continue and select "Apply Immediately". The image is attached ahead for reference.
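The console rule above (a non-zero retention period enables automated backups; RDS caps retention at 35 days) can be captured in a small validation helper. The function name is hypothetical; only the 0-35-day range is the real RDS constraint:

```python
# Sketch: validate an RDS backup retention period before submitting it.
# RDS allows 0 (disabled) through 35 days; the helper name is ours.

def validate_retention_days(days: int) -> str:
    if days == 0:
        return "automated backups disabled"
    if 1 <= days <= 35:
        return f"automated backups enabled, retained {days} days"
    raise ValueError("retention period must be between 0 and 35 days")

print(validate_retention_days(7))  # automated backups enabled, retained 7 days
print(validate_retention_days(0))  # automated backups disabled
```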

Let us look at another backup strategy in Amazon RDS.

Snapshots:

Snapshots are another backup facility offered by Amazon RDS. Snapshots are non-editable backups of entire database instances, not of individual databases. They are not automatic, although RDS offers to create a final snapshot automatically when an instance is deleted. A snapshot does not come with a retention period; snapshots never expire. Snapshots are an efficient method for storing backups within the same region or in a different region, and a snapshot's data can be exported to Amazon S3 for storage. Snapshots come with multiple sub-operations such as creating, deleting, and exporting.

For creating a DB Snapshot follow this process. 

After logging into your account, go to the RDS management console. From the navigation pane, select "Databases" and then choose the database you want to take a snapshot of. Click on "Actions" and, from the listed options, choose "Take Snapshot". Please refer to the image attached ahead.

In a while, you will see the "Take DB Snapshot" window. Fill in the name you wish to give the snapshot and then click "Take Snapshot". The image is attached ahead for better understanding.

In this way, we can easily take a snapshot of any DB instance in RDS. The two backup strategies are distinct from each other in their architecture. If you are using a Free Tier account, make sure you delete all the services and instances before logging out of your AWS account.