Sunday, 20 March 2022

AWS DataSync vs Storage Gateway

 


Description
  • DataSync: AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the Internet or over AWS Direct Connect.
  • Storage Gateway: AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage by linking it to Amazon S3. Storage Gateway provides three types of storage interfaces for your on-premises applications: file, volume, and tape.

How It Works
  • DataSync: Uses an agent, a virtual machine (VM) that you own, to read data from or write data to your storage systems. You activate the agent from the AWS Management Console. The agent then reads from a source location and syncs your data to Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server.
  • Storage Gateway: Uses a Storage Gateway appliance, a VM provided by Amazon that you install and host in your data center. After setup, you use the AWS console to provision your storage options (File Gateway, Cached Volumes, or Stored Volumes), and your data is saved to Amazon S3. Instead of installing the VM, you can also purchase the hardware appliance.

Protocols
  • DataSync: Connects to existing storage systems and data sources using standard storage protocols (NFS, SMB) or the Amazon S3 API.
  • Storage Gateway: Provides a standard set of storage protocols such as iSCSI, SMB, and NFS.

Storage
  • DataSync: Copies data between Network File System (NFS) shares, SMB file servers, and self-managed object storage. It can also move data between your on-premises storage and AWS Snowcone, Amazon S3, Amazon EFS, or Amazon FSx.
  • Storage Gateway (File Gateway): Lets you store and retrieve objects in Amazon S3 using file protocols such as NFS and SMB.
  • Storage Gateway (Volume Gateway): Stores your data locally in the gateway and syncs it to Amazon S3. It also lets you take point-in-time copies of your volumes as EBS snapshots, which you can restore and mount to your appliance as iSCSI devices.
  • Storage Gateway (Tape Gateway): Stores data immediately in Amazon S3, from where it can be archived to Amazon S3 Glacier or Amazon S3 Glacier Deep Archive.

Pricing
  • DataSync: You are charged standard request, storage, and data transfer rates to read from and write to AWS services such as Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, and AWS KMS.
  • Storage Gateway: You are charged based on the type and amount of storage you use, the requests you make, and the amount of data transferred out of AWS.

Combination

You can combine DataSync and File Gateway to minimize your on-premises operational costs while seamlessly connecting on-premises applications to your cloud storage. AWS DataSync automates and accelerates online data transfers to AWS storage services, and File Gateway then gives your on-premises applications low-latency access to the migrated data.
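To make the DataSync side of this setup concrete, here is a minimal sketch that assembles the arguments for a DataSync transfer task. It assumes the agent is already activated and that the source (on-premises share) and destination (for example, S3) locations already exist; the parameter names follow the boto3 DataSync `create_task` request, while the helper name, ARNs, exclude patterns, and schedule are hypothetical placeholders.

```python
from typing import Any, Dict, List, Optional


def build_create_task_args(
    source_location_arn: str,
    destination_location_arn: str,
    name: str,
    exclude_patterns: Optional[List[str]] = None,
    schedule_expression: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a DataSync CreateTask request (hypothetical helper)."""
    for arn in (source_location_arn, destination_location_arn):
        if not arn.startswith("arn:aws:datasync:"):
            raise ValueError(f"not a DataSync location ARN: {arn}")
    args: Dict[str, Any] = {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": name,
        "Options": {
            "TransferMode": "CHANGED",               # copy only new or changed data
            "VerifyMode": "ONLY_FILES_TRANSFERRED",  # verify the files that were copied
        },
    }
    if exclude_patterns:
        # DataSync filters take one string whose patterns are separated by "|".
        args["Excludes"] = [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(exclude_patterns)}]
    if schedule_expression:
        args["Schedule"] = {"ScheduleExpression": schedule_expression}
    return args


# Hypothetical ARNs: the account ID and location IDs are placeholders.
args = build_create_task_args(
    "arn:aws:datasync:us-east-1:123456789012:location/loc-0123456789abcdef0",
    "arn:aws:datasync:us-east-1:123456789012:location/loc-0fedcba9876543210",
    "nightly-nfs-to-s3",
    exclude_patterns=["/tmp", "/scratch"],
    schedule_expression="cron(0 2 * * ? *)",
)
# With AWS credentials configured, you could pass these on:
# boto3.client("datasync").create_task(**args)
```

Keeping the request construction separate from the AWS call makes it easy to review or test the task definition before anything is transferred.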

AWS CloudTrail vs Amazon CloudWatch

 

  • CloudWatch is a monitoring service for AWS resources and applications. CloudTrail is a web service that records API activity in your AWS account. They are both useful monitoring tools in AWS.
  • By default, CloudWatch offers free basic monitoring for your resources, such as EC2 instances, EBS volumes, and RDS DB instances. CloudTrail is also enabled by default when you create your AWS account.
  • With CloudWatch, you can collect and track metrics, collect and monitor log files, and set alarms. CloudTrail, on the other hand, logs information on who made a request, the services used, the actions performed, parameters for the actions, and the response elements returned by the AWS service. CloudTrail logs are then stored in an S3 bucket or a CloudWatch Logs log group that you specify.
  • You can enable detailed monitoring for your AWS resources to send metric data to CloudWatch more frequently, at an additional cost.
  • CloudTrail delivers one free copy of management event logs for each AWS Region. Management events include management operations performed on resources in your AWS account, such as when a user logs in to your account. Data events are logged at an additional charge; they include resource operations performed on or within the resource itself, such as S3 object-level API activity or Lambda function execution activity.
  • CloudTrail helps you meet compliance and regulatory standards by keeping a record of activity in your account.
  • CloudWatch Logs reports on application logs, while CloudTrail logs give you specific information about what occurred in your AWS account.
  • CloudWatch Events is a near-real-time stream of system events describing changes to your AWS resources. CloudTrail focuses more on AWS API calls made in your AWS account.
  • Typically, CloudTrail delivers an event within 15 minutes of the API call. CloudWatch delivers metric data in 5-minute periods for basic monitoring and 1-minute periods for detailed monitoring. The CloudWatch Logs agent sends log data every five seconds by default.
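CloudTrail writes each log file as gzip-compressed JSON with a top-level `Records` array, and each record carries fields such as `userIdentity`, `eventSource`, `eventName`, `eventTime`, and `sourceIPAddress`. The sketch below pulls out the "who, what, when, from where" described above. It works on an already-decompressed JSON string, and the sample record is trimmed and hypothetical; real records contain many more fields.

```python
import json
from typing import Dict, List


def summarize_events(log_json: str) -> List[Dict[str, str]]:
    """Extract who/what/when/where from a CloudTrail log file's Records array."""
    records = json.loads(log_json).get("Records", [])
    summary = []
    for record in records:
        identity = record.get("userIdentity", {})
        summary.append({
            # Fall back to the identity type when no ARN is present.
            "who": identity.get("arn", identity.get("type", "unknown")),
            "action": f'{record["eventSource"]}:{record["eventName"]}',
            "when": record["eventTime"],
            "source_ip": record.get("sourceIPAddress", ""),
        })
    return summary


# Trimmed, hypothetical record for illustration only.
sample = json.dumps({"Records": [{
    "eventTime": "2022-03-20T10:15:00Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "CreateBucket",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::123456789012:user/alice"},
    "requestParameters": {"bucketName": "example-bucket"},
    "responseElements": None,
}]})

for event in summarize_events(sample):
    print(event["when"], event["who"], event["action"], event["source_ip"])
```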

Application Load Balancer vs Network Load Balancer vs Gateway Load Balancer


Features common to all three load balancers:

  • Health checks for registered instances
  • Built-in CloudWatch monitoring
  • Logging features
  • Zonal failover
  • Connection draining
  • Cross-zone load balancing (evenly distributes traffic across registered instances in enabled AZs)
  • Resource-based IAM permission policies
  • Tag-based IAM permissions
  • Flow stickiness: all packets of a flow are sent to one target, and the return traffic comes from that same target.
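Flow stickiness can be pictured as deterministically hashing a flow's 5-tuple (protocol, source IP and port, destination IP and port) to one target, so every packet of that flow lands on the same target. This is an illustration only: AWS does not expose its internal flow-hash function, so SHA-256 and the modulo mapping here are arbitrary stand-ins, and the addresses are hypothetical.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def pick_target(flow: Flow, targets: Sequence[str]) -> str:
    """Map a flow to one target; the same 5-tuple always picks the same target."""
    if not targets:
        raise ValueError("no healthy targets registered")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(targets)
    return targets[index]


targets = ["10.0.1.10", "10.0.2.20", "10.0.3.30"]
flow = Flow("TCP", "203.0.113.5", 49152, "198.51.100.1", 443)
print(pick_target(flow, targets))
```

A plain modulo remaps many flows whenever the target list changes; a real load balancer has to handle that case, which this sketch deliberately ignores.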

Amazon Simple Workflow (SWF) vs AWS Step Functions vs Amazon SQS

Amazon Simple Workflow (SWF)

  • A web service that makes it easy to coordinate work across distributed application components.
  • In Amazon SWF, tasks represent invocations of logical steps in applications. Tasks are processed by workers which are programs that interact with Amazon SWF to get tasks, process them, and return their results.
  • The coordination of tasks involves managing execution dependencies, scheduling, and concurrency in accordance with the logical flow of the application.

AWS Step Functions

  • A fully managed service that makes it easy to coordinate the components of distributed applications and microservices using visual workflows.
  • You define state machines that describe your workflow as a series of steps, their relationships, and their inputs and outputs. State machines contain a number of states, each of which represents an individual step in a workflow diagram. States can perform work, make choices, pass parameters, initiate parallel execution, manage timeouts, or terminate your workflow with a success or failure.
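As an illustration of the states listed above, here is a small state machine written in Amazon States Language (the JSON format Step Functions uses) and built as a Python dict. The workflow, state names, and Lambda ARN are hypothetical; the `dangling_transitions` helper is also hypothetical and simply checks that every `StartAt`, `Next`, and `Default` target names an existing state.

```python
import json
from typing import List

# Hypothetical order workflow; the Lambda ARN is a placeholder, not a real resource.
state_machine = {
    "Comment": "Validate an order, then branch on its total",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {                 # Task state: performs work
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ValidateOrder",
            "TimeoutSeconds": 30,          # manages timeouts
            "Next": "CheckTotal",
        },
        "CheckTotal": {                    # Choice state: makes a decision
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.total", "NumericGreaterThan": 1000, "Next": "NeedsReview"}
            ],
            "Default": "Approve",
        },
        "NeedsReview": {                   # Fail state: terminates with a failure
            "Type": "Fail",
            "Error": "ManualReviewRequired",
            "Cause": "Order total exceeds the automatic approval limit",
        },
        "Approve": {                       # Pass state: passes data along
            "Type": "Pass",
            "Result": {"approved": True},
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},       # Succeed state: terminates successfully
    },
}


def dangling_transitions(machine: dict) -> List[str]:
    """Return transition targets that do not name an existing state."""
    names = set(machine["States"])
    targets = [machine["StartAt"]]
    for state in machine["States"].values():
        if "Next" in state:
            targets.append(state["Next"])
        if "Default" in state:
            targets.append(state["Default"])
        targets.extend(choice["Next"] for choice in state.get("Choices", []))
    return [t for t in targets if t not in names]


definition = json.dumps(state_machine, indent=2)
# The JSON string could then be passed as `definition` to
# boto3.client("stepfunctions").create_state_machine(name=..., definition=definition, roleArn=...).
```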

Amazon SQS

  • A message queue service that distributed applications use to exchange messages through a polling model; it can be used to decouple sending and receiving components.
  • FIFO (first-in-first-out) queues preserve the exact order in which messages are sent and received. Standard queues provide a loose-FIFO capability that attempts to preserve the order of messages.
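A minimal sketch of how the two queue types differ when sending a message: FIFO queue names end in `.fifo`, and their messages need a `MessageGroupId` (order is preserved within a group) and a deduplication ID unless content-based deduplication is enabled on the queue. The keyword names follow the boto3 SQS `send_message` request; the helper function and queue URLs are hypothetical, and hashing the body with SHA-256 mirrors what content-based deduplication does.

```python
import hashlib
from typing import Any, Dict, Optional


def build_send_message_args(queue_url: str, body: str,
                            group_id: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for an SQS SendMessage call (hypothetical helper)."""
    args: Dict[str, Any] = {"QueueUrl": queue_url, "MessageBody": body}
    if queue_url.endswith(".fifo"):
        if group_id is None:
            raise ValueError("FIFO queues require a MessageGroupId")
        args["MessageGroupId"] = group_id  # ordering is preserved per group
        # Explicit deduplication ID derived from the body (SHA-256 hex digest).
        args["MessageDeduplicationId"] = hashlib.sha256(body.encode()).hexdigest()
    return args


# Hypothetical queue URLs.
standard = build_send_message_args(
    "https://sqs.us-east-1.amazonaws.com/123456789012/orders", "order-1 created")
fifo = build_send_message_args(
    "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo", "order-1 created",
    group_id="customer-42")
# Each dict could be passed on as boto3.client("sqs").send_message(**fifo).
```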
Amazon SWF vs AWS Step Functions

  • Consider using AWS Step Functions for all your new applications, since it provides a more productive and agile approach to coordinating application components using visual workflows. If you require external signals to intervene in your processes, or you would like to launch child processes that return a result to a parent, then you should consider Amazon SWF.
  • With Step Functions, you write state machines in declarative JSON. With Amazon SWF, you write a decider program to separate activity steps from decision steps. This gives you complete control over your orchestration logic, but increases the complexity of developing applications. You can write decider programs in the programming language of your choice, or use the Flow Framework, a library for building SWF applications that provides programming constructs to structure asynchronous interactions for you.

AWS Step Functions vs Amazon SQS

  • Use Step Functions when you need to coordinate service components in the development of highly scalable and auditable applications. Use SQS when you need a reliable, highly scalable, hosted queue for sending, storing, and receiving messages between services.
  • Step Functions keeps track of all tasks and events in an application. Amazon SQS requires you to implement your own application-level tracking, especially if your application uses multiple queues.
  • The Step Functions console and visibility APIs provide an application-centric view that lets you search for executions, drill down into an execution’s details, and administer executions. Amazon SQS requires you to implement such functionality yourself.
  • Step Functions offers several features that facilitate application development, such as passing data between tasks and flexibility in distributing tasks. Amazon SQS requires you to implement some application-level functionality.
  • You can use Amazon SQS to build basic workflows to coordinate your distributed application, but Step Functions gives you this out of the box, alongside other application-level capabilities.

Amazon SQS vs Amazon SWF

  • SWF API actions are task-oriented. SQS API actions are message-oriented.
  • SWF keeps track of all tasks and events in an application. SQS requires you to implement your own application-level tracking, especially if your application uses multiple queues.
  • The SWF console and visibility APIs provide an application-centric view that lets you search for executions, drill down into an execution’s details, and administer executions. SQS requires you to implement such functionality yourself.
  • SWF offers several features that facilitate application development, such as passing data between tasks, signaling, and flexibility in distributing tasks. SQS requires you to implement some application-level functionality.
  • In addition to a core SDK that calls service APIs, SWF provides the Flow Framework, with which you can write distributed applications using programming constructs that structure asynchronous interactions.
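The SWF decider mentioned above is the program that looks at a workflow's progress and decides what runs next. A minimal sketch for a linear pipeline, assuming a simplified history (just the names of completed activities) instead of the full event history a real decider receives from `PollForDecisionTask`; the activity names and version are hypothetical. The returned dictionaries follow the decision shape that `RespondDecisionTaskCompleted` accepts (a `decisionType` plus its matching attributes).

```python
from typing import Any, Dict, List

# Hypothetical linear pipeline of activity types.
PIPELINE = ["DownloadFile", "ProcessFile", "UploadResult"]


def decide(completed: List[str]) -> List[Dict[str, Any]]:
    """Return the next decision for a linear workflow.

    `completed` is a simplified stand-in for the workflow history: the names
    of activities that have already finished.
    """
    remaining = [name for name in PIPELINE if name not in completed]
    if not remaining:
        # Every activity is done, so close the workflow execution.
        return [{"decisionType": "CompleteWorkflowExecution"}]
    next_activity = remaining[0]
    return [{
        "decisionType": "ScheduleActivityTask",
        "scheduleActivityTaskDecisionAttributes": {
            "activityType": {"name": next_activity, "version": "1.0"},
            "activityId": f"{next_activity}-{len(completed) + 1}",
        },
    }]


print(decide([]))
print(decide(PIPELINE))
```

This is the orchestration logic that Step Functions would instead express declaratively as `Next` transitions in a state machine.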

Amazon S3 vs Glacier

 

  • Amazon S3 is a durable, secure, simple, and fast storage service, while Amazon S3 Glacier is used for archiving solutions.
  • Use S3 if you need low latency or frequent access to your data. Use S3 Glacier when you want low storage cost and do not require millisecond access to your data.
  • Glacier offers three retrieval options, each with a different cost and retrieval speed. From S3, you retrieve data in milliseconds.
  • Both S3 and Glacier are designed for durability of 99.999999999% of objects across multiple Availability Zones.
  • S3 and Glacier are designed for availability of 99.99%.
  • S3 can be used to host static web content, while Glacier cannot.
  • In S3, users create buckets. In Glacier, users create archives and vaults.
  • You can store a virtually unlimited amount of data in both S3 and Glacier.
  • A single Glacier archive can contain up to 40 TB of data.
  • S3 supports Versioning.
  • You can run analytics and queries on data in S3.
  • You can configure a lifecycle policy for your S3 objects to automatically transfer them to Glacier. You can also upload objects directly to either S3 or Glacier.
  • S3 Standard-IA and One Zone-IA have a minimum billable object size of 128 KB. Glacier’s minimum is 40 KB.
  • Objects stored in S3 have a minimum storage duration of 30 days (except for S3 Standard). Objects that are archived to Glacier have a minimum of 90 days of storage. Objects that are deleted, overwritten, or transitioned to a different storage class before the minimum duration incur the normal usage charge plus a pro-rated charge for the remainder of the minimum storage duration.
  • Glacier has a per-GB retrieval fee.
  • You can transition objects between some S3 storage classes. Glacier objects can only be transitioned to the Glacier Deep Archive storage class.
  • S3 (Standard, Intelligent-Tiering, Standard-IA, and One Zone-IA) and Glacier are backed by an SLA.
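The lifecycle and minimum-duration rules above interact: a lifecycle rule that moves objects out of a class too early triggers the pro-rated charge. The sketch below builds a lifecycle configuration in the shape accepted by the boto3 S3 `put_bucket_lifecycle_configuration` call and checks each transition against the minimum durations stated in this comparison. The rule ID, `logs/` prefix, and day counts are hypothetical, and the helper functions are illustrative, not an AWS API.

```python
from typing import Dict, List

# Minimum storage durations (days) as stated in the comparison above.
MIN_STORAGE_DAYS = {"STANDARD": 0, "STANDARD_IA": 30, "ONEZONE_IA": 30, "GLACIER": 90}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days you are billed for: early removal is charged up to the minimum."""
    return max(days_stored, MIN_STORAGE_DAYS.get(storage_class, 0))


# Hypothetical rule: prefix and day counts are illustrative only.
lifecycle_configuration = {
    "Rules": [{
        "ID": "archive-logs",
        "Filter": {"Prefix": "logs/"},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 120, "StorageClass": "GLACIER"},
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }]
}


def early_transitions(transitions: List[Dict]) -> List[str]:
    """List storage classes an object would leave before their minimum duration."""
    early = []
    for current, nxt in zip(transitions, transitions[1:]):
        days_in_class = nxt["Days"] - current["Days"]
        if days_in_class < MIN_STORAGE_DAYS.get(current["StorageClass"], 0):
            early.append(current["StorageClass"])
    return early


print(early_transitions(lifecycle_configuration["Rules"][0]["Transitions"]))
# Applying it (requires credentials and a real bucket name):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="your-bucket", LifecycleConfiguration=lifecycle_configuration)
```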

Amazon S3 vs EBS vs EFS

 


Type of storage
  • S3: Object storage. You can store virtually any kind of data in any format.
  • EBS: Persistent block-level storage for EC2 instances.
  • EFS: POSIX-compliant file storage for EC2 instances.

Features
  • S3: Accessible to anyone or any service with the right permissions.
  • EBS: Delivers performance for workloads that require the lowest-latency access to data from a single EC2 instance.
  • EFS: Has a file system interface, file system access semantics (such as strong consistency and file locking), and concurrently accessible storage for multiple EC2 instances.

Max Storage Size
  • S3: Virtually unlimited.
  • EBS: 16 TiB for one volume.
  • EFS: Unlimited file system size.

Max File Size
  • S3: Individual Amazon S3 objects can be up to 5 TB in size.
  • EBS: Equivalent to the maximum size of your volumes.
  • EFS: 47.9 TiB for a single file.

Performance (Latency)
  • S3: Low for mixed request types; integrates with CloudFront.
  • EBS: Lowest and consistent. SSD-backed volumes include Provisioned IOPS SSD for the highest performance and General Purpose SSD, which balances price and performance.
  • EFS: Low and consistent. Max I/O mode scales to higher aggregate throughput at the cost of slightly higher latency.

Performance (Throughput)
  • S3: Multiple GB per second; supports multipart upload.
  • EBS: Up to 2 GB per second. HDD-backed volumes include Throughput Optimized HDD for throughput-intensive workloads and Cold HDD for less frequently accessed data.
  • EFS: 10+ GB per second. Bursting Throughput mode scales with the size of the file system. Provisioned Throughput mode offers higher dedicated throughput than Bursting Throughput mode.

Durability
  • S3: Stored redundantly across multiple AZs; 99.999999999% durability.
  • EBS: Stored redundantly in a single AZ.
  • EFS: Stored redundantly across multiple AZs.

Availability
  • S3: S3 Standard, 99.99%; S3 Standard-IA, 99.9%; S3 One Zone-IA, 99.5%; S3 Intelligent-Tiering, 99.9%.
  • EBS: 99.999% availability.
  • EFS: 99.9% SLA; runs across multiple AZs.

Scalability
  • S3: Highly scalable.
  • EBS: Manually increase your volume size (EBS volumes cannot be shrunk in place), or attach and detach additional volumes to and from your EC2 instance to scale.
  • EFS: EFS file systems are elastic and automatically grow and shrink as you add and remove files.

Data Access
  • S3: One to millions of connections over the web; S3 provides a REST web services interface.
  • EBS: A single EC2 instance in a single AZ. With Amazon EBS Multi-Attach, you can attach a single Provisioned IOPS SSD (io1 or io2) volume to up to 16 Nitro-based instances in the same Availability Zone.
  • EFS: One to thousands of EC2 instances or on-premises servers, from multiple AZs, Regions, VPCs, and accounts concurrently.

Access Control
  • S3: Bucket policies and IAM user policies. Block Public Access settings help manage public access to resources.
  • EBS: IAM policies, roles, and security groups.
  • EFS: Only resources that can reach an endpoint in your VPC, called a mount target, can access your file system; POSIX-compliant user- and group-level permissions.

Encryption Methods
  • S3: SSL endpoints over HTTPS, client-side encryption, and server-side encryption (SSE-S3, SSE-C, SSE-KMS).
  • EBS: EBS encryption, which uses AWS KMS CMKs, encrypts both data at rest and data in transit.
  • EFS: Encrypts data at rest and in transit. Encryption at rest uses AWS KMS; encryption in transit uses TLS.

Backup and Restoration
  • S3: Versioning or cross-region replication.
  • EBS: All EBS volume types offer durable snapshot capabilities.
  • EFS: EFS-to-EFS replication through third-party tools or AWS DataSync.

Pricing
  • S3: Prices depend on the location (Region) of your bucket. S3 uses tiered pricing, so the more S3 storage you use, the lower the per-GB price.
  • EBS: You pay for GB-months of provisioned storage, provisioned IOPS-months, and GB-months of snapshot data stored in S3.
  • EFS: You pay for the amount of file system storage used per month. In Provisioned Throughput mode, you also pay for the throughput you provision per month.

Use Cases
  • S3: Web serving and content management, media and entertainment, backups, big data analytics, data lakes.
  • EBS: Boot volumes, transactional and NoSQL databases, data warehousing and ETL.
  • EFS: Web serving and content management, enterprise applications, media and entertainment, home directories, database backups, developer tools, container storage, big data analytics.

Service Endpoint
  • S3: Can be accessed within and outside a VPC (via the S3 bucket URL).
  • EBS: Accessed within your VPC.
  • EFS: Accessed within your VPC.
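The pricing row hides an important difference: EBS bills for what you provision, while EFS bills for the storage you actually use (plus provisioned throughput, if you choose that mode). The sketch below makes that arithmetic explicit. The rates are placeholder numbers chosen for readability, not actual AWS prices, and the function names are hypothetical.

```python
def ebs_monthly_cost(provisioned_gb: float, gb_month_rate: float,
                     provisioned_iops: float = 0, iops_month_rate: float = 0.0,
                     snapshot_gb: float = 0, snapshot_gb_month_rate: float = 0.0) -> float:
    """EBS: billed on provisioned size and IOPS, whether or not they are used."""
    return (provisioned_gb * gb_month_rate
            + provisioned_iops * iops_month_rate
            + snapshot_gb * snapshot_gb_month_rate)


def efs_monthly_cost(used_gb: float, gb_month_rate: float,
                     provisioned_mbps: float = 0, mbps_month_rate: float = 0.0) -> float:
    """EFS: billed on storage actually used, plus provisioned throughput if enabled."""
    return used_gb * gb_month_rate + provisioned_mbps * mbps_month_rate


# Placeholder rates for illustration only; check the AWS pricing pages for real numbers.
EBS_RATE, EFS_RATE = 0.10, 0.30

# 100 GB of data: on a 500 GB EBS volume you pay for all 500 GB,
# while on EFS you pay only for the 100 GB stored.
ebs = ebs_monthly_cost(provisioned_gb=500, gb_month_rate=EBS_RATE)
efs = efs_monthly_cost(used_gb=100, gb_month_rate=EFS_RATE)
print(f"EBS: {ebs:.2f}  EFS: {efs:.2f}")
```

With a higher per-GB rate but billing only on usage, EFS can come out cheaper when a volume would otherwise be heavily over-provisioned.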
