Sunday 20 March 2022

AWS DataSync

 

  • An online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect. 
  • DataSync can copy data between:
    • Network File System (NFS) or Server Message Block (SMB) file servers, 
    • Amazon Simple Storage Service (Amazon S3) buckets, 
    • Amazon Elastic File System (Amazon EFS) file systems, 
    • Amazon FSx for Windows File Server file systems

How It Works


    1. Deploy an agent – Deploy a DataSync agent and associate it with your AWS account via the Management Console or API. The agent accesses your NFS server or SMB file share to read data from it or write data to it.
    2. Create a data transfer task – Create a task by specifying the location of your data source and destination, and any options you want to use to configure the transfer, such as the desired task schedule.
    3. Start the transfer – Start the task and monitor data movement in the console or with Amazon CloudWatch.
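
Steps 2 and 3 can be sketched in Python as follows. This is a minimal sketch, assuming `client` is a boto3 DataSync client (for example `boto3.client("datasync")`) and that the source and destination locations already exist. The helper names `build_task_request` and `create_and_start` are hypothetical, and the ARNs and schedule expression are placeholders.

```python
from typing import Any, Dict, Optional


def build_task_request(source_arn: str, destination_arn: str, name: str,
                       schedule_expression: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for the DataSync CreateTask call."""
    request: Dict[str, Any] = {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": name,
    }
    # The schedule is optional; without it the task only runs when started manually.
    if schedule_expression is not None:
        request["Schedule"] = {"ScheduleExpression": schedule_expression}
    return request


def create_and_start(client: Any, request: Dict[str, Any]) -> str:
    """Create the task, start one execution, and return the execution ARN.

    `client` is assumed to behave like a boto3 DataSync client.
    """
    task_arn = client.create_task(**request)["TaskArn"]
    execution = client.start_task_execution(TaskArn=task_arn)
    return execution["TaskExecutionArn"]
```

Keeping the request building separate from the API calls makes it possible to inspect or log the task definition before anything is created in the account.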

Concepts

    • Agent – A virtual machine used to read data from or write data to an on-premises location.
    • Location – Any source or destination location used in the data transfer.
    • Task – A task includes two locations (source and destination) and the configuration of how to transfer the data from one to the other. Configuration settings can include options such as how to handle metadata, deleted files, and permissions.
    • Task execution – An individual run of a task, with details such as start time, end time, bytes written, and status. A task execution has five transition phases (QUEUED, LAUNCHING, PREPARING, TRANSFERRING, and VERIFYING) and two terminal statuses (SUCCESS and ERROR). If the VerifyMode option is not enabled, a terminal status occurs after the TRANSFERRING phase. Otherwise, it occurs after the VERIFYING phase.
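
The task execution lifecycle can be modeled in a few lines. This is an illustrative sketch, not part of any AWS SDK: the status strings follow the DataSync API's status values as I understand them, and the function names are hypothetical.

```python
from typing import List

# Phases a task execution passes through, in order.
TRANSITION_PHASES = ["QUEUED", "LAUNCHING", "PREPARING", "TRANSFERRING", "VERIFYING"]
# Statuses after which the execution no longer changes.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def expected_phases(verify_enabled: bool) -> List[str]:
    """Return the phases an execution goes through before a terminal status."""
    if verify_enabled:
        return list(TRANSITION_PHASES)
    # Without verification, the terminal status follows TRANSFERRING directly.
    return [phase for phase in TRANSITION_PHASES if phase != "VERIFYING"]


def is_terminal(status: str) -> bool:
    """Return True when polling the execution can stop."""
    return status in TERMINAL_STATUSES
```

A polling loop would typically read the execution status at intervals and stop once `is_terminal` returns True.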

[Diagram: task execution phases and terminal statuses]

Features

    • The service employs an AWS-designed transfer protocol—decoupled from storage protocol—to speed data movement. The protocol performs optimizations on how, when, and what data is sent over the network. 
    • A single DataSync agent is capable of saturating a 10 Gbps network link.
    • DataSync auto-scales cloud resources to support higher-volume transfers, and makes it easy to add agents on-premises.
    • All of your data is encrypted in transit with TLS. DataSync supports using default encryption for S3 buckets using Amazon S3-Managed Encryption Keys (SSE-S3), and Amazon EFS file system encryption of data at rest.
    • DataSync supports storing data directly into S3 Standard, S3 Intelligent-Tiering, S3 Standard-Infrequent Access (S3 Standard-IA), S3 One Zone-Infrequent Access (S3 One Zone-IA), Amazon S3 Glacier, and S3 Glacier Deep Archive.
    • You can use AWS DataSync to copy files into EFS and configure EFS Lifecycle Management to migrate files that have not been accessed for a set period of time to the Infrequent Access (IA) storage class.
    • DataSync ensures that your data arrives intact by performing integrity checks both in transit and at rest. 
    • You can specify an exclude filter, an include filter, or both, to limit which files, folders, or objects get transferred each time a task runs.
    • Task scheduling lets you run a task periodically to detect and copy changes from your source storage system to the destination.
    • DataSync supports VPC endpoints (powered by AWS PrivateLink) in order to move files directly into your Amazon VPC.
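
As an example of the include and exclude filters mentioned above, the sketch below builds filter rules in the shape the DataSync API expects, to my understanding: a single rule of type SIMPLE_PATTERN whose value joins the patterns with a pipe character. The helper names and the sample patterns are hypothetical.

```python
from typing import Dict, Iterable, List


def filter_rule(patterns: Iterable[str]) -> List[Dict[str, str]]:
    """Turn a list of patterns into a DataSync-style filter rule list."""
    joined = "|".join(patterns)
    if not joined:
        return []  # no patterns means no filter of this kind
    return [{"FilterType": "SIMPLE_PATTERN", "Value": joined}]


def task_filters(include: Iterable[str] = (),
                 exclude: Iterable[str] = ()) -> Dict[str, List[Dict[str, str]]]:
    """Build the Includes/Excludes arguments, omitting whichever is empty."""
    filters: Dict[str, List[Dict[str, str]]] = {}
    includes = filter_rule(include)
    excludes = filter_rule(exclude)
    if includes:
        filters["Includes"] = includes
    if excludes:
        filters["Excludes"] = excludes
    return filters
```

The resulting dictionary can be merged into the keyword arguments used when creating a task or starting a task execution.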

Use Cases

    • Data migration to Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server.
    • Data processing for hybrid workloads. If you have on-premises systems generating or using data that needs to move into or out of AWS for processing, you can use DataSync to accelerate and schedule the transfers.
    • If you have large amounts of cold data stored in expensive on-premises storage systems, you can move this data directly to durable and secure long-term storage such as Amazon S3 Glacier or Amazon S3 Glacier Deep Archive.
    • If you have large Network Attached Storage (NAS) systems with important files that need to be protected, you can replicate them into S3 using DataSync.
  • DataSync Agent
    • Before you can use an agent, you must activate it with an activation key entered in the AWS console. Activate the agent in the same Region where your S3 or EFS source or destination resides.
    • You run DataSync on-premises as a virtual machine (VM).
    • To run the agent on an EC2 instance instead, DataSync provides an Amazon Machine Image (AMI) that contains the DataSync VM image.
    • The agent VM requires access to some endpoints to communicate with AWS. You must configure your firewall settings to allow these connections.
    • You can have more than one DataSync Agent running.
  • AWS DataSync vs AWS CLI tools
    • AWS DataSync fully automates and accelerates moving large active datasets to AWS, up to 10 times faster than command line tools.
    • DataSync uses a purpose-built network protocol and scale-out architecture to transfer data.
    • DataSync fully automates the data transfer. It comes with retry and network resiliency mechanisms, network optimizations, built-in task scheduling, and CloudWatch monitoring that provides granular visibility into the transfer process. 
    • DataSync performs data integrity verification both during the transfer and at the end of the transfer.
    • DataSync provides end to end security, and integrates directly with AWS storage services.
  • AWS DataSync vs Snowball/Snowball Edge
    • AWS DataSync is ideal for online data transfers. AWS Snowball/Snowball Edge is suitable for offline data transfers, for customers who are bandwidth constrained or who are transferring data from remote, disconnected, or austere environments.
  • AWS DataSync vs AWS Storage Gateway File Gateway
    • Use AWS DataSync to migrate existing data to Amazon S3, and then use the File Gateway to retain access to the migrated data and for ongoing updates from your on-premises file-based applications.
  • AWS DataSync vs Amazon S3 Transfer Acceleration
    • If your applications are already integrated with the Amazon S3 API and you want higher throughput for transferring large files to S3, use S3 Transfer Acceleration. Otherwise, use AWS DataSync.
  • AWS DataSync vs AWS Transfer for SFTP
    • If you currently use SFTP to exchange data with third parties, you can use AWS Transfer for SFTP to transfer this data directly.
    • If you want an accelerated and automated data transfer between NFS servers, SMB file shares, Amazon S3, Amazon EFS, and Amazon FSx for Windows File Server, you can use AWS DataSync.

Pricing

    • You pay for the amount of data that you copy. Your costs are based on a flat per-gigabyte fee for the use of network acceleration technology, managed cloud infrastructure, data validation, and automation capabilities in DataSync. 
    • You are charged standard request, storage, and data transfer rates to read from and write to AWS services, such as Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, and AWS Key Management Service (KMS).
    • When copying data from AWS to an on-premises storage system, you pay for AWS Data Transfer at your standard rate. You are also charged standard rates for Amazon CloudWatch Logs, Amazon CloudWatch Events, and Amazon CloudWatch Metrics.
    • You will be billed by AWS PrivateLink for interface VPC endpoints that you create to manage and control the traffic between your agent(s) and the DataSync service over AWS PrivateLink.

Limits

Resource                                                          Quota
Maximum number of tasks you can create in account per AWS Region  100
Maximum number of files per task                                  50 million
Maximum throughput per task                                       10 Gbps

For tasks that transfer more than 20 million files, make sure that you allocate a minimum of 64 GB of RAM to the VM.
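
A planned transfer can be checked against the file-count quota and the RAM guidance before a task is created. The sketch below hard-codes the figures listed above. The function name is hypothetical, and quotas can change, so treat the constants as a snapshot of this post.

```python
from typing import List

MAX_FILES_PER_TASK = 50_000_000         # quota listed above
LARGE_TASK_FILE_THRESHOLD = 20_000_000  # above this, more agent RAM is needed
MIN_RAM_GB_FOR_LARGE_TASK = 64


def check_task_plan(file_count: int, agent_ram_gb: int) -> List[str]:
    """Return a list of problems with the plan; an empty list means it fits."""
    problems: List[str] = []
    if file_count > MAX_FILES_PER_TASK:
        problems.append("file count exceeds the per-task maximum; split the task")
    if file_count > LARGE_TASK_FILE_THRESHOLD and agent_ram_gb < MIN_RAM_GB_FOR_LARGE_TASK:
        problems.append("allocate at least 64 GB of RAM to the agent VM")
    return problems
```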
