AWS DataSync is an online data transfer service that simplifies, automates, and accelerates the process of copying large amounts of data to and from AWS storage services over the Internet or AWS Direct Connect. DataSync can be particularly useful when migrating large amounts of data between clouds or from on-premises systems to the cloud.
With DataSync, you can copy data between Network File System (NFS) or Server Message Block (SMB) file servers, Amazon Simple Storage Service (S3), Amazon Elastic File System (EFS), and Amazon FSx for Windows File Server.
Below, I’ll describe the steps for two scenarios: the first covers transferring data between AWS accounts, and the second covers transferring data from on-premises systems to Amazon S3. For an in-depth look at using AWS DataSync, check out this user guide.
No matter which scenario applies to you, start with these initial steps:
The following diagram provides a high-level view of the DataSync architecture for transferring data between AWS services within the same account. This architecture applies to both in-region and cross-region transfers.
A DataSync agent is not required when executing S3-to-S3 transfers across accounts and regions; the DataSync service handles this internally. For other storage resources, such as EFS and FSx, you can run DataSync without an agent when transferring within the same account, but you need an agent when transferring data across accounts.
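The agent rule above can be written down as a small lookup. This is only a sketch of the rule as stated in this article; the function name `needs_agent` and its `yes`/`no` arguments are made up for illustration and are not part of any AWS tooling.

```shell
#!/bin/sh
# needs_agent STORAGE SAME_ACCOUNT
#   STORAGE:      s3, efs, or fsx
#   SAME_ACCOUNT: yes or no
# Prints "yes" if an agent is needed, "no" otherwise (hypothetical helper).
needs_agent() {
    storage=$1
    same_account=$2
    case "$storage" in
        s3)
            # S3-to-S3 transfers need no agent, even across accounts.
            echo no ;;
        efs|fsx)
            # No agent within one account; an agent is needed across accounts.
            if [ "$same_account" = yes ]; then echo no; else echo yes; fi ;;
        *)
            echo "unknown storage type: $storage" >&2
            return 1 ;;
    esac
}

needs_agent s3 no
needs_agent efs yes
needs_agent fsx no
```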
When transferring data between accounts, make sure that your destination account has IAM credentials for accessing your source S3 location data. It is recommended that you run the DataSync service in the destination AWS account to avoid any issues.
Your source account will contain the source S3 bucket and your destination account will contain the destination S3 bucket and run the DataSync service.
{
  "Version": "2012-10-17",
  "Id": "Policy1616194240988",
  "Statement": [
    {
      "Sid": "Stmt1616194236908",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::111111111111:user/username",
          "arn:aws:iam::111111111111:role/CrossAccountAccess"
        ]
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::trellis-datasync-src1",
        "arn:aws:s3:::trellis-datasync-src1/*"
      ]
    }
  ]
}
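If you would rather generate this bucket policy than edit it by hand, a sketch along the following lines may help. The helper name `write_bucket_policy` is hypothetical, the account ID is a placeholder, and this version grants access to the role only, not to the user shown in the policy above. The `aws s3api put-bucket-policy` call is left as a comment because it needs credentials for the source account.

```shell
#!/bin/sh
# write_bucket_policy ACCOUNT_ID ROLE_NAME BUCKET
# Prints a bucket policy that lets the given role access BUCKET (hypothetical helper).
write_bucket_policy() {
    account_id=$1
    role_name=$2
    bucket=$3
    cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${account_id}:role/${role_name}" },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::${bucket}",
        "arn:aws:s3:::${bucket}/*"
      ]
    }
  ]
}
EOF
}

# Placeholder values; replace them with your own account ID, role, and bucket.
policy_file=${TMPDIR:-/tmp}/bucket-policy.json
write_bucket_policy 555555555555 CrossAccountAccess trellis-datasync-src1 > "$policy_file"
echo "wrote $policy_file"

# To attach the policy, run this with source-account credentials:
# aws s3api put-bucket-policy --bucket trellis-datasync-src1 --policy "file://$policy_file"
```

Note that `s3:*` mirrors the policy above; you may want to narrow it to the actions your transfer actually needs.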
❯ aws sts get-caller-identity
{
  "UserId": "ABCDEFGHIJKLMNOP5555",
  "Account": "555555555555",
  "Arn": "arn:aws:iam::555555555555:user/username"
}
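AWS account IDs are 12 digits long, and a mistyped ID in an ARN is easy to miss. The following sketch checks an ID before building a role ARN from it. The helper name `is_account_id` is made up for illustration, and the ID is the placeholder from the output above.

```shell
#!/bin/sh
# is_account_id VALUE: succeeds only if VALUE is exactly 12 digits (hypothetical helper).
is_account_id() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{12}$'
}

# With credentials configured, you can fetch just the account ID like this:
#   account_id=$(aws sts get-caller-identity --query Account --output text)
account_id=555555555555   # placeholder value

if is_account_id "$account_id"; then
    echo "arn:aws:iam::${account_id}:role/CrossAccountAccess"
else
    echo "not a 12-digit account ID: $account_id" >&2
fi
```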
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "datasync.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
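One way to create a role with this trust policy from the command line is sketched below. The `run` wrapper and the `create_datasync_role` function are hypothetical helpers: by default they only print the command, so nothing changes in your account until you set `DRY_RUN=0`. The role name is taken from the bucket policy above.

```shell
#!/bin/sh
# run CMD...: prints the command when DRY_RUN=1 (the default here), otherwise executes it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Save the trust policy shown above to a file.
trust_file=${TMPDIR:-/tmp}/datasync-trust.json
cat > "$trust_file" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "datasync.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# create_datasync_role ROLE_NAME: creates a role that DataSync can assume.
create_datasync_role() {
    run aws iam create-role \
        --role-name "$1" \
        --assume-role-policy-document "file://$trust_file"
}

create_datasync_role CrossAccountAccess
```

The trust policy only lets DataSync assume the role. The role still needs a permissions policy that allows access to the source bucket, which this sketch does not cover.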
❯ aws datasync create-location-s3 --s3-bucket-arn arn:aws:s3:::trellis-datasync-src1 --s3-config '{"BucketAccessRoleArn":"arn:aws:iam::555555555555:role/CrossAccountAccess"}' --region eu-west-1
If the command succeeds, it returns output like the following:
{
"LocationArn": "arn:aws:datasync:eu-west-1:079349112641:location/loc-054160bbd934e32c9"
}
Now, the source S3 location should show up in the destination view, and we can set up both source and destination locations for the DataSync service.
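With both locations in place, the task can also be created and started from the CLI. The following is a rough sketch, not the only way to do it. The `run` wrapper, the function names, the task name, and the destination location ARN are hypothetical, and commands are only printed unless you set `DRY_RUN=0`.

```shell
#!/bin/sh
# run CMD...: prints the command when DRY_RUN=1 (the default here), otherwise executes it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

region=eu-west-1
# Source location ARN as returned by create-location-s3 above.
src_arn=arn:aws:datasync:eu-west-1:079349112641:location/loc-054160bbd934e32c9
# Hypothetical destination ARN: create the destination location with
# create-location-s3 against your destination bucket and paste its ARN here.
dst_arn=arn:aws:datasync:eu-west-1:079349112641:location/loc-EXAMPLE

# create_task SOURCE_ARN DESTINATION_ARN NAME
create_task() {
    run aws datasync create-task \
        --source-location-arn "$1" \
        --destination-location-arn "$2" \
        --name "$3" \
        --region "$region"
}

# start_task TASK_ARN: pass the TaskArn that create-task returns.
start_task() {
    run aws datasync start-task-execution --task-arn "$1" --region "$region"
}

create_task "$src_arn" "$dst_arn" s3-cross-account-copy
```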
When you’ve completed configuration, you’ll see details about your task in the following dashboard:
The diagram below depicts the process of transferring data from on-premises servers to Amazon S3.
The same dynamics apply when the source (NFS, SMB) is deployed in AWS, as the diagram below shows.
To execute this transfer, follow these steps:
Use the following commands to enable the NFS service and export the NFS path:
sudo apt-get update && sudo apt-get install nfs-kernel-server
sudo vi /etc/exports
Add the following line to /etc/exports:
/home/ubuntu/data *(rw,fsid=1,no_subtree_check,sync,insecure)
sudo service nfs-kernel-server reload
systemctl status nfs-server.service
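If you script this setup, it helps to add the export line only once. The following sketch does that against a scratch file, so it is safe to try. The helper name `add_export` is made up for illustration; to change the real /etc/exports you would run it as root and then reload the NFS service as shown above.

```shell
#!/bin/sh
# add_export FILE DIR OPTIONS
# Appends "DIR *(OPTIONS)" to FILE unless DIR is already exported there (hypothetical helper).
add_export() {
    exports_file=$1
    dir=$2
    options=$3
    # Look for a line whose first field is exactly DIR.
    if awk -v d="$dir" '$1 == d { found = 1 } END { exit !found }' "$exports_file" 2>/dev/null; then
        echo "already exported: $dir"
    else
        printf '%s *(%s)\n' "$dir" "$options" >> "$exports_file"
        echo "added export: $dir"
    fi
}

# Demonstration on a scratch file rather than the real /etc/exports.
scratch=${TMPDIR:-/tmp}/exports.example
: > "$scratch"
add_export "$scratch" /home/ubuntu/data rw,fsid=1,no_subtree_check,sync,insecure
add_export "$scratch" /home/ubuntu/data rw,fsid=1,no_subtree_check,sync,insecure
cat "$scratch"
```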
Whether you’re in scenario 1 or 2, take the steps below if you’re deploying the EC2 instance as a DataSync agent in an AWS region close to your source location. For the steps you need to take if you’re pursuing other deployment options, see this document.
First, get the AMI ID of the DataSync agent for your region using the following command:
❯ aws ssm get-parameter --name /aws/service/datasync/ami --region eu-west-1
{
  "Parameter": {
    "Name": "/aws/service/datasync/ami",
    "Type": "String",
    "Value": "ami-024d0f6d9d751d0a1",
    "Version": 20,
    "LastModifiedDate": "2021-03-02T07:50:22.419000-08:00",
    "ARN": "arn:aws:ssm:eu-west-1::parameter/aws/service/datasync/ami",
    "DataType": "text"
  }
}
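When scripting, you usually want only the AMI ID rather than the whole JSON document. The following sketch pulls it out of saved output with `sed`. The helper name `ami_from_json` is hypothetical, and the sample is a shortened copy of the output above.

```shell
#!/bin/sh
# ami_from_json: reads `aws ssm get-parameter` JSON on stdin and prints the AMI ID
# (hypothetical helper; it relies on the "Value": "ami-..." layout shown above).
ami_from_json() {
    sed -n 's/.*"Value": *"\(ami-[0-9a-f]*\)".*/\1/p'
}

# Shortened copy of the output above, used so the example runs offline.
sample='{
  "Parameter": {
    "Name": "/aws/service/datasync/ami",
    "Value": "ami-024d0f6d9d751d0a1"
  }
}'

ami_id=$(printf '%s\n' "$sample" | ami_from_json)
echo "$ami_id"   # prints: ami-024d0f6d9d751d0a1

# With credentials configured, the CLI can return the value directly:
# aws ssm get-parameter --name /aws/service/datasync/ami --region eu-west-1 \
#     --query Parameter.Value --output text
```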
Using the AMI value from the output above, launch the EC2 instance (m5.2xlarge or m5.4xlarge) using the following URL, substituting region and ami-id:
If you are using a security group, assign the security group above to this instance, or make sure the inbound and outbound rules you need apply to it.
For more on executing this process, consult the AWS user guide.
For more on how to transfer file data across AWS regions and accounts using VPC peering, see AWS’ documentation titled Creating and accepting a VPC peering connection and Transferring file data across AWS Regions and accounts using AWS DataSync.