In a data driven world, more and more companies collect data on various aspects of their businesses. Everything from employee data to inventory data becomes crucial business intelligence that drives decision making. As a result, databases create a gravitational pull of dependencies from upstream and downstream sources. Migrating such a gravitational resource to AWS Relational Database Service (RDS) can be a daunting task that requires some careful planning and strategy.
While working with our clients, many want to move their MySQL or MariaDB database to AWS RDS because of various benefits this AWS service offers (e.g., scalability and high availability). However, there are several factors that need to be considered when developing a database migration strategy.
Things to consider
Current Database Usage
Before attempting to migrate MySQL to RDS, it is important to understand the context of how the database is utilized and what other resources depend on this database. One particular metric that is important to monitor is the amount of data transferred over the network daily. For example, if a database that needs to be migrated to RDS sends over 1TB of data on a daily basis to an application hosted outside of AWS, you may incur significant AWS data egress charges if the application is not migrated into AWS as well.
Along with that, understanding the current database workload requirements (such as CPU and memory) will be helpful in provisioning appropriate RDS instance type.
Required Database Availability
In addition to performance metrics, it is important to understand the impact that database availability has on your business. Sometimes taking the database offline can speed up the database migration process, but this is not a feasible solution in most business cases. When working with mission-critical databases, one strategy is to create a read replica in RDS, update the app to reference to the new RDS replica, then eventually fade out the old database and make the RDS the primary database. This process takes longer but reduces the amount of downtime that end users experience.
How sensitive is the data? Taking the time to ensure that SSL is enabled on your RDS instance and also on source database will reduce the possibility of a data compromise during transfer. RDS does support SSL connections to its databases, but it does not currently have SSL support for external replication. Therefore, to securely migrate data over to an RDS database, I recommend AWS Database Migration Service (DMS) as it provides the capability to use SSL when connecting to the source and target databases. If replication is needed, DMS also provides functionality to do continuous replication of the data after the initial migration. Another alternative is to create replication to an EC2 instance and then have RDS replicate from that instance. Essentially this alternative provides the same functionality as DMS, but with more user control because the user has access to the OS.
Method 1: DMS
DMS is a database migration service that AWS provides. This allows for a cost-effective way to migrate data from an on-premises database to RDS. The ability for DMS to upload data in parallel makes it a great solution for moving terabyte-sized databases. More information about DMS can be found here.
However, one thing to know about DMS is that it doesn’t copy objects that are not necessary to make the migration successful. This means that stored procedures, triggers, and functions will not be migrated. To overcome this, there are two possible solutions. First is to manually import the entire schema before transferring data. Second, is to manually add the missing database objects after the data migration. It is also a great idea to disable foreign key checks so the target database doesn’t check if the foreign keys exist for every data record imported resulting in faster importing performance. Along with that, modifying innodb_flush_log_at_trx_commit variable in the RDS parameter group to have a value of 2 can also improve import performance since the log buffer is written out to the file at each commit.
To show you DMS capabilities, I downloaded the “world” MySQL database from here and placed it in an EC2 instance to represent a non-RDS MySQL instance. After setting up the database on the EC2 instance the following steps were taken to migrate the data into RDS instance.
Step 1: Create the RDS instance either through the AWS console or using CloudFormation.
Step 2: Create the DMS replication instance and provision it in a subnet that can communicate with your non-RDS instance (source) and the RDS instance (target).
Step 3: Create a Source Endpoint and a Target Endpoint. The DMS instance will use this connection information to connect to the databases.
Step 4: Create the replication task.
Step 5: The migration process will begin. Because the database is small, it took less than 15 seconds to complete.
Step 6: Verify data in the RDS instance.
Method 2: mysqldump file
Another migration strategy is to use the mysqldump utility. The benefit of using mysqldump is that it can retain the schema structure with the data. For databases that will be setup as read replicas, mysqldump stores the GTID in the file as well to allow for easy replication setup. More information about the mysqldump utility can be found here.
Although mysqldump may seem like the perfect utility, it’s not effective for large databases. Importing the data from a large, single file can be time consuming because the import process is single threaded. You can be creative and split a larger file into smaller files, but such work is tedious, delicate, and risky.
To demonstrate mysqldump file migration process, I downloaded the “world” MySQL database from here and placed it in an EC2 instance to represent a non-RDS MySQL instance. After setting up the database on the EC2 instance the following steps were taken to migrate the data into RDS instance.
Step 1: SSH into the source database instance and run the mysqldump utility to save the database to a file.
Step 2: Create the RDS instance (through AWS console or using CloudFormation).
Step 3: Log into the RDS MySQL instance and import the database by referencing the file saved from mysqldump.
Step 4: The imported data will now be in the RDS instance.
There are many ways to import databases into RDS, each has pros and cons. It’s up to you and your organization to decide what would work best for your use case. DMS is great at importing data quickly, but lacks in importing schema objects that are not directly part of the data such as triggers, functions, and stored procedures. Mysqldump files are able to capture data and all schema objects as a whole, but doesn’t work effectively with large data sets.
Please reach out to me at firstname.lastname@example.org if your organization could use some help in database migration.