You can never be too careful in protecting your database. This article presents a solution for your Oracle instances on Amazon RDS that automatically backs up snapshots to another account and another region, so your database can survive an AWS Region failure or a compromise of your account's security.
Solution Overview
Amazon Relational Database Service (RDS) allows you to share manual DB snapshots with another AWS account, such as a dedicated Disaster Recovery (DR) account. The DR account can then restore directly from the shared snapshot, or copy it to the same or a different region for further backup. This is a really cool feature that makes cross-account backups much easier to implement. However, every feature comes with limitations, and this one is no exception.
The Limitation
“When sharing manual snapshots with other AWS accounts, you cannot share a DB snapshot that uses an option group with permanent or persistent options.”
This means that if you set up options such as Transparent Data Encryption (TDE) or Timezone in the option group of an Oracle instance, you cannot share that instance's snapshots with other accounts.
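To check ahead of time whether an instance is affected, you can inspect its option group. The sketch below assumes you already fetched the group (named in the snapshot's `OptionGroupName` field) with boto3's `rds.describe_option_groups()`, where each option carries `Permanent` and `Persistent` flags. The helper names and the sample option group are hypothetical.

```python
def blocking_options(option_group):
    """Return names of options that are permanent or persistent.

    `option_group` is one entry of OptionGroupsList, as returned by the
    RDS DescribeOptionGroups API.
    """
    return [
        opt["OptionName"]
        for opt in option_group.get("Options", [])
        if opt.get("Permanent") or opt.get("Persistent")
    ]


def snapshot_is_shareable(option_group):
    # Any permanent or persistent option blocks cross-account sharing.
    return not blocking_options(option_group)


if __name__ == "__main__":
    # Hypothetical option group, for illustration only.
    group = {
        "OptionGroupName": "oracle-with-tde",
        "Options": [
            {"OptionName": "TDE", "Permanent": True, "Persistent": True},
            {"OptionName": "Timezone", "Permanent": True, "Persistent": True},
            {"OptionName": "S3_INTEGRATION", "Permanent": False, "Persistent": False},
        ],
    }
    print(blocking_options(group))       # ['TDE', 'Timezone']
    print(snapshot_is_shareable(group))  # False
```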
This DR solution takes this limitation into account and provides a workaround for it. Here is what the solution automates for you:
- In the source account:
  - Take snapshots automatically on a user-defined schedule
  - Copy the snapshots to a different region (the DR Region)
  - Share the snapshots with another account (the DR Account) that has stricter access control
- Clean up old snapshots in both accounts based on the user-defined snapshot retention numbers
- In the source account, source region, we use a scheduled CloudWatch Events rule to trigger a Lambda function that creates a manual snapshot. An RDS event then signals that the new snapshot is available, and we copy it to the DR region (still within the source account).
- In the source account, target region, we use an RDS event to detect each newly copied manual snapshot that carries the “dr=true” tag, and a Lambda function shares it with the DR account. Once the snapshot is shared, the function publishes an SNS notification that invokes a Lambda function in the DR account (DR region), which copies the shared snapshot from the source account into the DR account.
- Snapshots in all regions are cleaned up automatically based on the retention numbers you set.
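The source-account steps map onto a handful of RDS API calls. The sketch below is written against boto3 RDS clients passed in by the caller (for example `boto3.client("rds", region_name=...)`). The snapshot naming scheme and helper names are hypothetical, and so is the assumption that the Lambda receives an EventBridge (CloudWatch Events) RDS event with a `detail.SourceArn` field. It also does not reproduce how the published solution works around the option group limitation, so treat it as an outline of the calls rather than the solution's code.

```python
import datetime

DR_TAG = {"Key": "dr", "Value": "true"}


def snapshot_arn_from_event(event):
    # Assumption: an EventBridge/CloudWatch Events RDS snapshot event whose
    # "detail" section carries the ARN of the snapshot that triggered it.
    return event["detail"]["SourceArn"]


def snapshot_id(instance_id, now):
    # Hypothetical naming scheme: <instance>-dr-YYYY-MM-DD-HH-MM
    return f"{instance_id}-dr-{now:%Y-%m-%d-%H-%M}"


def create_dr_snapshot(rds, instance_id, now=None):
    """Source region: take a manual snapshot tagged dr=true."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    resp = rds.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id(instance_id, now),
        DBInstanceIdentifier=instance_id,
        Tags=[DR_TAG],
    )
    return resp["DBSnapshot"]["DBSnapshotArn"]


def copy_to_dr_region(rds_dr_region, snapshot_arn, source_region, kms_key_id=None):
    """DR region, same account: copy the snapshot across regions.

    rds_dr_region must be a client for the DR region; a cross-region copy
    references the source snapshot by its ARN.
    """
    kwargs = {
        "SourceDBSnapshotIdentifier": snapshot_arn,
        "TargetDBSnapshotIdentifier": snapshot_arn.split(":")[-1],
        "CopyTags": True,  # keep dr=true on the copy
        "SourceRegion": source_region,  # lets boto3 presign the cross-region request
    }
    if kms_key_id:  # required when copying an encrypted snapshot to another region
        kwargs["KmsKeyId"] = kms_key_id
    resp = rds_dr_region.copy_db_snapshot(**kwargs)
    return resp["DBSnapshot"]["DBSnapshotIdentifier"]


def share_with_dr_account(rds_dr_region, snapshot_identifier, dr_account_id):
    """DR region: allow the DR account to copy or restore the snapshot."""
    rds_dr_region.modify_db_snapshot_attribute(
        DBSnapshotIdentifier=snapshot_identifier,
        AttributeName="restore",
        ValuesToAdd=[dr_account_id],
    )
```

Sharing an encrypted snapshot additionally requires a customer managed KMS key that the DR account is allowed to use; the default `aws/rds` key cannot be shared.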
Deploy the Solution
Parameter | Details | Default Value |
---|---|---|
source_region | Source region of the primary RDS instance | us-west-2 |
target_region | DR region for the snapshot backups | us-east-2 |
target_account_id | ID of the DR account that receives the snapshot backups | N/A |
source_profile | Name of the AWS profile for the source account | N/A |
target_profile | Name of the AWS profile for the DR account | N/A |
source_account_source_region_stack_name | CloudFormation stack name for the source account, source region | N/A |
source_account_source_region_snapshot_retention_number | Number of snapshots to keep in the source account, source region | N/A |
source_account_target_region_stack_name | CloudFormation stack name for the source account, DR region | N/A |
source_account_target_region_snapshot_retention_number | Number of snapshots to keep in the source account, DR region | N/A |
dr_account_target_region_stack_name | CloudFormation stack name for the DR account, DR region | N/A |
dr_account_target_region_snapshot_retention_number | Number of snapshots to keep in the DR account, DR region | N/A |
snapshot_frequency | How often snapshots are taken (schedule expression) | rate(1 hour) |
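The three `*_snapshot_retention_number` parameters drive the cleanup step. One way to implement it, sketched here with hypothetical helper names, is to keep the newest N available manual snapshots per DB instance and delete the rest. The input is assumed to be the `DBSnapshots` list from boto3's `rds.describe_db_snapshots(SnapshotType="manual")`, already filtered down to the snapshots this solution created.

```python
from collections import defaultdict


def snapshots_to_delete(snapshots, retention_number):
    """Return identifiers beyond the newest `retention_number` snapshots per instance."""
    if retention_number < 1:
        raise ValueError("retention_number must be at least 1")
    by_instance = defaultdict(list)
    for snap in snapshots:
        if snap.get("Status") == "available":  # skip snapshots still being created or copied
            by_instance[snap["DBInstanceIdentifier"]].append(snap)
    doomed = []
    for snaps in by_instance.values():
        snaps.sort(key=lambda s: s["SnapshotCreateTime"], reverse=True)  # newest first
        doomed.extend(s["DBSnapshotIdentifier"] for s in snaps[retention_number:])
    return doomed


def clean_up(rds, snapshots, retention_number):
    """Delete the snapshots that fall outside the retention window."""
    deleted = snapshots_to_delete(snapshots, retention_number)
    for snapshot_id in deleted:
        rds.delete_db_snapshot(DBSnapshotIdentifier=snapshot_id)
    return deleted


if __name__ == "__main__":
    from datetime import datetime, timedelta, timezone

    base = datetime(2024, 1, 1, tzinfo=timezone.utc)
    snaps = [
        {"DBInstanceIdentifier": "orcl", "DBSnapshotIdentifier": f"orcl-dr-{h}",
         "SnapshotCreateTime": base + timedelta(hours=h), "Status": "available"}
        for h in range(4)
    ]
    print(snapshots_to_delete(snaps, 2))  # ['orcl-dr-1', 'orcl-dr-0']
```

Counting per instance keeps one busy database from pushing another database's snapshots out of the window.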
- Set the parameters above in rds_dr_deploy.sh, then run the script from the command line:
./rds_dr_deploy.sh
- Add the tag “dr=true” (key “dr”, value “true”) to each RDS instance that needs to be backed up. Note: make sure “Copy tags to snapshots” is set to “Yes” on the RDS instance, so the tag carries over to its snapshots.
- Once the solution is deployed, turn on the CloudWatch Events rule; it is disabled by default.
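If you have many instances, the tagging and enablement steps can be scripted. This is a minimal sketch with boto3-style clients passed in by the caller; the helper names are hypothetical, and the rule name is a placeholder for whatever your CloudFormation stack created. `TagList`, `DBInstanceArn`, and `CopyTagsToSnapshot` are fields of each entry returned by `rds.describe_db_instances()`.

```python
DR_TAG = {"Key": "dr", "Value": "true"}


def dr_status(instances):
    """Split instances tagged dr=true by whether they copy tags to snapshots."""
    ready, missing_copy_tags = [], []
    for inst in instances:
        tags = {t["Key"]: t["Value"] for t in inst.get("TagList", [])}
        if tags.get("dr") != "true":
            continue  # not enrolled in the DR solution
        target = ready if inst.get("CopyTagsToSnapshot") else missing_copy_tags
        target.append(inst["DBInstanceIdentifier"])
    return ready, missing_copy_tags


def enroll_instance(rds, instance):
    """Tag an instance dr=true and turn on "Copy tags to snapshots"."""
    rds.add_tags_to_resource(ResourceName=instance["DBInstanceArn"], Tags=[DR_TAG])
    rds.modify_db_instance(
        DBInstanceIdentifier=instance["DBInstanceIdentifier"],
        CopyTagsToSnapshot=True,
        ApplyImmediately=True,
    )


def enable_schedule(events, rule_name):
    # rule_name is a placeholder: use the rule your stack created.
    events.enable_rule(Name=rule_name)
```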
Done! The snapshots of your Oracle instance are now generated, copied, and cleaned up on a regular schedule.
RPO and RTO
RTO and RPO are key considerations when building your disaster recovery solution. Here is a quick concept recap:
- Recovery Point Objective (RPO) – The acceptable amount of data loss measured in time.
- Recovery Time Objective (RTO) – The time it takes after a disaster happens to restore a business process to its service level.
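For this solution, you control the RPO by setting the snapshot_frequency parameter in rds_dr_deploy.sh: in the worst case, you lose the data written since the last snapshot, plus whatever time the latest snapshot needs to reach the DR region. Be aware that the cost of the solution increases as the snapshot frequency increases.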
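The relationship between `snapshot_frequency`, RPO, and retention is simple arithmetic. This sketch assumes the parameter holds a CloudWatch Events rate expression such as `rate(1 hour)`; the `replication_lag_s` input (how long a new snapshot takes to reach the DR region) is a hypothetical estimate you would measure yourself.

```python
import re

# CloudWatch Events / EventBridge rate expressions look like "rate(1 hour)".
_RATE = re.compile(r"^rate\((\d+) (minutes?|hours?|days?)\)$")
_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def rate_to_seconds(expr):
    """Convert a rate expression such as "rate(15 minutes)" to seconds."""
    match = _RATE.match(expr.strip())
    if not match:
        raise ValueError(f"not a rate expression: {expr!r}")
    value, unit = int(match.group(1)), match.group(2).rstrip("s")
    return value * _SECONDS[unit]


def dr_window(snapshot_frequency, retention_number, replication_lag_s=0):
    """Rough RPO and retention figures for a given schedule."""
    interval = rate_to_seconds(snapshot_frequency)
    return {
        # Upper bound: one full interval since the last snapshot, plus the
        # time a new snapshot needs to reach the DR region.
        "worst_case_rpo_hours": (interval + replication_lag_s) / 3600,
        # Time between the oldest and newest retained snapshot.
        "retained_span_hours": (retention_number - 1) * interval / 3600,
        # Every snapshot is also copied, so this figure drives cost.
        "snapshots_per_day": 86400 / interval,
    }


if __name__ == "__main__":
    print(dr_window("rate(1 hour)", 24, replication_lag_s=1800))
    # {'worst_case_rpo_hours': 1.5, 'retained_span_hours': 23.0, 'snapshots_per_day': 24.0}
```

Halving the interval halves the snapshot-driven part of the RPO but doubles `snapshots_per_day`, which is the cost trade-off described above.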
As for the RTO, the current solution doesn’t automatically detect an RDS instance failure or restore from a snapshot, so the whole restore process is manual. How long it takes to recover the RDS instance from snapshots depends on how quickly you react to the failure and on the size of your database.
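Because the restore is manual, it can help to keep a small script ready to shorten the reaction time. A minimal sketch, assuming a boto3 RDS client for the DR account and region and the `DBSnapshots` list from `rds.describe_db_snapshots()`. The helper names are hypothetical; the new instance identifier, instance class, and option group are your choices, and if the snapshot's option group contains permanent or persistent options you may need to supply a compatible option group in the DR account.

```python
def latest_available(snapshots):
    """Pick the most recent snapshot that is ready to restore from."""
    candidates = [s for s in snapshots if s.get("Status") == "available"]
    if not candidates:
        raise LookupError("no available snapshot to restore from")
    return max(candidates, key=lambda s: s["SnapshotCreateTime"])


def restore_latest(rds, snapshots, new_instance_id, instance_class,
                   option_group_name=None, wait=True):
    """Restore a new DB instance from the newest available snapshot."""
    snap = latest_available(snapshots)
    kwargs = {
        "DBInstanceIdentifier": new_instance_id,
        "DBSnapshotIdentifier": snap["DBSnapshotIdentifier"],
        "DBInstanceClass": instance_class,
    }
    if option_group_name:
        kwargs["OptionGroupName"] = option_group_name
    rds.restore_db_instance_from_db_snapshot(**kwargs)
    if wait:
        # Polls until RDS reports the new instance as available.
        rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=new_instance_id)
    return snap["DBSnapshotIdentifier"]
```

The boto3 waiter gives up after its default number of polling attempts, so a large database may outlast it; in that case pass `wait=False` and monitor the restore separately.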
If you don’t need to consider the option group limitation
If the option group limitation doesn’t apply in your scenario, you can share each snapshot with the DR account as soon as it is generated, and then copy it to the DR region. This saves one round of copying and makes the solution more efficient and cost-effective.
Here is an overview of this simplified design that you can tailor to meet your needs:
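On the DR-account side, both this simplified design and the original one end with a Lambda function that copies a snapshot shared by the source account. A minimal sketch, assuming the function is invoked through SNS with a JSON message of the hypothetical form `{"snapshot_arn": "..."}`; helper names are also hypothetical. Whether a particular shared snapshot (for example, an encrypted one) can be copied across regions in a single step is governed by RDS rules, so verify that against the RDS documentation before relying on it.

```python
import json


def shared_snapshot_arn(event):
    """Pull the snapshot ARN out of an SNS-triggered Lambda event."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["snapshot_arn"]  # hypothetical message field


def copy_shared_snapshot(rds_dr, snapshot_arn, target_region, kms_key_id=None):
    """Copy a snapshot shared by the source account into the DR account.

    rds_dr is an RDS client for target_region in the DR account. Shared
    snapshots are referenced by their full ARN.
    """
    source_region = snapshot_arn.split(":")[3]  # arn:aws:rds:<region>:...
    kwargs = {
        "SourceDBSnapshotIdentifier": snapshot_arn,
        "TargetDBSnapshotIdentifier": snapshot_arn.split(":")[-1],
        "Tags": [{"Key": "dr", "Value": "true"}],  # so cleanup can find the copy
    }
    if source_region != target_region:
        kwargs["SourceRegion"] = source_region  # cross-region copy
    if kms_key_id:
        kwargs["KmsKeyId"] = kms_key_id
    resp = rds_dr.copy_db_snapshot(**kwargs)
    return resp["DBSnapshot"]["DBSnapshotIdentifier"]


def lambda_handler(event, context):
    import os

    import boto3  # bundled with the AWS Lambda Python runtime

    region = os.environ["AWS_REGION"]  # the region this function runs in
    rds_dr = boto3.client("rds", region_name=region)
    return copy_shared_snapshot(rds_dr, shared_snapshot_arn(event), region)
```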
Conclusion
This post provides a fully automated solution to a specific problem: snapshots of an Oracle instance cannot be shared with other accounts once an option group with permanent or persistent options is set on the instance. You can find the source code for this solution on GitHub.
We hope you found this blog post useful! If you have any questions or need help with your AWS account, contact us at info@1Strategy.com.