Amazon Aurora DSQL is a serverless, distributed, PostgreSQL-compatible database with virtually unlimited scale, high availability, and zero infrastructure management. Aurora DSQL removes the need for database sharding and instance upgrades while supporting both single-Region and multi-Region deployments. Aurora DSQL provides dedicated Regional endpoints for each Region in your multi-Region cluster, enabling applications to connect directly to their optimal Region for the lowest possible latency. Its architecture provides strong data consistency for reads and writes with 99.99% availability in single-Region deployments and 99.999% availability in multi-Region deployments through its active-active distributed design.
Applications using Aurora DSQL multi-Region clusters should implement a DNS-based routing solution (such as Amazon Route 53) to automatically redirect traffic between AWS Regions. This ensures continuity of operations if either an Aurora DSQL cluster or an entire AWS Region becomes unreachable.
Best practices recommend implementing application-level routing logic to handle Regional failovers holistically. However, when your application relies on multiple data stores including Aurora DSQL, you need a specific strategy for handling situations where Aurora DSQL Regional endpoints become unreachable. In this post, we show you an automated solution for redirecting database traffic to alternate Regional endpoints without requiring manual configuration changes, particularly in mixed data store environments.
Endpoint management for multi-Region Aurora DSQL clusters
Let’s look at a multi-Region application architecture that uses Amazon Aurora DSQL as the persistence layer.
Aurora DSQL multi-Region clusters use synchronous cross-Region replication to maintain strong consistency between the Regions (and the DSQL witness Region, which isn’t shown in the diagram). DSQL can accept reads and writes on either Regional endpoint and, thanks to Aurora DSQL’s strong consistency, a reader in Region A can immediately see a committed write in Region B and vice versa. This property of DSQL makes building multi-Region active-active applications much easier.
Because DSQL keeps the data consistent across Regions, the application stack doesn’t even need to know that it’s running in a multi-Region active-active configuration. It can be completely unaware of the other Region. The application doesn’t need to perform any cross-Region coordination or messaging; DSQL handles that.
With Aurora DSQL, you don’t need to worry about database failover or switchover operations because the service handles them automatically. However, in specific scenarios, such as when an application uses multiple data stores for different API calls or when applications connect from an external data center to multi-Region DSQL clusters, directly switching between DSQL endpoints is more efficient than redirecting entire application server endpoints. This approach reduces operational complexity and minimizes the effort required during service disruptions by targeting only the affected database connections rather than shifting the entire application stack. Any event that causes a disruption to a DSQL Regional cluster is likely to also impact availability of your application in the affected Region. We present an automated solution that connects applications to a reachable Regional endpoint in the event of a Regional endpoint failure in a multi-Region DSQL cluster setup.
The solution discussed in this post is available as sample code on GitHub.
Solution overview
In this solution, we demonstrate how to implement automatic redirection of application connections between Aurora DSQL endpoints using a custom Python client-side library. When deployed, it monitors Aurora DSQL endpoints through Amazon Route 53 APIs. The library first identifies healthy endpoints through these health checks, then measures the latency between the client and each healthy endpoint. It automatically routes the client connection to the healthy endpoint with the lowest latency, so that in the rare event of a Regional endpoint becoming unreachable, client connections are routed to the lowest-latency healthy Aurora DSQL endpoint. And thanks to Aurora DSQL’s strong consistency, clients can immediately see the effects of all transactions successfully committed on any endpoint.
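As a rough illustration of the client-side latency measurement described above, the following sketch times how long it takes to open a TCP connection to an endpoint. It is not taken from the sample library: the function name measure_tcp_latency, the default port 5432 (the standard PostgreSQL port), and the 2-second timeout are assumptions made for this example.

```python
import socket
import time
from typing import Optional


def measure_tcp_latency(hostname: str, port: int = 5432,
                        timeout: float = 2.0) -> Optional[float]:
    """Return the seconds needed to open a TCP connection, or None on failure.

    Hypothetical helper: the port and timeout defaults are assumptions.
    """
    start = time.perf_counter()
    try:
        # Opening and immediately closing the socket approximates one
        # network round trip plus connection setup.
        with socket.create_connection((hostname, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Refused, timed out, or unresolvable: report "not measurable".
        return None
```

A caller could run this against each healthy endpoint and keep only the endpoints for which a latency was returned.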
Let’s look at the key features of this solution:
Automated endpoint selection – To provide optimal connectivity, this solution maintains a dynamic list of available database cluster endpoints and regularly performs latency checks against them, creating a ranked list based on response times. This ranking is then combined with predefined priority settings in the configuration file. Based on the latency to each endpoint, it then chooses the best endpoint for each connection.
Route 53 health checks – This solution integrates with Route 53 health checks, using the AWS global infrastructure for comprehensive health monitoring. This approach provides a robust and flexible system for maintaining endpoint health and informing routing decisions.
Automated connection failover support – To maintain high availability and minimize application downtime, the solution continuously monitors the health of each Regional database cluster endpoint. When issues are detected with the current endpoint, it automatically redirects client connections to healthy alternative endpoints. This makes sure client applications maintain continuous database access, even when a particular endpoint is unreachable. This solution manages the Region in which clients establish their database connections. The result is minimal disruption to the user experience, because applications smoothly transition to available endpoints without manual intervention.
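The endpoint selection described in the features above can be sketched as follows. This is a minimal illustration, not the library’s code: the Endpoint dataclass and select_best_endpoint function are hypothetical, and the rule used here (lowest latency wins, with the configured priority as a tie-breaker) is an assumption about how latency and priority are combined.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    hostname: str
    region: str
    priority: int              # assumed convention: lower number is preferred
    healthy: bool
    latency: Optional[float]   # seconds; None if latency could not be measured


def select_best_endpoint(endpoints: Sequence[Endpoint]) -> Optional[Endpoint]:
    """Pick the healthy endpoint with the lowest latency.

    Priority is used only to break latency ties (an assumption).
    Returns None when no healthy, measurable endpoint exists.
    """
    candidates = [e for e in endpoints if e.healthy and e.latency is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda e: (e.latency, e.priority))
```

With two healthy endpoints, the one with the lower measured latency is returned; if that endpoint is marked unhealthy, the remaining healthy endpoint is chosen instead.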
The following diagram illustrates the solution workflow.
The workflow includes the following steps:
The client (running in either Region where the cluster has been deployed) calls get_connection() to initiate a connection, after which the library evaluates available DSQL endpoints and establishes the optimal connection based on health and performance metrics.
The library consults Route 53 health checks for real-time endpoint status. These health checks run at 30-second intervals, providing near up-to-date information about endpoint availability and continuously monitoring for signs of degradation or failure.
Using health check data, the library connects to the healthy endpoint. If the primary endpoint fails, the system automatically redirects to healthy alternatives.
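To make the health check lookup in the workflow above more concrete, here is a hedged sketch of how a Route 53 GetHealthCheckStatus response could be turned into a healthy or unhealthy decision. The helper names and the majority threshold (min_ratio) are assumptions for illustration; only the boto3 call get_health_check_status and the response fields HealthCheckObservations and StatusReport come from the Route 53 API.

```python
from typing import Any, Mapping, Tuple


def summarize_observations(response: Mapping[str, Any]) -> Tuple[int, int]:
    """Return (healthy_count, total_count) from a GetHealthCheckStatus response."""
    observations = response.get("HealthCheckObservations", [])
    healthy = sum(
        1
        for obs in observations
        # Route 53 status strings start with "Success" or "Failure".
        if obs.get("StatusReport", {}).get("Status", "").startswith("Success")
    )
    return healthy, len(observations)


def is_endpoint_healthy(response: Mapping[str, Any], min_ratio: float = 0.5) -> bool:
    """Treat the endpoint as healthy when more than min_ratio of checkers succeed.

    The 0.5 threshold is an assumption, not the library's documented rule.
    """
    healthy, total = summarize_observations(response)
    return total > 0 and healthy / total > min_ratio


def fetch_status(health_check_id: str) -> Mapping[str, Any]:
    """Call Route 53; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("route53")
    return client.get_health_check_status(HealthCheckId=health_check_id)
```

In an application, is_endpoint_healthy(fetch_status(health_check_id)) would combine the two steps for one endpoint.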
Prerequisites
To deploy this solution, you must complete the following prerequisites:
Make sure Python version 3.10 or higher is installed on your system. Verify the installation by running the following command in your terminal:
Obtain AWS credentials with appropriate DSQL access permissions. Configure these credentials using the AWS Command Line Interface (AWS CLI) or environment variables.
Verify that your system has network access to the DSQL endpoints. This might involve configuring Amazon Virtual Private Cloud (Amazon VPC) settings or security groups.
Confirm your AWS credentials have permissions to create and manage Route 53 health checks.
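If you prefer to check the Python version requirement from within Python itself, a small script like the following could be used. It is only a sketch: the helper name meets_minimum is hypothetical, and the (3, 10) minimum mirrors the prerequisite stated above.

```python
import sys
from typing import Sequence, Tuple

MINIMUM_PYTHON: Tuple[int, int] = (3, 10)  # from the prerequisites above


def meets_minimum(version: Sequence[int],
                  minimum: Tuple[int, int] = MINIMUM_PYTHON) -> bool:
    """Compare the (major, minor) part of a version against the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = sys.version_info
    status = "OK" if meets_minimum(current) else "too old"
    print(f"Python {current.major}.{current.minor}: {status}")
```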
Install Python and dependent packages, and configure the AWS CLI
Complete the following steps:
Clone the repository:
git clone https://github.com/aws-samples/sample-multi-region-Endpoint-Routing-for-Aurora-DSQL.git
cd sample-multi-region-Endpoint-Routing-for-Aurora-DSQL
Set up the Python environment and create a new virtual environment named venv:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install the dependencies listed in requirements.txt that are required to run this solution:
pip install -r requirements.txt
Configure the AWS CLI. This provides a convenient way to set up your credentials globally.
Follow the prompts in your terminal. The command automatically opens your default browser and guides you through the authentication process. After successful authentication, your AWS CLI session is valid for up to 12 hours.
Set up configuration files and Route 53 health checks
The GitHub repository contains a configuration file named dsql_config_with_healthchecks.json. This file has a structure similar to the following example. You must modify the following fields:
For both Regions, update the cluster_id field using the cluster IDs you recorded in the prerequisites.
Replace the hostname field with your Regional DSQL endpoint that was captured earlier.
--config – This parameter specifies the path to a configuration file. The configuration file is the JSON file dsql_config_with_healthchecks.json, which contains information about DSQL endpoints and connection settings.
--setup – This parameter creates Route 53 health checks and updates the health_check_id for each endpoint in the configuration file dsql_config_with_healthchecks.json.
--test – This parameter runs connectivity tests.
This script reads your configuration file, creates a health check in Route 53 for each endpoint, and updates your configuration file with the newly created health check IDs. The health_check_id is a unique identifier for the Route 53 health check associated with each endpoint.
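The setup step described above could look roughly like the following sketch. It is not the repository’s script. The configuration layout (a top-level "endpoints" list), the TCP check on port 5432, and the failure threshold of 3 are assumptions; the hostname and health_check_id field names and the 30-second interval come from this post, and the request and response shapes follow the Route 53 CreateHealthCheck API.

```python
import copy
import uuid
from typing import Any, Callable, Dict


def build_health_check_request(hostname: str, port: int = 5432) -> Dict[str, Any]:
    """Build keyword arguments for Route 53 create_health_check (assumed TCP check)."""
    return {
        "CallerReference": str(uuid.uuid4()),  # must be unique per request
        "HealthCheckConfig": {
            "Type": "TCP",
            "FullyQualifiedDomainName": hostname,
            "Port": port,
            "RequestInterval": 30,   # seconds, matching the interval in this post
            "FailureThreshold": 3,   # assumed value
        },
    }


def attach_health_checks(
    config: Dict[str, Any],
    create_health_check: Callable[..., Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of config with health_check_id filled in for each endpoint.

    Endpoints that already have a health_check_id are left unchanged.
    """
    updated = copy.deepcopy(config)
    for endpoint in updated["endpoints"]:  # "endpoints" key is hypothetical
        if endpoint.get("health_check_id"):
            continue
        request = build_health_check_request(endpoint["hostname"])
        response = create_health_check(**request)
        endpoint["health_check_id"] = response["HealthCheck"]["Id"]
    return updated
```

Against a real account, you would pass boto3.client("route53").create_health_check as the create_health_check argument and then write the returned dictionary back to the JSON file.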
Test connectivity with Route 53 health checks and client-side latency routing
To test the basic connectivity to your DSQL endpoints, run the following command. This script combines client-side latency measurement for optimal endpoint selection, Route 53 health checks for reliable health monitoring, and automatic failover capabilities to provide continuous service availability.
This test simulates a failure scenario and verifies that your system responds correctly. Run the test script with the following command:
This script executes a series of operations to validate the application (or client connection) failover mechanism:
First, it establishes a connection to the optimal available endpoint as determined by your configuration priorities.
After it’s connected, the script intentionally disables the Route 53 health check associated with this primary endpoint, simulating a failure.
The script then waits for the health check status to propagate through the AWS network, replicating real-world failure conditions.
Then the script attempts to create a new connection, which should now fail over to a secondary endpoint because of the simulated failure of the primary.
During this period, it verifies that your system successfully fails over to a secondary endpoint, confirming continuous operation despite the primary endpoint’s simulated failure.
After confirming successful failover, the script re-enables the health check for the primary endpoint and validates that connections can once again be established to the restored primary endpoint.
2025-05-21 19:03:52,864 - main - INFO -
=== STEP 1: Testing connection under normal conditions ===
2025-05-21 19:03:52,866 - dsql_hybrid_manager - INFO - Loaded configuration from dsql_config_with_healthchecks.json
2025-05-21 19:03:52,879 - botocore.credentials - INFO - Found credentials in environment variables.
2025-05-21 19:03:52,982 - dsql_hybrid_manager - INFO - Initialized DSQL Hybrid Connection Manager with 2 endpoints
2025-05-21 19:03:54,055 - dsql_hybrid_manager - INFO - Route 53 health check a4709bfe-bc41-4afc-9f55-4919ee884b7c: 16/16 healthy observations
2025-05-21 19:03:54,055 - dsql_hybrid_manager - INFO - Route 53 health check a4709bfe-bc41-4afc-9f55-4919ee884b7c: Healthy
2025-05-21 19:03:55,011 - dsql_hybrid_manager - INFO - Route 53 health check 46eca8a2-6a07-43e6-94fb-21f95fb11d5a: 16/16 healthy observations
2025-05-21 19:03:55,011 - dsql_hybrid_manager - INFO - Route 53 health check 46eca8a2-6a07-43e6-94fb-21f95fb11d5a: Healthy
2025-05-21 19:03:55,011 - dsql_hybrid_manager - INFO - Found 2 healthy endpoints out of 2
2025-05-21 19:03:55,057 - dsql_hybrid_manager - INFO - Endpoint latency comparison:
2025-05-21 19:03:55,057 - dsql_hybrid_manager - INFO - 1. xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws - Latency: 0.002422s, Priority: 1, Region: us-east-2
2025-05-21 19:03:55,057 - dsql_hybrid_manager - INFO - 2. yyyyyyyyyyyyyyy.dsql.us-east-1.on.aws - Latency: 0.012726s, Priority: 2, Region: us-east-1
2025-05-21 19:03:55,057 - dsql_hybrid_manager - INFO - Selected best endpoint: xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws (latency: 0.002422s, priority: 1)
2025-05-21 19:03:55,058 - main - INFO - Best endpoint selected: xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws (latency: 0.002422s)
2025-05-21 19:03:55,058 - main - INFO - Health check ID: a4709bfe-bc41-4afc-9f55-4919ee884b7c
2025-05-21 19:03:55,058 - dsql_hybrid_manager - INFO - Found 2 healthy endpoints out of 2
2025-05-21 19:03:55,100 - dsql_hybrid_manager - INFO - Generating DSQL admin auth token for xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws in us-east-2
2025-05-21 19:03:55,101 - dsql_hybrid_manager - INFO - Generated token preview: jiabuacbso...a4e75aa0f9
2025-05-21 19:03:55,101 - dsql_hybrid_manager - INFO - Connecting to xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws (latency: 0.001538s, region: us-east-2, priority: 1)
2025-05-21 19:03:55,344 - dsql_hybrid_manager - INFO - Successfully connected to xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws
2025-05-21 19:03:55,344 - main - INFO - Connected to: xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws
2025-05-21 19:03:55,344 - main - INFO - Running query iteration 1/1
2025-05-21 19:03:55,451 - main - INFO - Result: ('PostgreSQL 16',)
2025-05-21 19:03:55,451 - main - INFO - Query execution time: 106.33ms
2025-05-21 19:03:55,451 - main - INFO - Average query execution time over 1 iterations: 106.33ms
2025-05-21 19:03:55,451 - main - INFO - Connection closed
2025-05-21 19:03:55,452 - main - INFO -
=== STEP 2: Simulating failure of the primary endpoint's health check: a4709bfe-bc41-4afc-9f55-4919ee884b7c ===
2025-05-21 19:03:55,627 - main - INFO - Disabled health check a4709bfe-bc41-4afc-9f55-4919ee884b7c to simulate failure
2025-05-21 19:03:55,628 - main - INFO - Waiting 60 seconds for health check status to propagate...
2025-05-21 19:04:55,674 - main - INFO -
=== STEP 3: Testing connection with primary endpoint health check failure ===
2025-05-21 19:04:55,674 - dsql_hybrid_manager - INFO - Loaded configuration from dsql_config_with_healthchecks.json
2025-05-21 19:04:55,684 - dsql_hybrid_manager - INFO - Initialized DSQL Hybrid Connection Manager with 2 endpoints
2025-05-21 19:04:55,796 - dsql_hybrid_manager - ERROR - Error checking Route 53 health status for a4709bfe-bc41-4afc-9f55-4919ee884b7c: An error occurred (InvalidInput) when calling the GetHealthCheckStatus operation: Invalid parameter : The specified health check has a special status of always healthy. GetHealthCheckStatus cannot return the status of one of these special health checks.
2025-05-21 19:04:56,797 - dsql_hybrid_manager - INFO - Route 53 health check 46eca8a2-6a07-43e6-94fb-21f95fb11d5a: 16/16 healthy observations
2025-05-21 19:04:56,797 - dsql_hybrid_manager - INFO - Route 53 health check 46eca8a2-6a07-43e6-94fb-21f95fb11d5a: Healthy
2025-05-21 19:04:56,797 - dsql_hybrid_manager - INFO - Found 1 healthy endpoints out of 2
2025-05-21 19:04:56,837 - dsql_hybrid_manager - INFO - Endpoint latency comparison:
2025-05-21 19:04:56,837 - dsql_hybrid_manager - INFO - 1. yyyyyyyyyyyyyyy.dsql.us-east-1.on.aws - Latency: 0.013187s, Priority: 2, Region: us-east-1
2025-05-21 19:04:56,837 - dsql_hybrid_manager - INFO - Selected best endpoint: yyyyyyyyyyyyyyy.dsql.us-east-1.on.aws (latency: 0.013187s, priority: 2)
2025-05-21 19:04:56,837 - main - INFO - Best endpoint selected: yyyyyyyyyyyyyyy.dsql.us-east-1.on.aws (latency: 0.013187s)
2025-05-21 19:04:56,837 - main - INFO - Health check ID: 46eca8a2-6a07-43e6-94fb-21f95fb11d5a
2025-05-21 19:04:56,878 - dsql_hybrid_manager - ERROR - Error checking Route 53 health status for a4709bfe-bc41-4afc-9f55-4919ee884b7c: An error occurred (InvalidInput) when calling the GetHealthCheckStatus operation: Invalid parameter : The specified health check has a special status of always healthy. GetHealthCheckStatus cannot return the status of one of these special health checks.
2025-05-21 19:04:56,879 - dsql_hybrid_manager - INFO - Found 1 healthy endpoints out of 2
2025-05-21 19:04:56,918 - dsql_hybrid_manager - INFO - Generating DSQL admin auth token for yyyyyyyyyyyyyyy.dsql.us-east-1.on.aws in us-east-1
2025-05-21 19:04:56,919 - dsql_hybrid_manager - INFO - Generated token preview: e4abuacbso...238cb4c5fe
2025-05-21 19:04:56,919 - dsql_hybrid_manager - INFO - Connecting to yyyyyyyyyyyyyyy.dsql.us-east-1.on.aws (latency: 0.011756s, region: us-east-1, priority: 2)
2025-05-21 19:04:57,234 - dsql_hybrid_manager - INFO - Successfully connected to yyyyyyyyyyyyyyy.dsql.us-east-1.on.aws
2025-05-21 19:04:57,234 - main - INFO - Connected to: yyyyyyyyyyyyyyy.dsql.us-east-1.on.aws
2025-05-21 19:04:57,234 - main - INFO - Running query iteration 1/1
2025-05-21 19:04:57,368 - main - INFO - Result: ('PostgreSQL 16',)
2025-05-21 19:04:57,368 - main - INFO - Query execution time: 133.73ms
2025-05-21 19:04:57,368 - main - INFO - Average query execution time over 1 iterations: 133.73ms
2025-05-21 19:04:57,368 - main - INFO - Connection closed
2025-05-21 19:04:57,369 - main - INFO - Failover successful! Switched from xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws to yyyyyyyyyyyyyyy.dsql.us-east-1.on.aws
2025-05-21 19:04:57,369 - main - INFO -
=== STEP 4: Restoring original health check configuration ===
2025-05-21 19:04:57,500 - main - INFO - Re-enabled health check a4709bfe-bc41-4afc-9f55-4919ee884b7c
2025-05-21 19:04:57,502 - main - INFO - Waiting 60 seconds for health check status to propagate...
2025-05-21 19:05:57,553 - main - INFO -
=== STEP 5: Testing connection after restoring health check ===
2025-05-21 19:05:57,554 - dsql_hybrid_manager - INFO - Loaded configuration from dsql_config_with_healthchecks.json
2025-05-21 19:05:57,561 - dsql_hybrid_manager - INFO - Initialized DSQL Hybrid Connection Manager with 2 endpoints
2025-05-21 19:05:58,571 - dsql_hybrid_manager - INFO - Route 53 health check a4709bfe-bc41-4afc-9f55-4919ee884b7c: 16/16 healthy observations
2025-05-21 19:05:58,571 - dsql_hybrid_manager - INFO - Route 53 health check a4709bfe-bc41-4afc-9f55-4919ee884b7c: Healthy
2025-05-21 19:05:59,775 - dsql_hybrid_manager - INFO - Route 53 health check 46eca8a2-6a07-43e6-94fb-21f95fb11d5a: 16/16 healthy observations
2025-05-21 19:05:59,775 - dsql_hybrid_manager - INFO - Route 53 health check 46eca8a2-6a07-43e6-94fb-21f95fb11d5a: Healthy
2025-05-21 19:05:59,775 - dsql_hybrid_manager - INFO - Found 2 healthy endpoints out of 2
2025-05-21 19:05:59,815 - dsql_hybrid_manager - INFO - Endpoint latency comparison:
2025-05-21 19:05:59,815 - dsql_hybrid_manager - INFO - 1. xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws - Latency: 0.001920s, Priority: 1, Region: us-east-2
2025-05-21 19:05:59,815 - dsql_hybrid_manager - INFO - 2. yyyyyyyyyyyyyyy.dsql.us-east-1.on.aws - Latency: 0.011219s, Priority: 2, Region: us-east-1
2025-05-21 19:05:59,815 - dsql_hybrid_manager - INFO - Selected best endpoint: xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws (latency: 0.001920s, priority: 1)
2025-05-21 19:05:59,815 - main - INFO - Best endpoint selected: xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws (latency: 0.001920s)
2025-05-21 19:05:59,815 - main - INFO - Health check ID: a4709bfe-bc41-4afc-9f55-4919ee884b7c
2025-05-21 19:05:59,815 - dsql_hybrid_manager - INFO - Found 2 healthy endpoints out of 2
2025-05-21 19:05:59,862 - dsql_hybrid_manager - INFO - Generating DSQL admin auth token for.dsql.us-east-2.on.aws in us-east-2
2025-05-21 19:05:59,864 - dsql_hybrid_manager - INFO - Generated token preview: jiabuacbso...f42f2e31ea
2025-05-21 19:05:59,865 - dsql_hybrid_manager - INFO - Connecting to xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws (latency: 0.001445s, region: us-east-2, priority: 1)
2025-05-21 19:06:00,099 - dsql_hybrid_manager - INFO - Successfully connected to xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws
2025-05-21 19:06:00,099 - main - INFO - Connected to: xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws
2025-05-21 19:06:00,099 - main - INFO - Running query iteration 1/1
2025-05-21 19:06:00,210 - main - INFO - Result: ('PostgreSQL 16',)
2025-05-21 19:06:00,210 - main - INFO - Query execution time: 110.73ms
2025-05-21 19:06:00,210 - main - INFO - Average query execution time over 1 iterations: 110.73ms
2025-05-21 19:06:00,211 - main - INFO - Connection closed
2025-05-21 19:06:00,211 - main - INFO -
=== ROUTE 53 FAILOVER TEST SUMMARY ===
2025-05-21 19:06:00,211 - main - INFO - Primary endpoint: xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws
2025-05-21 19:06:00,212 - main - INFO - Failover endpoint: yyyyyyyyyyyyyyy.dsql.us-east-1.on.aws
2025-05-21 19:06:00,212 - main - INFO - Restored endpoint: xxxxxxxxxxxxxxx.dsql.us-east-2.on.aws
2025-05-21 19:06:00,212 - main - INFO - RESULT: Route 53 failover test SUCCESSFUL!
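The sequence exercised by this test (connect, disable the primary health check, reconnect, restore, reconnect) can be sketched offline with an in-memory stand-in for Route 53. Everything in this example is hypothetical, including the FakeHealthChecks class and the endpoint dictionaries. It mirrors one behavior visible in the log output: when the status lookup for a disabled health check raises an error, that endpoint is not counted as healthy.

```python
from typing import Dict, List, Optional, Tuple


class FakeHealthChecks:
    """In-memory stand-in for Route 53 health checks (illustration only)."""

    def __init__(self, check_ids: List[str]) -> None:
        self._disabled: Dict[str, bool] = {cid: False for cid in check_ids}

    def set_disabled(self, check_id: str, disabled: bool) -> None:
        self._disabled[check_id] = disabled

    def status(self, check_id: str) -> bool:
        # Simulates the lookup error seen in the log for a disabled check.
        if self._disabled[check_id]:
            raise RuntimeError("status unavailable for a disabled health check")
        return True


def pick_endpoint(endpoints: List[dict], checks: FakeHealthChecks) -> Optional[str]:
    """Return the hostname of the lowest-latency endpoint that passes its check."""
    healthy = []
    for ep in endpoints:
        try:
            if checks.status(ep["health_check_id"]):
                healthy.append(ep)
        except RuntimeError:
            continue  # a failed lookup means the endpoint is treated as unhealthy
    if not healthy:
        return None
    return min(healthy, key=lambda ep: ep["latency"])["hostname"]


def run_failover_drill(endpoints: List[dict],
                       checks: FakeHealthChecks) -> Tuple[str, str, str]:
    """Return (primary, failover, restored) hostnames for the simulated drill."""
    primary = pick_endpoint(endpoints, checks)
    primary_check = next(
        ep["health_check_id"] for ep in endpoints if ep["hostname"] == primary
    )
    checks.set_disabled(primary_check, True)    # simulate primary failure
    failover = pick_endpoint(endpoints, checks)
    checks.set_disabled(primary_check, False)   # restore the health check
    restored = pick_endpoint(endpoints, checks)
    return primary, failover, restored
```

A successful drill returns the same hostname for the primary and restored positions and a different hostname for the failover position.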
Using the DSQL connection manager in your application
After you have the hybrid_failover_approach.py file, integrating it into your application is straightforward. The connection manager is designed as a drop-in replacement for database connections, with no background processes or complex setup required.
The following code is a basic example of how you can use the connection manager in your applications:
from hybrid_failover_approach import DSQLHybridConnectionManager
# Initialize once at application startup
db_manager = DSQLHybridConnectionManager(config_file="dsql_config_with_healthchecks.json")
# Use everywhere you need a connection
conn = db_manager.get_connection("postgres", "admin")
First, configure your DSQL endpoints by editing the dsql_config_with_healthchecks.json file.
The following code shows a real-world example of how it looks in a Flask application:
from flask import Flask, jsonify
from hybrid_failover_approach import DSQLHybridConnectionManager

app = Flask(__name__)
db_manager = DSQLHybridConnectionManager(config_file="dsql_config_with_healthchecks.json")

@app.route('/users')
def get_users():
    # Automatically connects to the fastest, healthiest endpoint
    conn = db_manager.get_connection("postgres", "admin")
    try:
        with conn.cursor() as cursor:
            cursor.execute("SELECT id, name, email FROM users")
            return jsonify(cursor.fetchall())
    finally:
        # Always release the connection, even if the query fails
        conn.close()
The beauty of this approach is its simplicity: you get intelligent routing, automatic failover, and health monitoring without managing any background processes or complex infrastructure. It’s just a smarter way to connect to your DSQL clusters.
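If you find yourself repeating the try/finally pattern in every handler, one option is a small context manager around get_connection. This is a convenience sketch and not part of the sample library: the dsql_connection helper is hypothetical, and it assumes only that the manager exposes get_connection(database, user) and that the returned connection has a close() method, as in the examples in this post.

```python
from contextlib import contextmanager


@contextmanager
def dsql_connection(db_manager, database="postgres", user="admin"):
    """Yield a connection from the manager and always close it afterwards."""
    conn = db_manager.get_connection(database, user)
    try:
        yield conn
    finally:
        # Runs on normal exit and when the body raises an exception.
        conn.close()
```

A handler could then write "with dsql_connection(db_manager) as conn:" and run its queries inside the block.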
Cleanup
To delete the health checks, use the AWS CLI with the health check IDs that were added to your configuration file during setup:
aws route53 delete-health-check --health-check-id
You can find the health check IDs in your dsql_config_with_healthchecks.json file under the health_check_id field for each endpoint. Run the delete command for each health check ID in your configuration.
Health check configuration
You can customize health check frequency in the DSQL connection manager:
health_check_ttl=60, # Cache health check results for 60 seconds
The health_check_ttl parameter caches health check results for the specified duration. Lower values (< 60s) enable faster failover but increase API calls to Route 53, while higher values reduce API load but might delay issue detection. Start with 60 seconds and adjust as needed.
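To illustrate the trade-off behind health_check_ttl, the following sketch shows one way a time-to-live cache for health check results could work. It is not the library’s implementation: the HealthStatusCache class, its fetch callback, and the injectable clock are assumptions made so the example is self-contained.

```python
import time
from typing import Callable, Dict, Tuple


class HealthStatusCache:
    """Cache health check results for ttl seconds (hypothetical sketch)."""

    def __init__(self, fetch: Callable[[str], bool], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # performs the real lookup, such as a Route 53 call
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bool]] = {}

    def is_healthy(self, check_id: str) -> bool:
        now = self._clock()
        cached = self._entries.get(check_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # still fresh: no API call is made
        status = self._fetch(check_id)
        self._entries[check_id] = (now, status)
        return status
```

With a TTL of 60 seconds, repeated calls within a minute reuse the cached result, so a failure might go unnoticed for up to the TTL; a shorter TTL detects it sooner at the cost of more lookups.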
Summary
In this post, we discussed a custom solution that provides an effective way to manage Aurora DSQL connections with automatic cross-Region connection failover support. By deploying this solution, you can provide reliable database connectivity for your applications while maintaining optimal performance and availability.
Try out the solution for your own use case, and share your feedback in the comments.