Sunday, November 30, 2025

Simplify information integration utilizing zero-ETL from self-managed databases to Amazon Redshift


On this submit, we show methods to arrange a zero-ETL integration between self-managed databases comparable to MySQL, PostgreSQL, SQL Server, and Oracle to Amazon Redshift. The transactional information from the supply will get replicated in close to actual time on the vacation spot, which processes analytical queries.

Organizations want real-time entry to their information throughout a number of sources to make knowledgeable enterprise choices. Though extract, rework, and cargo (ETL) processes have historically helped combine information, they are often complicated and time-consuming to take care of. ETL is the method of mixing, cleansing, and normalizing information from totally different sources to arrange it for analytics, AI, and machine studying (ML) workloads. Transferring to zero-ETL structure offers your groups extra responsive analytics capabilities whereas eliminating complicated information pipeline upkeep, so insights can be found when choices have to be made.

A self-managed database refers to a database atmosphere the place you might have full management and duty over the database occasion working by yourself infrastructure, whether or not that’s an Amazon Elastic Compute Cloud (Amazon EC2) occasion, an on-premises digital machine (VM), or another cloud supplier’s digital machine.

Resolution overview

To create a zero-ETL integration, you specify a self-managed database because the supply and an Amazon Redshift information warehouse because the goal. The mixing replicates information from the supply database into the goal information warehouse. The next diagram illustrates this structure.

Conditions

Earlier than you start, guarantee you might have the next conditions in place:

  1. On-premises database server or an EC2 occasion with database configured with:
    1. Community connectivity to Amazon Net Companies (AWS) utilizing AWS Direct Join, AWS Website-to-Website VPN, or Amazon Digital Non-public Cloud (Amazon VPC).
    2. VPC safety teams permitting database port connection to zero-ETL integration.
    3. Applicable subnets and community configuration for zero-ETL replication visitors.
  2. Database configured for on-going replication. Consult with AWS Database Migration Service (AWS DMS) documentation for extra data: Oracle, Microsoft SQL Server, MySQL, or PostgreSQL.
  3. Create a brand new Redshift cluster or modify an present cluster to make use of a customized parameter group with the enable_case_sensitive_identifier parameter set to on, which have to be enabled to create zero-ETL integrations within the information warehouse.
  4. Be sure your AWS consumer or position has the mandatory AWS Id and Entry Administration (IAM) permissions. Full the next steps to configure the permissions:
    1. On the IAM console, select Insurance policies within the navigation pane.
    2. Select Create coverage.
    3. Create a brand new coverage known as zeroetl-selfmanaged-integrations utilizing the next JSON:
      {
         "Model":"2012-10-17",
         "Assertion":[
            {
               "Sid":"ZetlGlueIntegrationAccess",
               "Effect":"Allow",
               "Action":[
                  "glue:CreateIntegration",
                  "glue:ModifyIntegration",
                  "glue:DeleteIntegration",
                  "glue:DescribeIntegrations",
                  "glue:DescribeInboundIntegrations"
               ],
               "Useful resource":"*"
            },
            {
               "Sid":"ZetlRedshiftFullAccess",
               "Impact":"Permit",
               "Motion":[
                  "redshift:GetResourcePolicy",
                  "redshift:PutResourcePolicy",
                  "redshift:DescribeClusterParameters",
                  "redshift:CreateInboundIntegration",
                  "redshift:DescribeClusters"
               ],
               "Useful resource":"*"
            },
            {
               "Sid":"ListNetworkSettings",
               "Impact":"Permit",
               "Motion":[
                  "ec2:DescribeVpcs",
                  "ec2:DescribeSubnets",
                  "ec2:DescribeSecurityGroups"
               ],
               "Useful resource":"*"
            },
            {
               "Sid":"ZetlRedshiftDataAPI",
               "Impact":"Permit",
               "Motion":[
                  "redshift-data:ExecuteStatement",
                  "redshift-data:CancelStatement",
                  "redshift-data:ListStatements",
                  "redshift-data:GetStatementResult",
                  "redshift-data:DescribeStatement",
                  "redshift-data:ListDatabases",
                  "redshift-data:ListSchemas",
                  "redshift-data:ListTables",
                  "redshift-data:DescribeTable"
               ],
               "Useful resource":"*"
            },
            {
               "Sid":"DMSIntegrationAccess",
               "Impact":"Permit",
               "Motion":[
                  "dms:CreateOutboundIntegration",
                  "dms:ModifyOutboundIntegration",
                  "dms:CreateEndpoint",
                  "dms:DescribeEndpoints",
                  "dms:ModifyEndpoint",
                  "dms:DeleteEndpoint",
                  "dms:TestConnection"
               ],
               "Useful resource":"*"
            },
            {
               "Sid":"ZetlKMSAccess",
               "Impact":"Permit",
               "Motion":[
                  "kms:CreateKey",
                  "kms:DescribeKey",
                  "kms:Encrypt",
                  "kms:Decrypt",
                  "kms:GenerateDataKey",
                  "kms:ListKeys",
                  "kms:CreateAlias",
                  "kms:ListAliases",
                  "kms:CreateGrant"
               ],
               "Useful resource":"*"
            },
            {
               "Sid":"ZetlSecretsManagerAccess",
               "Impact":"Permit",
               "Motion":[
                  "secretsmanager:GetSecretValue",
                  "secretsmanager:PutSecretValue",
                  "secretsmanager:CreateSecret",
                  "secretsmanager:UpdateSecret",
                  "secretsmanager:DeleteSecret",
                  "secretsmanager:DescribeSecret",
                  "secretsmanager:ListSecrets",
                  "secretsmanager:GetResourcePolicy",
                  "secretsmanager:PutResourcePolicy",
                  "secretsmanager:ValidateResourcePolicy"
               ],
               "Useful resource":"*"
            }
         ]
      }

  5. Connect the coverage you created to your IAM consumer or position permissions.

Resolution walkthrough

To create a DMS endpoint, comply with these steps:

  1. Open the AWS DMS console.
  2. Within the navigation pane, underneath Migrate or replicate, select Endpoints.
  3. Select Create endpoint, choose Supply endpoint.
  4. Choose the Supply engine, comparable to Oracle, MySQL, PostgreSQL and Microsoft SQL Server.

  5. Select entry endpoint sort and supply data. Choose Select AWS Secrets and techniques Supervisor to offer a secret ID and IAM position.

  6. Select Create endpoint.
  7. Upon creation of supply DMS endpoint, copy DMS endpoint ARN.

To configure the Redshift cluster vacation spot, comply with these steps:

  1. Navigate to the Redshift cluster and select the Useful resource coverage tab.
  2. Select Edit Licensed principals.
  3. Enter both the ARN of the AWS consumer or position or the AWS account ID (IAM principals) which can be allowed to create integrations on this namespace.
  4. Select Save modifications.
  5. Select Edit Licensed integration sources.
  6. Enter the ARN of the DMS supply endpoint that’s the information supply for the zero-ETL integration.
  7. Select Save modifications.

To create a zero-ETL integration, comply with these steps:

  1. Open AWS Glue console.
  2. Within the navigation pane, underneath Knowledge Integration and ETL, choose Zero-ETL integrations.
  3. Select Create zero-ETL integration.

  4. Choose the supply sort, as proven within the following screenshot. It may be any of the self-managed databases comparable to Microsoft SQL Server, MySQL, Oracle, or PostgreSQL. Select Subsequent.

  5. On this web page, configure the supply and goal.
    1. To finish supply particulars, Browse DMS endpoints and choose the DMS endpoint created in step 1.

    2. Below Knowledge filtering choices, within the Select filter sort dropdown menu, choose both Embody or Exclude, as proven within the following screenshot. Expressions should use Maxwell filter syntax.

    3. In Community settings, present particulars comparable to Digital personal cloud (VPC), Subnets, and VPC safety teams to make a profitable connection to the supply and run zero-ETL integrations, as proven within the following screenshot.

    4. To finish goal particulars, underneath AWS account choose the choice applicable for the place the Amazon Redshift information warehouse is hosted, both Use the present account or Specify a special account, as proven within the following screenshot.

  6. For Safety and information encryption, you possibly can choose both AWS managed KMS key or a customized AWS KMS key managed by AWS Key Administration Service (AWS KMS).
  7. Within the Integration particulars part, for Title, enter zeroetl-selfmanaged-redshift and select Subsequent.

  8. Assessment the small print and select Create and launch integration.
  9. The newly created zero-ETL integration will present as Lively in a couple of minute, as proven within the following screenshot.

To create a database from the mixing in AWS Glue, comply with the directions within the subsequent part.To create your database, full the next steps:

  1. Open AWS Glue console.
  2. Within the navigation pane, underneath Knowledge integration and ETL, select zero-ETL integrations.
  3. Select Create database from integration
  4. Enter Vacation spot database title that will probably be created in Amazon Redshift.
  5. Enter Supply database title from which information will probably be replicated, as proven within the following screenshot.

To validate the zero-ETL integration, carry out the next steps:

  1. Open Amazon Redshift.
  2. Within the navigation pane, underneath Integrations, select zero-ETL integrations.
  3. Navigate to the zeroetl-selfmanaged-redshift integration. Assessment the small print of zero-ETL integration.
  4. Select Hook up with database and supply Database title and Database consumer, as proven within the following screenshot.

  5. To view the integration-related metrics printed to Amazon CloudWatch, select the Integration metrics, as proven within the following screenshot.

  6. Change to Desk statistics to view all tables which can be at present energetic or having errors.

  7. Question the next system views and tables in Amazon Redshift to get details about the zero-ETL integrations with Amazon Redshift:
    1. SVV_INTEGRATION supplies configuration particulars of zero-ETL integrations.
    2. SYS_INTEGRATION_ACTIVITY supplies accomplished zero-ETL integrations.
    3. SVV_INTEGRATION_TABLE_MAPPING supplies mapping metadata values from supply to focus on.
    4. SVV_INTEGRATION_TABLE_STATE supplies details about integration state.
    5. SYS_INTEGRATION_TABLE_ACTIVITY supplies details about insert, delete, and replace exercise of integrations.
    6. SYS_INTEGRATION_TABLE_STATE_CHANGE supplies details about desk state change log for integrations.
  8. To question the previous tables or question the information from the replicated database, select Question information on the Zero-ETL integrations.

Clear up

While you delete zero-ETL integration, your transactional information isn’t deleted from self-managed database or Amazon Redshift, however supply database doesn’t ship any new information to Amazon Redshift.

To keep away from incurring future expenses, take away the sources used on this submit out of your AWS account by taking the next steps:

  1. Delete the zero-ETL integration:
    1. On the AWS Glue console, select Zero-ETL integrations within the navigation panel.
    2. Choose the zero-ETL integration that you simply wish to delete and on the Actions menu, select Delete.
    3. To substantiate the deletion, enter verify and select Delete.
  2. Delete Redshift cluster:
    1. On the Amazon Redshift console, select the Clusters within the navigation panel.
    2. Choose the Redshift cluster that you simply wish to delete and on the Actions menu, select Delete.
    3. If a remaining snapshot is required, choose Create remaining snapshot and supply Closing snapshot identifier.
    4. To substantiate the deletion, enter verify and select Delete.
  3. Delete supply DMS endpoint:
    1. On the AWS DMS console, select the Endpoints within the navigation panel.
    2. Choose the DMS endpoint that you simply wish to delete and on the Actions menu, select Delete.
  4. Delete the AWS Secrets and techniques Supervisor secret

Conclusion

On this submit, we confirmed how one can run zero-ETL integration from Oracle, PostgreSQL, MySQL, or SQL Server databases to Amazon Redshift. This minimizes the necessity to preserve complicated information pipelines and permits close to real-time analytics on transactional and operational information. With zero-ETL integrations, you possibly can focus extra on deriving worth out of your information and fewer on managing information motion.

As subsequent steps, assessment the Amazon Redshift zero-ETL integration documentation to grasp implementation particulars. Take into account increasing your analytics capabilities by exploring Amazon Redshift ML for machine studying (ML) integration or implementing federated queries for cross-database analytics. For extra insights, find out how different prospects are implementing zero-ETL with Amazon Aurora (MySQL-Suitable and PostgreSQL-Suitable) within the zero-ETL integration with Amazon Aurora weblog submit.


In regards to the authors

Alex Anto

Alex Anto

Alex is as a Senior Knowledge Migration Specialist Options Architect on the AWS DMS staff. With deep experience in AWS DMS, he helps migrations throughout all sources and goal database combos that the service permits. Past buyer engagement, Alex contributes to bettering AWS information migration merchandise and has efficiently guided tons of of shoppers by way of their cloud transformation journeys.

Shashank Kalki

Shashank Kalki

Shashank is a Knowledge Migration Specialist Options Architect on the AWS DMS staff. He works on challenges associated to information migration and works carefully with prospects to assist them notice the true potential of AWS DMS.

Johnson Mushonga

Johnson Mushonga

Johnson is a Knowledge Migration Specialist Options Architect on the AWS DMS staff. By way of information discovery periods, architectural evaluations, and deep dives into buyer configurations, Johnson advises prospects to efficiently migrate their database workloads to the AWS Cloud.

Changsheng Su

Changsheng Su

Changsheng Su is a Software program Engineer on the AWS DMS staff, the place he contributes to characteristic improvement for DMS streaming endpoints. He’s additionally a predominant contributor for zero-ETL streaming engine and infrastructure.

Yupeng Su

Yupeng Su

Yupeng Su is a Senior Software program Engineer at Amazon Net Companies (AWS), engaged on AWS DMS. He makes a speciality of constructing and working scalable, safe, distributed information migration and ingestion platforms. His areas of curiosity embrace open supply information warehouse and streaming frameworks, plus AI applied sciences. Exterior of his work, he likes to learn tech blogs, papers, and buyer evaluations of AWS companies.

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles