Amazon Redshift Federated Query enables you to use the analytic power of Amazon Redshift to directly query data stored in Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL databases. For more information about setting up an environment where you can try out Federated Query, see Accelerate Amazon Redshift Federated Query adoption with AWS CloudFormation.

Federated Query enables real-time data integration and simplified ETL processing. You can now connect live data sources directly in Amazon Redshift to provide real-time reporting and analysis. Previously, you needed to extract data from your PostgreSQL database to Amazon Simple Storage Service (Amazon S3) and load it to Amazon Redshift using COPY, or query it from Amazon S3 with Amazon Redshift Spectrum. For more information about the benefits of Federated Query, see Build a Simplified ETL and Live Data Query Solution using Amazon Redshift Federated Query.

This post discusses 10 best practices to help you maximize the benefits of Federated Query when you have large federated data sets, when your federated queries retrieve large volumes of data, or when you have many Redshift users accessing federated data sets. These techniques are not necessary for general usage of Federated Query; they are intended for advanced users who want to make the most of this exciting feature. The best practices are divided into two sections: the first for advice that applies to your Amazon Redshift cluster, and the second for advice that applies to your Aurora PostgreSQL and Amazon RDS for PostgreSQL environments. The code examples provided in this post derive from the data and queries in the CloudDataWarehouseBenchmark GitHub repo (based on TPC-H and TPC-DS).

Best practices to apply in Amazon Redshift

The following best practices apply to your Amazon Redshift cluster when using federated queries to access your Aurora or Amazon RDS for PostgreSQL instances.

Use separate external schemas for each use case

Consider creating separate Amazon Redshift external schemas, using separate remote PostgreSQL users, for each specific Amazon Redshift use case. This practice gives you extra control over the users and groups who can access the external database. For instance, you may want to have an external schema for ETL usage, with an associated PostgreSQL user that has broad access, and another schema, with an associated PostgreSQL user, for ad-hoc reporting and analysis with access limited to specific resources.

The following code example creates two external schemas, one for ETL use and one for ad-hoc reporting use. Each schema uses a different SECRET_ARN containing credentials for separate users in the PostgreSQL database.

```sql
CREATE EXTERNAL SCHEMA IF NOT EXISTS apg_etl
FROM POSTGRES DATABASE 'tpch' SCHEMA 'public'
SECRET_ARN 'arn:aws:secretsmanager:us-west-2:123456789012:secret:apg-redshift-etl-secret-187Asd';

CREATE EXTERNAL SCHEMA IF NOT EXISTS apg_adhoc
FROM POSTGRES DATABASE 'tpch' SCHEMA 'public'
SECRET_ARN '...';  -- secret for the ad-hoc PostgreSQL user (value not preserved in this extract)
```

Use query timeouts to limit total runtimes

Consider setting a timeout on the users or groups that have access to your external schemas. User queries could unintentionally try to retrieve a very large number of rows from the external relation and remain running for an extended time, which holds open resources in both Amazon Redshift and PostgreSQL.

To limit the total runtime of a user's queries, you can set a statement_timeout for all of a user's queries. The following code example sets a 2-hour timeout (expressed in milliseconds) for an ETL user:

```sql
ALTER USER etl_user SET statement_timeout TO 7200000;
```

If many users have access to your external schemas, it may not be practical to define a statement_timeout for each individual user. Instead, you can add a query monitoring rule in your WLM configuration using the query_execution_time metric.
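A query monitoring rule of that kind can be sketched as the following WLM configuration fragment. The rule name and the 7,200-second threshold are illustrative assumptions; query_execution_time is measured in seconds, and the abort action cancels any query that exceeds the threshold:

```json
[
  {
    "query_group": [],
    "user_group": [],
    "rules": [
      {
        "rule_name": "abort_long_federated_queries",
        "predicate": [
          {
            "metric_name": "query_execution_time",
            "operator": ">",
            "value": 7200
          }
        ],
        "action": "abort"
      }
    ]
  }
]
```

Because the rule is evaluated per queue rather than per user, one rule covers every user whose queries run in that queue.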
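To pair the separate external schemas described earlier with the Amazon Redshift users who should reach them, you can grant schema usage to separate groups. A minimal sketch, assuming hypothetical etl_group and analyst_group groups alongside the apg_etl and apg_adhoc schemas:

```sql
-- Group names are assumptions; create them first if they don't exist
CREATE GROUP etl_group;
CREATE GROUP analyst_group;

-- Limit each external schema to its intended audience
GRANT USAGE ON SCHEMA apg_etl TO GROUP etl_group;
GRANT USAGE ON SCHEMA apg_adhoc TO GROUP analyst_group;
```

Users outside a schema's group then cannot query its federated tables at all, which complements the per-schema PostgreSQL credentials on the remote side.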