Currently working on copying schema and data across Redshift clusters.
I've found two different promising approaches, and I'd love to hear which one is recommended and what pitfalls each has (in case I've missed any):
- Use UNLOAD/COPY to unload the data into S3 and then copy it into a different Redshift cluster. This seems to be the most common method, although from what I can find it only works for tables, so I would also have to copy my schema over to the new Redshift cluster and loop through the tables for this to work (rough sketch after this list). This is a little more involved but seems more free-form and customizable to my needs.
- Use Redshift datasharing to share data between clusters and apply whatever obfuscation is necessary using dynamic data masking (second sketch below). This has some major benefits in my eyes, namely (1) the data shared between clusters is always up to date, and (2) it can all be handled within the AWS console. The main drawback I see is that I would have to upgrade my clusters, which would noticeably increase my costs. Any other drawbacks I'm missing? (Side question: does anyone have experience using this method / pairing Redshift datasharing with dynamic data masking? I can't find anything online about anyone's experience pairing the two.)
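
For the first option, here's roughly the loop I have in mind, sketched in Python with the redshift_connector driver. The bucket, IAM role, schema, and table names are all placeholders and I haven't run this end to end, so treat it as a sketch rather than a working script:

```python
# Rough sketch of the UNLOAD/COPY loop (untested). The bucket, IAM role, and
# schema are placeholders, and I'm assuming the schema/DDL already exists on
# the target cluster.
import redshift_connector  # AWS's Python driver; psycopg2 also works

S3_PREFIX = "s3://my-transfer-bucket/redshift-copy"         # placeholder staging bucket
IAM_ROLE = "arn:aws:iam::123456789012:role/RedshiftS3Role"  # placeholder role with S3 access

def list_tables(conn, schema):
    """Return the user tables in a schema on the source cluster."""
    cur = conn.cursor()
    cur.execute(f"SELECT tablename FROM pg_tables WHERE schemaname = '{schema}'")
    tables = [row[0] for row in cur.fetchall()]
    cur.close()
    return tables

def copy_tables(src_conn, dst_conn, schema):
    """UNLOAD each table to S3 from the source cluster, then COPY it into the target."""
    for table in list_tables(src_conn, schema):
        path = f"{S3_PREFIX}/{schema}/{table}/"

        src_cur = src_conn.cursor()
        src_cur.execute(
            f"UNLOAD ('SELECT * FROM {schema}.{table}') TO '{path}' "
            f"IAM_ROLE '{IAM_ROLE}' FORMAT PARQUET ALLOWOVERWRITE"
        )
        src_conn.commit()
        src_cur.close()

        dst_cur = dst_conn.cursor()
        dst_cur.execute(
            f"COPY {schema}.{table} FROM '{path}' IAM_ROLE '{IAM_ROLE}' FORMAT AS PARQUET"
        )
        dst_conn.commit()
        dst_cur.close()
```

The appeal for me is that the same loop works for any schema I point it at, which is where the reproducibility comes from.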
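
For the second option, these are roughly the statements I'd script against the producer and consumer clusters (wrapped in the same driver just to keep it repeatable). The share/policy names, table, column, and namespace GUIDs are placeholders, and whether an attached masking policy actually carries through the datashare is exactly the part I can't confirm:

```python
# Rough sketch of the datashare + dynamic data masking setup (untested).
# All names and namespace GUIDs are placeholders; I'm assuming the masking
# policy attached on the producer applies to shared queries, which is the
# part I haven't been able to verify.
import redshift_connector

PRODUCER_STMTS = [
    "CREATE DATASHARE reporting_share",
    "ALTER DATASHARE reporting_share ADD SCHEMA public",
    "ALTER DATASHARE reporting_share ADD ALL TABLES IN SCHEMA public",
    # consumer cluster's namespace GUID goes here
    "GRANT USAGE ON DATASHARE reporting_share TO NAMESPACE '<consumer-namespace-guid>'",
    # mask an example PII column for everyone on the producer side
    "CREATE MASKING POLICY mask_email WITH (email VARCHAR(256)) "
    "USING ('***MASKED***'::VARCHAR(256))",
    "ATTACH MASKING POLICY mask_email ON public.customers(email) TO PUBLIC",
]

CONSUMER_STMTS = [
    # producer cluster's namespace GUID goes here
    "CREATE DATABASE reporting_db FROM DATASHARE reporting_share "
    "OF NAMESPACE '<producer-namespace-guid>'",
]

def run(conn, statements):
    """Run the setup DDL in order; autocommit because CREATE DATABASE
    can't run inside a transaction block."""
    conn.rollback()          # end any implicit transaction first
    conn.autocommit = True
    cur = conn.cursor()
    for stmt in statements:
        cur.execute(stmt)
    cur.close()
```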
A top priority for me is that this is an easily reproducible method that can be used to copy data/schema whenever I need it. Thanks for any feedback/suggestions!