Published on 06/20/2021
Last updated on 02/09/2024

Access S3 Data in Amazon Redshift using Redshift Spectrum

As the cloud data warehouse in AWS, Amazon Redshift provides seamless integration with other storage services, such as Amazon S3. This enables a very cost-effective data warehouse solution, where you can keep warm data in Amazon Redshift storage and cold data in S3. Using Redshift Spectrum, you can then query the S3 data from Redshift in the same way you query data held in Redshift's own storage. This blog post explains the process for doing just that.

Step 1: Prerequisite

You need to have an AWS account with administrative access.

Step 2: Create S3 Bucket and Upload Sample .csv File

In this example, I created an S3 bucket and uploaded a .csv file named countrydata.csv, which contains data about specific countries.
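Based on the country and capital columns used later in this walkthrough, the file contents might look something like this (an illustrative sample rather than the actual data, with no header row, since the external table definition below doesn't skip one):

France,Paris
Japan,Tokyo
India,New Delhi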

Step 3: Create IAM Role

Your IAM role for the Redshift cluster will be used to provide access to the data in the S3 bucket. Under Create role in the IAM console, select AWS service. Under Choose a use case, select Redshift. Next, under Select your use case, choose Redshift - Customizable. Under Attach permissions policies, select PowerUserAccess. Next, enter a role name and description. At the end of all four steps, the summary page will show the details of the role you created.

Step 4: Launch Redshift Cluster

In this step, you'll launch a Redshift cluster, which will be used to query the S3-based data. Go to the Redshift Management Console and click the Create cluster button. Enter an identifier for the cluster and select Free trial.
Note: The free trial (dc2.large) node type doesn't allow cross-database queries. To run cross-database queries, the minimum requirement is a paid instance of the ra3.xlplus node type.
Keep the default database configurations, using a master username and master user password. You'll now see the cluster you've created. Associate the cluster with the role created in step 3, and you'll see a summary of what you've created. Then, on the left-hand menu, select Editor to connect to the database in the cluster you just created.

Step 5: Insert Data in Redshift Database

In this step, you'll create a new external schema in the Redshift cluster database and then create a table in that schema using the S3-based data. This table will be used to access data from the S3 bucket. In the Editor view, enter the following code in the query editor:
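-- external schema backed by the Glue/Athena data catalog; replace the iam_role ARN
-- with the ARN of the role created in step 3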
create external schema trellisdataschema
from data catalog
database 'dev'
iam_role 'arn:aws:iam::YYYYYYYYYYYY:role/trellis-yyyyyy-role'
create external database if not exists;
Next, enter code to create a table for that schema using the data from your .csv file stored in the S3 bucket:
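-- location points to the S3 bucket (or prefix) holding countrydata.csv; each
-- comma-delimited line in the file becomes a row with country and capital values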
create external table trellisdataschema.countrycapitals(
    country nvarchar(100),
    capital nvarchar(100)
)
row format delimited
fields terminated by ','
stored as textfile
location 's3://trellis-redshift-s3';
The newly created schema and table will now show up in the Resources view of the query editor.

Now whenever you run a SQL query against the table you created, the data is fetched from the S3 bucket.

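For example, a simple query against the external table (a minimal sketch using the column names defined above) pulls rows straight from the file in S3:

select country, capital
from trellisdataschema.countrycapitals
limit 10;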

Running Cross-Database Queries

In order to execute a cross-database query, you first need to upgrade your node type to ra3.xlplus. Next, create the database dev1, create a table named population in it, create a corresponding schema for dev1, insert data into the table, and run a query to display the inserted data. Then create a second database and table by following the same steps: create the database dev2, create a table named currency, create a corresponding schema for dev2, insert your data, and run a query to display it. Now that we have three databases (dev, dev1, and dev2) with their respective schemas in Redshift, we can run cross-database queries between them, as sketched below.
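As a minimal sketch of that flow, assuming illustrative column definitions and sample rows for the population and currency tables (their actual definitions aren't shown above, so these are placeholders), the statements might look like the following. The final query runs from the dev database and joins dev1 and dev2 using Redshift's three-part database.schema.table notation, which requires ra3 node types:

-- from the dev database, create the two additional databases
create database dev1;
create database dev2;

-- connected to dev1: create and populate an illustrative population table
create table public.population(
    country varchar(100),
    population bigint
);
insert into public.population values ('France', 67000000), ('Japan', 125000000);

-- connected to dev2: create and populate an illustrative currency table
create table public.currency(
    country varchar(100),
    currency varchar(50)
);
insert into public.currency values ('France', 'Euro'), ('Japan', 'Yen');

-- optionally, the "respective schema" step can be an external schema in dev that
-- points at a schema in another database on the same cluster (dev1_public is a
-- placeholder name):
create external schema dev1_public from redshift database 'dev1' schema 'public';

-- back in dev: a cross-database query joining tables from dev1 and dev2
select p.country, p.population, c.currency
from dev1.public.population p
join dev2.public.currency c on p.country = c.country;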