Outshift Logo

PRODUCT

5 min read

Blog thumbnail
Published on 07/20/2022
Last updated on 03/26/2024

Setup AQUA (Advanced Query Accelerator) for Amazon Redshift

Share

AQUA (Advanced Query Accelerator) is a new distributed and hardware-accelerated cache that enables Amazon Redshift to run up to ten times faster than other enterprise cloud data warehouses by automatically boosting certain types of queries. AQUA is included with the Redshift RA3 ra3.4xl and ra3.16xl node types at no additional cost. This post will describe how to set up AQUA for Amazon Redshift. Note that AQUA executes on data in “regular” Redshift tables — i.e. data that is ingested into Redshift. It doesn’t support federated queries run from Redshift against databases like maria DB or aurora DB. AQUA is designed to deliver up to 10 times the performance on queries that perform large scans, aggregates, and filtering, which is limited to LIKE and SIMILAR_TO predicates. Queries that perform writes, such as INSERT, UPDATE, DELETE, CREATE TABLE AS, COPY, and UNLOAD and SELECT (without a predicate) are not supported currently. Over time, AWS will be adding support for additional queries.

Launch Redshift Cluster

Go to Redshift Management Console and click on the “Create cluster” button. Select Production under “What are you planning to use this cluster for?” Then choose the RA3 ra3.4xl node type and turn on AQUA.
Setup_Aqua1
Keep default database configurations, using a master username and master user password.
Setup_Aqua2
Setup_Aqua3
You’ll now see the cluster you’ve created.
Setup_Aqua4

Comparing queries run in Redshift, Aurora DB with and without AQUA cluster.

To assess the improvement in performance from AQUA, let’s consider this largetest_2 table, which has 500,000,000 rows consisting of num1(int), num2 and num3 (double precision).
Setup_Aqua5
Setup_Aqua6
We ran the following two queries for selecting num1, average of num3 and sum of num3, on the data shown in the table above: Query #1
SELECT num1, avg(num3) as num3_avg, sum(num2) as num2_sum
FROM dev.public.largetest_2
WHERE num2 LIKE ‘0.2%’
GROUP BY num1;
Query #2
SELECT num1, avg(num3) as num3_avg, sum(num2) as num2_sum
FROM dev.public.largetest_2
WHERE num3 LIKE ‘0.2%’
GROUP BY num1;
The Query #1’s response time is 29s.
Setup_Aqua7
The Query #2’s response time is 1m 1s.
Setup_Aqua8
We ran the same Query #1 in a non-AQUA (see above table) and got a response time of 57s.
Setup_Aqua9
Query #2’s response time is 2m 8s
Setup_Aqua10
Next, we ran the same queries on the above table against Aurora DB in the Redshift AQUA cluster. Query #1's response time was 47m 51s.
Setup_Aqua11
We then ran a query on the above table against Aurora DB in the Redshift non-AQUA cluster. Query #1's response time was 47m 52s.
Setup_Aqua12
A Query explanation:
Setup_Aqua13
SELECT num1, sum(num2) as num2_sum, avg(num3) as num3_avg FROM largetest_3 GROUP BY num1;
The above query will select rows (highlighted in green in Table 1 above), group by num1 column to calculate sum for num2 and the average for num3 columns. In other words, when grouping by value ‘3’ (from num1), the query will calculate sum for 0+1=1(from num2) and the average for (77+85)/2=81(from num3). We will get the following response for this query:
Setup_Aqua14
SELECT num1, sum(num2) as num2_sum, avg(num3) as num3_avg
FROM largetest_3
WHERE num2 LIKE ‘1%’
GROUP BY num1;
The above query will select rows (highlighted in orange in Table 1 above), only where num2 has ‘1%’ — i.e. the query will only return values that start with ‘1’ — and group by the num1 column to calculate the sum for num2 and the average for num3 columns. In other words, when grouping by value ‘4’ (from num1), the query will calculate the sum for 1+1=2(from num2) and the average for (114+112)/2=113 (from num3). We will then get the following response for above query.
Setup_Aqua15

Response Time Summary for different configurations

The table below shows the comparison of queries run in Redshift (Local) and Aurora DB (Remote) with and without the AQUA cluster. These queries were run on a database table with 500,000,000 rows mix of integer and double precision data types.
Setup_Aqua16
The above table clearly shows the performance improvements in terms of response time with Redshift AQUA cluster as compared to non-AQUA cluster.

RedShift Cost Analysis

The cost of using Redshift can be broken into the following high-level categories:
  1. Redshift managed storage associated with nodes in a cluster — Pay for data stored in managed storage at a fixed GB-month rate for your region.
  2. Data scanned while running queries — With Redshift Spectrum, you are billed per terabyte of data scanned, rounded up to the next megabyte, with a 10-megabyte minimum per query.
  3. Redshift Compute Node Usage — hourly rate based on the type and number of nodes in your cluster.
Click here for AWS link for Redshift pricing: https://aws.amazon.com/redshift/pricing/
Subscribe card background
Subscribe
Subscribe to
the Shift!

Get emerging insights on emerging technology straight to your inbox.

Unlocking Multi-Cloud Security: Panoptica's Graph-Based Approach

Discover why security teams rely on Panoptica's graph-based technology to navigate and prioritize risks across multi-cloud landscapes, enhancing accuracy and resilience in safeguarding diverse ecosystems.

thumbnail
I
Subscribe
Subscribe
 to
the Shift
!
Get
emerging insights
on emerging technology straight to your inbox.

The Shift keeps you at the forefront of cloud native modern applications, application security, generative AI, quantum computing, and other groundbreaking innovations that are shaping the future of technology.

Outshift Background