Analyze JSON knowledge effectively with Amazon Redshift SUPER

February 1, 2026

6

Amazon Redshift is a quick, scalable, and totally managed cloud knowledge warehouse that you should use to course of and run your advanced SQL analytics workloads on structured and semi-structured knowledge. It additionally helps you securely entry your knowledge in operational databases, knowledge lakes, or third-party datasets with minimal motion or copying of knowledge. Tens of 1000’s of consumers use Amazon Redshift to course of giant quantities of knowledge, modernize their knowledge analytics workloads, and supply insights for his or her enterprise customers.

Amazon Redshift transforms how organizations analyze JSON knowledge by combining the analytical energy of a columnar knowledge warehouse with strong JSON processing capabilities. Through the use of Amazon Redshift SUPER datatype, you may effectively retailer, question, and analyze advanced hierarchical knowledge alongside conventional structured knowledge with out sacrificing efficiency.

This publish focuses on JSON options of Amazon Redshift.

Amazon Redshift structure

The Amazon Redshift structure is designed as a massively parallel processing (MPP) knowledge warehouse system that effectively handles on-line analytics processing (OLAP) workloads. At its core is a cluster consisting of a pacesetter node and compute nodes. The chief node manages consumer communications, develops execution plans, and coordinates question processing, whereas compute nodes deal with the precise knowledge processing in parallel. This MPP structure significantly advantages OLAP operations by distributing knowledge processing throughout a number of nodes and executing queries in parallel. The system is optimized for advanced analytical queries and reporting on giant datasets, slightly than conventional on-line transaction processing (OLTP) operations. Key options embody computerized knowledge compression, seamless integration with numerous AWS providers like Amazon Easy Storage Service (Amazon S3) and AWS Glue for knowledge lake analytics, concurrency scaling to deal with fluctuating workloads, and a serverless possibility for simplified administration and auto-scaling, alongside strong safety measures together with encryption and community isolation.

JSON doc capabilities in Amazon Redshift

The Amazon Redshift SUPER datatype presents a number of key benefits for working with JSON knowledge:

Direct storage of JSON paperwork with advanced hierarchical constructions with out transformation overhead
Schema flexibility for evolving JSON paperwork, so you may retailer and question semi-structured knowledge with out requiring cumbersome database migrations for schema adjustments throughout the JSON construction
SQL-compatible querying with PartiQL, an intuitive syntax for semi-structured knowledge

Amazon Redshift excels at analytical processing of JSON knowledge at scale, proving preferrred for advanced aggregations, joins, and analytical queries throughout huge JSON doc volumes. To unlock its full potential and obtain optimum efficiency with JSON knowledge, adherence to Amazon Redshift greatest practices and to SUPER datatype greatest practices is essential.

Actual-world instance: Retail buyer interplay evaluation

Think about constructing a retail analytics system that captures and analyzes buyer interactions throughout a number of channels. This use case demonstrates the SUPER datatype capabilities, as a result of buyer interactions generate semi-structured knowledge with various attributes relying on the channel, machine, and motion sorts.

First, we create a desk to retailer buyer interactions from numerous channels. As an alternative of making dozens of columns to accommodate interplay attributes, we’ll use a single SUPER column that may retailer the JSON construction:

CREATE TABLE customer_interactions (
    customer_id INT,
    interaction_date TIMESTAMP,
    interaction_data SUPER
);

This primary schema gives huge flexibility. Because the retail enterprise evolves, including new channels, monitoring new metrics, or capturing new machine attributes, you received’t want to change the database schema. That is significantly useful for retail analytics methods that must adapt rapidly to altering enterprise necessities.

Let’s insert a pattern buyer interplay from a cell app. This JSON doc captures a wealthy interplay the place a buyer seen a product after which added it to their cart:

-- Insert a cell app interplay with product view and cart addition
INSERT INTO customer_interactions VALUES
(1001, '2023-09-15 14:32:45',
 JSON_PARSE('{
    "channel": "mobile_app",
    "actions": [
      {"type": "view_product", "product_id": "P-5432", "duration_sec": 45},
      {"type": "add_to_cart",  "product_id": "P-5432", "quantity": 1}
    ],
    "device_info": {"os": "iOS", "model": "15.4.1", "mannequin": "iPhone 13"}
 }')),
(1002, '2023-09-15 15:10:12',
 JSON_PARSE('{
    "channel": "internet",
    "actions": [
      {"type": "view_product", "product_id": "P-1234", "duration_sec": 20},
      {"type": "checkout",     "product_id": "P-1234", "payment_method": "credit_card"}
    ],
    "device_info": {"os": "Home windows", "model": "10", "mannequin": "Dell XPS"}
 }')),
(1003, '2023-09-16 09:22:33',
 JSON_PARSE('{
    "channel": "mobile_app",
    "actions": [
      {"type": "view_product", "product_id": "P-9999", "duration_sec": 60},
      {"type": "add_to_cart",  "product_id": "P-9999", "quantity": 2},
      {"type": "checkout",     "product_id": "P-9999", "payment_method": "paypal"}
    ],
    "device_info": {"os": "Android", "model": "11", "mannequin": "Pixel 5"}
 }')),
(1004, '2023-09-16 15:45:30',
 JSON_PARSE('{
    "channel": "mobile_app",
    "actions": [
      {"type": "store_locator", "timestamp": "2023-09-16T15:45:32Z"}
    ],
    "device_info": {"os": "Android", "model": "12", "mannequin": "Samsung Galaxy S21"},
    "location": {
      "latitude": 37.7749, "longitude": -122.4194,
      "accuracy_meters": 10, "captured_at": "2023-09-16T15:45:30Z"
    }
 }'));

This JSON construction comprises useful data for retail analytics:

The shopper’s buying channel (cell app)
A sequence of actions they took (viewing a product, then including it to cart)
Particulars concerning the product they interacted with
Details about their machine

In a standard relational mannequin, this may require joins throughout a number of tables (interactions, actions, units). With JSON in Amazon Redshift, all associated knowledge stays collectively in a single row.

Superior JSON querying in Amazon Redshift

The SUPER datatype presents a number of key options for working with JSON knowledge. To discover these options, see these examples. Let’s talk about a few of these options within the context of a retail analytics platform.

Every of those examples showcase how Amazon Redshift, with the SUPER datatype, mixed with PartiQL navigation, treats JSON paperwork as normal SQL constructs, enabling highly effective, versatile evaluation of JSON-driven retail occasions.

Depend of actions by product

This question tallies the variety of occasions every product was seen, added to cart, and checked out throughout all buyer periods.

SELECT
  motion.product_id::varchar AS product_id,
  SUM(CASE WHEN motion.kind::varchar="view_product" THEN 1 ELSE 0 END) AS views,
  SUM(CASE WHEN motion.kind::varchar="add_to_cart" THEN 1 ELSE 0 END) AS add_to_cart,
  SUM(CASE WHEN motion.kind::varchar="checkout" THEN 1 ELSE 0 END) AS checkouts
FROM customer_interactions c
CROSS JOIN c.interaction_data.actions AS motion
GROUP BY 1
ORDER BY 1;

Output:

product_id	add_to_cart	checkouts
P-1234	0	1
P-5432	1	0
P-9999	1	1

This question

Makes use of CROSS JOIN to flatten a JSON array (c.interaction_data.actions AS motion).
Unifies relational aggregation (SUM and GROUP BY) with JSON navigation to pivot consumer occasions.

Excessive-engagement periods

This question identifies clients whose complete product viewing time exceeds 30 seconds by summing up the length of all view_product actions.

SELECT
  c.customer_id,
  SUM(CASE WHEN motion.kind::varchar="view_product"
           THEN motion.duration_sec::int ELSE 0 END) AS total_view_time
FROM customer_interactions c
CROSS JOIN c.interaction_data.actions AS motion
GROUP BY c.customer_id
HAVING SUM(CASE WHEN motion.kind::varchar="view_product"
                THEN motion.duration_sec::int ELSE 0 END) > 30
ORDER BY c.customer_id;

Output:

customer_id	total_view_time
1001	45
1003	60

On this question, JSON fields are used immediately in aggregations (motion.duration_sec).

Gadget mannequin extraction

This question pulls the machine mannequin from the nested device_info object, supporting device-based segmentation.

SELECT
  customer_id,
  interaction_data.device_info.mannequin::varchar AS device_model
FROM customer_interactions;

Output:

customer_id	device_model
1001	iphone 13
1002	Dell XPS
1003	Pixel 5
1004	Samsung Galaxy S21

On this use case, native dot-notation JSON navigation on a SUPER column (interaction_data.device_info.mannequin) allows direct in-line extraction of nested textual content values with out casting or parsing capabilities.

Filtering by working system

This question makes use of a PartiQL WHERE clause to filter periods to return solely these from Android units, enabling OS-specific evaluation.

SELECT customer_id
FROM customer_interactions
WHERE interaction_data.device_info.os="Android";

Output:

On this question, JSON attributes take part immediately in WHERE filters. You may section knowledge by JSON attributes as in the event that they have been relational columns.

Extracting second motion particulars

This question retrieves the kind, product ID, and amount of the second motion in every session, helpful for analyzing follow-up behaviors.

SELECT
  c.customer_id,
  motion.kind::varchar       AS second_action_type,
  motion.product_id::varchar AS second_action_product,
  motion.amount::int       AS amount
FROM customer_interactions c,
     c.interaction_data.actions AS motion AT idx
WHERE idx = 1
ORDER BY c.customer_id;

Output:

customer_id	second_action_type	second_action_product	amount
1001	add_to_cart	P-5432	1
1002	checkout	P-1234	(null)
1003	add_to_cart	P-9999	2

On this question, PartiQL’s array indexing (AT idx) lets queries goal particular JSON array components. This offers Amazon Redshift parity with JSON doc databases the place array traversal is a key characteristic.

Pivoting buyer actions

This question pivots counts of every motion kind into separate columns for a concise per-customer abstract. It exhibits easy methods to convert document-style occasions into relational-style summaries immediately. JSON flexibility is preserved whereas producing structured aggregates.

SELECT
  c.customer_id,
  SUM(CASE WHEN motion.kind::varchar="view_product" THEN 1 ELSE 0 END) AS view_product,
  SUM(CASE WHEN motion.kind::varchar="add_to_cart" THEN 1 ELSE 0 END) AS add_to_cart,
  SUM(CASE WHEN motion.kind::varchar="checkout" THEN 1 ELSE 0 END) AS checkout,
  SUM(CASE WHEN motion.kind::varchar="store_locator" THEN 1 ELSE 0 END) AS store_locator
FROM customer_interactions c
CROSS JOIN c.interaction_data.actions AS motion
GROUP BY c.customer_id
ORDER BY c.customer_id;

Output:

customer_id	view_product	add_to_cart	checkout	store_locator
1001	1	1	0	0
1002	1	0	1	0
1003	1	1	1	0
1004	0	0	0	1

Cellular app view_product occasions

Flatten actions arrays and extract solely view_product occasions on cell. This instance flattens the actions array and filters to point out solely cell app product-view occasions, together with viewing durations.

SELECT
  c.customer_id,
  c.interaction_date,
  motion.product_id::varchar AS product_id,
  motion.duration_sec::int   AS duration_sec
FROM customer_interactions c
CROSS JOIN c.interaction_data.actions AS motion
WHERE motion.kind::varchar="view_product"
  AND c.interaction_data.channel="mobile_app"
ORDER BY c.customer_id;

Output:

customer_id	interaction_date	product_id	duration_sec
1001	2023-09-15 14:32:45	P-5432	45
1003	2023-09-16 09:22:33	P-9999	60

This question exhibits a multi-level JSON querying inside plain SQL; advanced filters mix JSON array attributes (motion.kind) with dad or mum object fields (channel).

Efficiency optimization for JSON in Amazon Redshift

Amazon Redshift implements a number of optimizations for the SUPER datatype which might be significantly useful for retail analytics:

Columnar storage: Columnar storage in Amazon Redshift considerably advantages the SUPER datatype by optimizing the storage and retrieval of semi-structured knowledge, although the SUPER datatype can retailer whole JSON paperwork inside a single column.
Zone maps: Zone maps will help skip blocks that don’t match question predicates on extracted values from the SUPER columns
Parallel processing: The MPP structure of Amazon Redshift distributes knowledge and question processing throughout a number of nodes, together with operations on SUPER knowledge, enabling high-performance queries on giant datasets.
Materialized views: Create materialized views over generally accessed JSON paths; they’ll pre-compute outcomes for continuously accessed JSON paths, additional enhancing efficiency.

For instance, take into account materialized view. For continuously accessed JSON paths, these views can considerably enhance efficiency. In our retail instance, we’d create a view for cell product views:

-- Create a materialized view to speed up cell buying evaluation
-- This improves efficiency for continuously run studies on cell engagement
CREATE MATERIALIZED VIEW mv_mobile_product_views AS
SELECT
  c.customer_id,
  c.interaction_date,
  motion.product_id::varchar AS product_id,
  c.interaction_data.device_info.mannequin::varchar AS device_model
FROM customer_interactions c
CROSS JOIN c.interaction_data.actions AS motion
WHERE motion.kind::varchar="view_product"
  AND c.interaction_data.channel="mobile_app";

This materialized view pre-extracts generally accessed JSON paths, changing them to a columnar format for sooner question efficiency. Retail analysts working studies on cell engagement will expertise considerably sooner question occasions in comparison with parsing the JSON construction repeatedly. Right here is an instance:

SELECT * FROM mv_mobile_product_views;

Output:

customer_id	interaction_date	product_id	device_model
1001	2023-09-15 14:32:45	P-5432	iPhone 13
1003	2023-09-16 09:22:33	P-9999	Pixel 5

Superior use instances

On this part, we talk about easy methods to take care of superior JSON makes use of instances when working with Amazon Redshift.

Geospatial knowledge in JSON

For retail companies, location knowledge is essential for understanding buyer buying patterns. Amazon Redshift helps geospatial queries via its GEOMETRY and GEOGRAPHY knowledge sorts. You may retailer location knowledge inside SUPER columns and extract it for spatial operations:

-- Retailer a buyer interplay with location knowledge from a retailer go to
INSERT INTO customer_interactions VALUES (
    1004, 
    '2023-09-16 15:45:30',
    '{ 
        "channel": "mobile_app", 
        "actions": [
            {"type": "store_locator", "timestamp": "2023-09-16T15:45:32Z"}
        ],
        "device_info": {
            "os": "Android",
            "model": "12",
            "mannequin": "Samsung Galaxy S21"
        },
        "location": {
            "latitude": 37.7749,
            "longitude": -122.4194,
            "accuracy_meters": 10,
            "captured_at": "2023-09-16T15:45:30Z"
        }
    }'
);
-- Extract location knowledge to search out clients close to particular shops
-- This allows location-based advertising and marketing campaigns
SELECT
  customer_id,
  interaction_date,
  interaction_data.channel::varchar AS channel,
  ST_GeomFromText('POINT(' ||
        interaction_data.location.longitude::varchar || ' ' ||
        interaction_data.location.latitude::varchar || ')') AS customer_location
FROM customer_interactions
WHERE interaction_data.location IS NOT NULL;

Output:

customer_id	interaction_date	channel	customer_location
1004	2023-09-16 15:45:30	mobile_app	POINT(-122.4194 37.7749)

Through the use of this functionality, retail companies can carry out location-based analytics, equivalent to figuring out clients who visited bodily shops, analyzing foot site visitors patterns, or concentrating on promotions to clients in particular geographic areas.

Advanced nested JSON evaluation

Retail companies usually want to research advanced buyer habits patterns throughout buy historical past. Amazon Redshift allows subtle evaluation of deeply nested JSON constructions. Let’s add a buyer and product tables and insert some knowledge.

CREATE TABLE clients (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name  VARCHAR(50)
);
CREATE TABLE merchandise (
    product_id VARCHAR(20) PRIMARY KEY,
    product_name VARCHAR(100),
    class VARCHAR(50)
);
INSERT INTO clients VALUES
(1001, 'Alice', 'Johnson'),
(1002, 'Bob', 'Smith'),
(1003, 'Carol', 'Taylor'),
(1004, 'David', 'Brown');
INSERT INTO merchandise VALUES
('P-5432', 'Crimson Sneakers', 'Footwear'),
('P-1234', 'Wi-fi Mouse', 'Electronics'),
('P-9999', 'Espresso Maker', 'Home equipment');

-- Analyze buying patterns of consumers over 30
-- This helps goal particular demographics with acceptable merchandise
SELECT ci.customer_id, cu.first_name, cu.last_name,
       p.product_name, p.class
FROM customer_interactions ci
JOIN clients cu ON ci.customer_id = cu.customer_id
LEFT JOIN merchandise  p ON p.product_id = ci.interaction_data.actions[0].product_id;

Output:

customer_id	first_name	last_name	product_name	class
1001	Alice	Johnson	Crimson Sneakers	Footwear
1002	Bob	Smith	Wi-fi Mouse	Electronics
1003	Carol	Taylor	Espresso Maker	Home equipment
1004	David	Brown	NULL	NULL

This question analyzes buying patterns of hottest merchandise bought by clients over 30 years previous, within the present 12 months. The power to navigate via nested buyer data, buy historical past, and merchandise particulars, gives highly effective insights for retail merchandising and advertising and marketing groups.

Time-series evaluation on JSON

Retail companies usually want to research time-based patterns, equivalent to buying habits throughout particular hours or seasonal traits. JSON knowledge usually comprises time-series data that may be analyzed effectively in Amazon Redshift. Let illustrate use case via a time-series IOT instance, utilizing the store_sensors desk.

CREATE TABLE store_sensors (
    store_id INT,
    readings SUPER
);
INSERT INTO store_sensors VALUES
(1, JSON_PARSE('[
  {"value": {"temperature": 22.5, "timestamp": "2023-03-01T10:00:00Z"}},
  {"value": {"temperature": 23.0, "timestamp": "2023-06-15T12:00:00Z"}},
  {"value": {"temperature": 21.8, "timestamp": "2023-11-20T09:30:00Z"}}
]')),
(2, JSON_PARSE('[
  {"value": {"temperature": 19.5, "timestamp": "2023-02-10T14:00:00Z"}},
  {"value": {"temperature": 20.0, "timestamp": "2023-07-25T16:00:00Z"}}
]'));

-- Analyze in-store temperature sensors to optimize buying surroundings
-- This helps keep optimum retailer circumstances for buyer consolation

SELECT
  s.store_id,
  COUNT(*) AS reading_count,
  AVG((studying.worth.temperature)::DOUBLE PRECISION) AS avg_temp
FROM store_sensors s
CROSS JOIN s.readings AS studying
WHERE (studying.worth.timestamp)::TIMESTAMP BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY s.store_id;

Output:

store_id	reading_count	avg_temp
1	3	69
2	2	69

This question analyzes retailer environmental circumstances, figuring out the typical temperature. For retail companies, sustaining optimum buying circumstances is essential for buyer satisfaction and might immediately influence gross sales. The power to retailer and analyze IoT sensor knowledge as JSON gives useful operational insights with out advanced knowledge transformations.

Limitations and issues

Whereas the SUPER datatype presents highly effective JSON capabilities for retail analytics, there are limitations that you simply want to pay attention to. For extra data, see the general public documentation on the limitations.

The SUPER datatype inherits two normal measurement limitations of Amazon Redshift. The whole measurement of a doc can not exceed 16MB, and 64 KB is the restrict for VARCHAR strings.

For retail companies with giant JSON paperwork, to take care of these limitations, take into account:

Extracting crucial fields: Use generated columns to extract continuously accessed JSON fields into devoted columns, equivalent to buyer tier.
Amazon Redshift Spectrum: Retailer giant JSON datasets in Amazon S3 and question them utilizing Amazon Redshift Spectrum. Notice that the scale restrict of 64 KB nonetheless applies to the exterior desk definition, however you can doubtlessly, pre-process the big objects into extra compact codecs and retailer them on Amazon S3. This strategy is scalable and avoids storing extraordinarily giant, non-relational knowledge inside your analytical database, bettering efficiency and stability.
Hybrid strategy: Retailer continuously accessed knowledge in Amazon Redshift tables and fewer continuously accessed knowledge in Amazon S3, querying each with federated queries. On this case, you be a part of exterior knowledge utilizing spectrum along with native knowledge in Amazon Redshift.

Is Amazon Redshift proper to your JSON workloads?

As highly effective Amazon Redshift is, it’s not at all times the precise match to your JSON workloads. On this part, we talk about when Amazon Redshift is and when it’s not a proper match.

Amazon Redshift is good for JSON workloads when:

Analytical processing is major: Your major use case entails advanced analytical queries, equivalent to buyer segmentation, product affinity evaluation, or gross sales development identification.
Blended knowledge sorts: You must analyze JSON alongside conventional structured knowledge, equivalent to combining buyer interplay knowledge with stock and gross sales data.
Knowledge warehouse integration: Your JSON knowledge must be a part of a broader knowledge warehousing technique, integrating buyer habits with enterprise knowledge.
BI software integration: You must join enterprise intelligence instruments to research JSON knowledge for government dashboards and studies.
Price-effective analytics at scale: You require cost-effective evaluation of enormous volumes of JSON knowledge, equivalent to years of buyer interactions or Web of Issues (IoT) sensor readings from a number of retailer areas.

Think about alternate options when:

Operational workloads dominate: Should you primarily want single-document lookups or updates, equivalent to retrieving or updating particular person buyer profiles, take into account Amazon DynamoDB or Amazon DocumentDB.
Superior search is crucial: If superior textual content search, together with full-text and semantic search, is your major requirement (equivalent to for looking out product critiques or buyer suggestions), Amazon OpenSearch Service may be extra acceptable.
Doc measurement exceeds limits: If you’re typically coping with very giant paperwork with complete measurement exceeding 16MB or paperwork with VARCHAR strings exceeding 64KB in measurement.

Conclusion

Amazon Redshift presents a strong platform for retail JSON analytics that mixes the pliability of semi-structured knowledge with the efficiency of a columnar knowledge warehouse. Through the use of the SUPER datatype and PartiQL, retail organizations can analyze advanced buyer interplay knowledge at scale with out sacrificing SQL compatibility or analytical capabilities.

As knowledge continues to develop in quantity and complexity, the flexibility to bridge the hole between structured and semi-structured knowledge evaluation turns into more and more useful. Through the use of Amazon Redshift to take away the normal boundary between these knowledge sorts, organizations can consolidate their analytical workloads on a single platform, lowering complexity, bettering efficiency, and accelerating time to perception for business-critical selections.

Attempt Amazon Redshift’s SUPER datatype and depart your feedback right here.

Analyze JSON knowledge effectively with Amazon Redshift SUPER

Amazon Redshift structure

JSON doc capabilities in Amazon Redshift

Actual-world instance: Retail buyer interplay evaluation

Superior JSON querying in Amazon Redshift

Depend of actions by product

Excessive-engagement periods

Gadget mannequin extraction

Filtering by working system

Extracting second motion particulars

Pivoting buyer actions

Cellular app view_product occasions

Efficiency optimization for JSON in Amazon Redshift

Superior use instances

Geospatial knowledge in JSON

Advanced nested JSON evaluation

Time-series evaluation on JSON

Limitations and issues

Is Amazon Redshift proper to your JSON workloads?

Conclusion

Concerning the authors

Related Articles

Democratizing enterprise intelligence: BGL’s journey with Claude Agent SDK and Amazon Bedrock AgentCore

Why Our Open Supply, Companies-Led Mannequin Nonetheless Works

GPTHuman vs HIX Bypass: AI Humanizer Showdown

LEAVE A REPLY Cancel reply

Latest Articles

Democratizing enterprise intelligence: BGL’s journey with Claude Agent SDK and Amazon Bedrock AgentCore

Why Our Open Supply, Companies-Led Mannequin Nonetheless Works

GPTHuman vs HIX Bypass: AI Humanizer Showdown

loish weblog

Lab-grown corticospinal neurons provide new fashions for ALS and spinal accidents – NanoApps Medical – Official web site