Trying to unnest `SUPER` data in Amazon Redshift

Working in Amazon Redshift, I got the following error:

ERROR:  Query unsupported due to an internal error.
DETAIL:  Unsupported PartiQL correlated subquery
CONTEXT:  nested_decorrelate_validate|

According to the documentation: "To unnest queries, Amazon Redshift uses the PartiQL syntax to iterate over SUPER arrays."

PartiQL gives an example of this unnesting using LEFT JOIN.
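
As a rough illustration of the syntax being described, here is a minimal sketch of an unnest using LEFT JOIN. The table `some_table`, its `id` column and its SUPER array column `items` are hypothetical names made up for this sketch, not anything from the documentation:

```sql
-- Hypothetical names: some_table, id, items (a SUPER array column).
-- Each array element is returned as its own row alongside the parent id.
-- LEFT JOIN ... ON TRUE is meant to keep parent rows whose array is empty.
select  t.id,
        x
from    some_table as t
        left join t.items as x on true;
```

The queries below all build on this shape, with `t.data` playing the role of `t.items`.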

So, why am I getting this error?

Here's a minimal reproduction case, to demonstrate:

create table unnest_error_demo (
    id bigint,
    data super
);

insert into unnest_error_demo (id, data)
values
    (1, json_parse('[{"key":"foo","value":"aaa"},{"key":"bar","value":"bbb"},{"key":"baz","value":"ccc"}]')),
    (2, json_parse('[{"key":"quux","value":"ddd"}]'));

select  id,
        foo.value as foo_value,
        bar.value as bar_value,
        baz.value as baz_value,
        quux.value as quux_value
from    unnest_error_demo as t
        left join (select x.key, x.value::varchar from t.data as x where x.key::varchar = 'foo') as foo on true
        left join (select x.key, x.value::varchar from t.data as x where x.key::varchar = 'bar') as bar on true
        left join (select x.key, x.value::varchar from t.data as x where x.key::varchar = 'baz') as baz on true
        left join (select x.key, x.value::varchar from t.data as x where x.key::varchar = 'quux') as quux on true;

This will get you the error I showed earlier:

ERROR:  Query unsupported due to an internal error.
DETAIL:  Unsupported PartiQL correlated subquery
CONTEXT:  nested_decorrelate_validate|

I tried removing the subquery, like this:

select  id,
        foo.value::varchar as foo_value,
        bar.value::varchar as bar_value,
        baz.value::varchar as baz_value,
        quux.value::varchar as quux_value
from    unnest_error_demo as t
        left join t.data as foo on true
        left join t.data as bar on true
        left join t.data as baz on true
        left join t.data as quux on true
where   foo.key::varchar = 'foo'
and     bar.key::varchar = 'bar'
and     baz.key::varchar = 'baz'
and     quux.key::varchar = 'quux';

While this query runs, it returns no rows: the WHERE clause requires a match for all four keys, and no row's array-of-structs in data contains a struct for every key. This could have worked if we could specify the condition in the ON clause of the LEFT JOIN, but that's prohibited. If you try, you'll get the following error:

select  id,
        foo.value::varchar as foo_value
from    unnest_error_demo as t
        left join t.data as foo on foo.key::varchar = 'foo';

ERROR:  invalid join condition for SUPER unnest join
DETAIL:  the valid join condition for SUPER unnest join is 'ON TRUE' only

There's one more possibility: a scalar subquery.

select  id,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'foo') as foo_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'bar') as bar_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'baz') as baz_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'quux') as quux_value
from    unnest_error_demo as t;

Hey! This actually worked:

 id | foo_value | bar_value | baz_value | quux_value 
----+-----------+-----------+-----------+------------
  1 | aaa       | bbb       | ccc       | 
  2 |           |           |           | ddd
(2 rows)

... but, will it really, in practice?

It worked for this simple contrived demo with only 4 subqueries, but what about with real data, where the array-of-structs has some 30-40 elements? Will it still work?

Let's see:

select  id,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'foo') as foo_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'bar') as bar_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'baz') as baz_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'quux') as quux_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_5') as some_5_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_6') as some_6_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_7') as some_7_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_8') as some_8_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_9') as some_9_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_10') as some_10_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_11') as some_11_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_12') as some_12_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_13') as some_13_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_14') as some_14_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_15') as some_15_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_16') as some_16_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_17') as some_17_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_18') as some_18_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_19') as some_19_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_20') as some_20_value,
        (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_21') as some_21_value
from    unnest_error_demo as t;

And ...

ERROR:  The query is too complex to plan. Please consider rewriting it. Context: MessageContext, size: 15586125132, 1

😲

Excuse me? Look at that query, it's actually quite simple. But, imagine working on a real-world query that really is complex, and adding scalar subqueries to it, everything working just fine ... until 💣 you add one too many, and get this unhelpful, cryptic message.

"Please consider rewriting it."

Like, is this the database politely telling me "your query sucks, and you should feel ashamed for asking me to execute it"?

How about, "Hey, database engine, please consider trying just a little harder, this query isn't actually that complex."

If you're wondering, yes, the query worked fine all the way up to 20 scalar subqueries. Only when I added the 21st did it start refusing to execute the query and return the above error instead.

So, yeah, this approach works for the simplest of use cases, but any real world situation where you need more than 20 subqueries? No such luck.

At this point, I imagine you are probably saying something like, "Come on, you are manually unnesting when you could use PIVOT!"

Look at me. Do I look like the kind of person who didn't fall down this rabbit hole starting out by trying PIVOT only to realize that was a non-starter and decided to try and manually unnest as a last resort?

select  t.id,
        p.*
from    unnest_error_demo as t,
        (
            select  x.key::varchar,
                    x.value::varchar
            from    unnest_error_demo as t,
                    t.data as x
        ) pivot (
            max(value)
            for key in (
                'foo' as foo_value,
                'bar' as bar_value,
                'baz' as baz_value,
                'quux' as quux_value
            )
        ) as p;

 id | foo | bar | baz | quux 
----+-----+-----+-----+------
  2 |     |     |     | ddd
  1 | aaa | bbb | ccc | 
(2 rows)

Okay, I'm gonna have to channel the spirit of Shaun Umscheid, the "What? Nooo Waaay" Guy and say it, myself: What? Nooo Waaay.

I swear, I tried it myself on the actual query that started me on this adventure, and got some "you can't use PIVOT here" error, but in this simple demo query, it actually worked!

I'm going to have to go back and double-check my real query and my PIVOT attempt to see if there's something special about my real schema and data and query that prohibits PIVOT where this simple example allows it.

Let this be a lesson to you all: sometimes, creating a simpler version of a problem you're trying to solve and figuring that out lets you apply that solution to the actual complex problem you're really working on.

"Please consider rewriting it."

Okay, fine, Redshift, maybe this was good advice after all. 😳

Update: I reproduced the error I got trying to `PIVOT` my actual query

While I cannot share my actual query, here is the error it results in:

ERROR:  Query unsupported due to an internal error.
DETAIL:  Unsupported witness case
CONTEXT:  nested_decorrelate_calc_witness_unsupported|calc_witness

Searching for this error message turns up only one Stack Overflow post from January 2022, but in that post the query in question wasn't even trying to PIVOT, but was trying to unnest a SUPER column value.

A-ha! The PIVOT query I wrote earlier was NOT actually correct; it spat out the correct result by sheer coincidence. Running that same query again now yields:

select  t.id,
        p.*
from    unnest_error_demo as t,
        (
            select  x.key::varchar,
                    x.value::varchar
            from    unnest_error_demo as t,
                    t.data as x
        ) pivot (
            max(value)
            for key in (
                'foo' as foo_value,
                'bar' as bar_value,
                'baz' as baz_value,
                'quux' as quux_value
            )
        ) as p;

 id | foo_value | bar_value | baz_value | quux_value 
----+-----------+-----------+-----------+------------
  1 | aaa       | bbb       | ccc       | ddd
  2 | aaa       | bbb       | ccc       | ddd
(2 rows)

This is not the result we are looking for, but it is entirely plausible, because the query is wrong: the derived table scans unnest_error_demo on its own and never refers to the outer t. In order for the query to provide the correct response, it would have to be something like this:

select  t.id,
        p.*
from    unnest_error_demo as t,
        (
            select  x.key::varchar,
                    x.value::varchar
            from    t.data as x
        ) pivot (
            max(value)
            for key in (
                'foo' as foo_value,
                'bar' as bar_value,
                'baz' as baz_value,
                'quux' as quux_value
            )
        ) as p;

And, guess what this query results in? Yup:

ERROR:  Query unsupported due to an internal error.
DETAIL:  Unsupported witness case
CONTEXT:  nested_decorrelate_calc_witness_unsupported|calc_witness
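
To see what the earlier, uncorrelated version was really computing, it may help to run its derived table and PIVOT on their own. This sketch reuses the inner part of that query unchanged and simply drops the outer table; it has not been run separately here:

```sql
-- Besides the FOR column (key) and the aggregated column (value),
-- the derived table has no other columns, so there is nothing left
-- to group the pivoted values by.
select  *
from    (
            select  x.key::varchar,
                    x.value::varchar
            from    unnest_error_demo as t,
                    t.data as x
        ) pivot (
            max(value)
            for key in (
                'foo' as foo_value,
                'bar' as bar_value,
                'baz' as baz_value,
                'quux' as quux_value
            )
        ) as p;
```

If that collapses to a single row holding aaa, bbb, ccc and ddd, then the comma join against the outer table just repeats it once per id, which would match the two identical rows shown earlier.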

Is all hope lost? Maybe not. What if we had a unique ID that we could use to join the parent table with the result of the pivot?

select  t.id,
        p.*
from    unnest_error_demo as t
        left join (
            select  t.id,
                    x.key::varchar,
                    x.value::varchar
            from    unnest_error_demo as t,
                    t.data as x
        ) pivot (
            max(value)
            for key in (
                'foo' as foo_value,
                'bar' as bar_value,
                'baz' as baz_value,
                'quux' as quux_value
            )
        ) as p
        on p.id = t.id;

 id | id | foo_value | bar_value | baz_value | quux_value 
----+----+-----------+-----------+-----------+------------
  1 |  1 | aaa       | bbb       | ccc       | 
  2 |  2 |           |           |           | ddd
(2 rows)

A little messy with the duplicate id column, because of our use of p.* to lazily get all of the pivoted columns, but in our real code we certainly won't use p.*, so maybe this is an acceptable solution.
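
For instance, spelling out the pivoted columns instead of using p.* avoids the duplicate id. A sketch of the same query with only the select list changed:

```sql
-- Same join as above; only the select list differs, so p.id is not repeated.
select  t.id,
        p.foo_value,
        p.bar_value,
        p.baz_value,
        p.quux_value
from    unnest_error_demo as t
        left join (
            select  t.id,
                    x.key::varchar,
                    x.value::varchar
            from    unnest_error_demo as t,
                    t.data as x
        ) pivot (
            max(value)
            for key in (
                'foo' as foo_value,
                'bar' as bar_value,
                'baz' as baz_value,
                'quux' as quux_value
            )
        ) as p
        on p.id = t.id;
```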

I suppose the real question is, could I get away with this:

select  *
from    (
            select  t.id,
                    x.key::varchar,
                    x.value::varchar
            from    unnest_error_demo as t,
                    t.data as x
        ) pivot (
            max(value)
            for key in (
                'foo' as foo_value,
                'bar' as bar_value,
                'baz' as baz_value,
                'quux' as quux_value
            )
        ) as p;

That would produce the correct results for this simple example, but in my real code, I have multiple separate SUPER columns, each of which I need to pivot or unnest. Can I nest PIVOTs?

create table unnest_error_demo_part_2 (
    id bigint,
    data super,
    more_data super
);

insert into unnest_error_demo_part_2 (id, data, more_data)
values
    (
        1,
        json_parse('[{"key":"foo","value":"aaa"},{"key":"bar","value":"bbb"},{"key":"baz","value":"ccc"}]'),
        json_parse('[]')
    ),
    (
        2,
        json_parse('[{"key":"quux","value":"ddd"}]'),
        json_parse('[{"key":"never","value":"gonna"},{"key":"give","value":"you"},{"key":"up","value":null}]')
    );
    
select  *
from    (
            select  p.id,
                    p.foo_value,
                    p.bar_value,
                    p.baz_value,
                    p.quux_value,
                    x.key::varchar,
                    x.value::varchar
            from    (
                        select  t.id,
                                t.more_data,
                                x.key::varchar,
                                x.value::varchar
                        from    unnest_error_demo_part_2 as t,
                                t.data as x
                    ) pivot (
                        max(value)
                        for key in (
                            'foo' as foo_value,
                            'bar' as bar_value,
                            'baz' as baz_value,
                            'quux' as quux_value
                        )
                    ) as p,
                    p.more_data as x
        ) pivot (
            max(value)
            for key in (
                'never',
                'give',
                'up'
            )
        ) as p_more;

 id | foo_value | bar_value | baz_value | quux_value | never | give | up 
----+-----------+-----------+-----------+------------+-------+------+----
  2 |           |           |           | ddd        | gonna | you  | 
(1 row)

Oh, so close! It didn't include the row with id 1, probably because there was nothing to unnest in more_data for it.
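
One way to check that guess is to look at how many elements each more_data array holds. This is a small sketch using Redshift's get_array_length function on the SUPER column; the alias more_data_length is just an illustrative name, and the query has not been run against this table here:

```sql
-- An empty array has length 0, so a comma-style (inner) unnest of it
-- contributes no rows for that id.
select  id,
        get_array_length(more_data) as more_data_length
from    unnest_error_demo_part_2
order by id;
```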

Let's see if we can use a LEFT JOIN and a null key to get that first row in our results:

select  *
from    (
            select  p.id,
                    p.foo_value,
                    p.bar_value,
                    p.baz_value,
                    p.quux_value,
                    x.key::varchar,
                    x.value::varchar
            from    (
                        select  t.id,
                                t.more_data,
                                x.key::varchar,
                                x.value::varchar
                        from    unnest_error_demo_part_2 as t
                                left join t.data as x on true
                    ) pivot (
                        max(value)
                        for key in (
                            'foo' as foo_value,
                            'bar' as bar_value,
                            'baz' as baz_value,
                            'quux' as quux_value
                        )
                    ) as p
                    left join p.more_data as x on true
        ) pivot (
            max(value)
            for key in (
                'never',
                'give',
                'up',
                null as everything_else
            )
        ) as p_more;

 id | foo_value | bar_value | baz_value | quux_value | never | give | up | everything_else 
----+-----------+-----------+-----------+------------+-------+------+----+-----------------
  1 | aaa       | bbb       | ccc       |            |       |      |    | 
  2 |           |           |           | ddd        | gonna | you  |    | 
(2 rows)

There we go!

Okay, but what if more_data wasn't an empty array in row 1? Does this still work?

update unnest_error_demo_part_2
set more_data = json_parse('[{"key":"jenny","value":"8675309"}]')
where id = 1;

And, executing the pivot query from before, we get:

 id | foo_value | bar_value | baz_value | quux_value | never | give | up | everything_else 
----+-----------+-----------+-----------+------------+-------+------+----+-----------------
  2 |           |           |           | ddd        | gonna | you  |    | 
(1 row)

Damn. Row 1 disappears again. If we add the key that's in more_data in row 1, we should get the row back:

select  *
from    (
            select  p.id,
                    p.foo_value,
                    p.bar_value,
                    p.baz_value,
                    p.quux_value,
                    x.key::varchar,
                    x.value::varchar
            from    (
                        select  t.id,
                                t.more_data,
                                x.key::varchar,
                                x.value::varchar
                        from    unnest_error_demo_part_2 as t
                                left join t.data as x on true
                    ) pivot (
                        max(value)
                        for key in (
                            'foo' as foo_value,
                            'bar' as bar_value,
                            'baz' as baz_value,
                            'quux' as quux_value
                        )
                    ) as p
                    left join p.more_data as x on true
        ) pivot (
            max(value)
            for key in (
                'never',
                'give',
                'up',
                'jenny',
                null as everything_else
            )
        ) as p_more;

And, sure enough:

 id | foo_value | bar_value | baz_value | quux_value | never | give | up |  jenny  | everything_else 
----+-----------+-----------+-----------+------------+-------+------+----+---------+-----------------
  1 | aaa       | bbb       | ccc       |            |       |      |    | 8675309 | 
  2 |           |           |           | ddd        | gonna | you  |    |         | 
(2 rows)

This means we need to be very careful in how we construct our PIVOT statement: remember to LEFT JOIN the SUPER value, and make sure the FOR ... IN (...) list includes at least one value that matches each row, or that row drops out of the resulting output.
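
Since every row needs at least one of its keys in the IN list, it can help to list the keys that actually occur before writing the PIVOT. A sketch, assuming the part 2 table above; swap in whichever SUPER column is being pivoted:

```sql
-- Distinct keys present anywhere in more_data. Based on the behavior
-- seen above, a row whose keys are all missing from the
-- FOR ... IN (...) list would drop out of the PIVOT result.
select  distinct x.key::varchar
from    unnest_error_demo_part_2 as t,
        t.more_data as x
order by 1;
```

Comparing this list against the IN list is a cheap sanity check, though it does not help if new keys show up in the data later.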

In the end, this is all very tricky and easy to get wrong, or to wind up with a query that executes but produces the wrong output. And, multiple pivots get messy real quick, but might be more manageable with Common Table Expressions (CTEs).

To show what the query could look like using CTEs, here's an example:

with a as (
    select  t.id,
            t.more_data,
            x.key::varchar,
            x.value::varchar
    from    unnest_error_demo_part_2 as t
            left join t.data as x on true
),

b as (
    select  p.id,
            p.foo_value,
            p.bar_value,
            p.baz_value,
            p.quux_value,
            x.key::varchar,
            x.value::varchar
    from    a pivot (
                max(value)
                for key in (
                    'foo' as foo_value,
                    'bar' as bar_value,
                    'baz' as baz_value,
                    'quux' as quux_value
                )
            ) as p
            left join p.more_data as x on true
),

c as (
    select  *
    from    b pivot (
                max(value)
                for key in (
                    'never',
                    'give',
                    'up',
                    'jenny',
                    null as everything_else
                )
            )
)

select * from c;

This does make me wonder if this could be simplified like this:

select  *
from    (
            select  t.id,
                    t.more_data,
                    x.key::varchar as x_key,
                    x.value::varchar as x_value,
                    y.key::varchar as y_key,
                    y.value::varchar as y_value
            from    unnest_error_demo_part_2 as t
                    left join t.data as x on true
                    left join t.more_data as y on true
        ) pivot (
            max(x_value)
            for x_key in (
                'foo' as foo_value,
                'bar' as bar_value,
                'baz' as baz_value,
                'quux' as quux_value
            )
        ) pivot (
            max(y_value)
            for y_key in (
                'never',
                'give',
                'up',
                'jenny',
                null as everything_else
            )
        );

Unfortunately, no:

ERROR:  PIVOT cannot be applied to a PIVOT.

That would have been clean.
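
One alternative that might be worth trying keeps the same two LEFT JOIN unnests but replaces both PIVOTs with conditional aggregation: one max(case ...) per key, grouped by id. This is an untested sketch, not something verified against Redshift here. It assumes aggregates and GROUP BY behave over unnested rows the way they do in ordinary SQL, the *_value aliases are illustrative, and only a few of the keys are shown:

```sql
-- x and y are unnested independently, so each id yields the cross
-- product of its two arrays; max() is unaffected by the repeated values.
select  t.id,
        max(case when x.key::varchar = 'foo'   then x.value::varchar end) as foo_value,
        max(case when x.key::varchar = 'quux'  then x.value::varchar end) as quux_value,
        max(case when y.key::varchar = 'never' then y.value::varchar end) as never_value,
        max(case when y.key::varchar = 'jenny' then y.value::varchar end) as jenny_value
from    unnest_error_demo_part_2 as t
        left join t.data as x on true
        left join t.more_data as y on true
group by t.id;
```

Because nothing filters on key, an id with no matching keys should still appear, with nulls in every pivoted column. The cross product can get large when both arrays are long, so this may not suit every data set.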

Anyway, if you came here looking for this kind of information because you're struggling with this problem, I hope you found this helpful. And, if you found an even better solution, please share it with me; I'd love to learn a better way of solving this.
