Working in Amazon Redshift, I got the following error:
ERROR: Query unsupported due to an internal error.
DETAIL: Unsupported PartiQL correlated subquery
CONTEXT: nested_decorrelate_validate|
According to the documentation: "To unnest queries, Amazon Redshift uses the PartiQL syntax to iterate over SUPER arrays."
PartiQL gives an example of this unnesting using LEFT JOIN.
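For reference, the documented iteration pattern looks something like this (a sketch of the PartiQL unnesting syntax, using the demo table defined below):

```sql
-- Iterate over the SUPER array in data, producing one row per element.
-- LEFT JOIN ... ON TRUE keeps parent rows even when the array is empty.
select t.id, x
from unnest_error_demo as t
left join t.data as x on true;
```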
So, why am I getting this error?
Here's a minimal reproduction case to demonstrate:
create table unnest_error_demo (
id bigint,
data super
);
insert into unnest_error_demo (id, data)
values
(1, json_parse('[{"key":"foo","value":"aaa"},{"key":"bar","value":"bbb"},{"key":"baz","value":"ccc"}]')),
(2, json_parse('[{"key":"quux","value":"ddd"}]'));
select id,
foo.value as foo_value,
bar.value as bar_value,
baz.value as baz_value,
quux.value as quux_value
from unnest_error_demo as t
left join (select x.key, x.value::varchar from t.data as x where x.key::varchar = 'foo') as foo on true
left join (select x.key, x.value::varchar from t.data as x where x.key::varchar = 'bar') as bar on true
left join (select x.key, x.value::varchar from t.data as x where x.key::varchar = 'baz') as baz on true
left join (select x.key, x.value::varchar from t.data as x where x.key::varchar = 'quux') as quux on true;
This will get you the error I showed earlier:
ERROR: Query unsupported due to an internal error.
DETAIL: Unsupported PartiQL correlated subquery
CONTEXT: nested_decorrelate_validate|
I tried removing the subquery, like this:
select id,
foo.value::varchar as foo_value,
bar.value::varchar as bar_value,
baz.value::varchar as baz_value,
quux.value::varchar as quux_value
from unnest_error_demo as t
left join t.data as foo on true
left join t.data as bar on true
left join t.data as baz on true
left join t.data as quux on true
where foo.key::varchar = 'foo'
and bar.key::varchar = 'bar'
and baz.key::varchar = 'baz'
and quux.key::varchar = 'quux';
While this query will run, it returns no rows, because there are rows whose array-of-structs in data does not contain a struct for every key. This could have worked if we could specify the condition in the ON clause of the LEFT JOIN, but that's prohibited. If you try, you'll get the following error:
select id,
foo.value::varchar as foo_value
from unnest_error_demo as t
left join t.data as foo on foo.key::varchar = 'foo';
ERROR: invalid join condition for SUPER unnest join
DETAIL: the valid join condition for SUPER unnest join is 'ON TRUE' only
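Since the filter can't live in the ON clause, another pattern worth trying is a single ON TRUE unnest followed by conditional aggregation. This is a sketch of standard SQL conditional aggregation; I haven't verified how Redshift's planner handles it in combination with SUPER unnesting:

```sql
-- Unnest once, keeping empty arrays via LEFT JOIN ... ON TRUE,
-- then pick out each key with a conditional MAX per group.
select t.id,
       max(case when x.key::varchar = 'foo' then x.value::varchar end) as foo_value,
       max(case when x.key::varchar = 'bar' then x.value::varchar end) as bar_value,
       max(case when x.key::varchar = 'baz' then x.value::varchar end) as baz_value,
       max(case when x.key::varchar = 'quux' then x.value::varchar end) as quux_value
from unnest_error_demo as t
left join t.data as x on true
group by t.id;
```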
There's one more possibility: a scalar subquery.
select id,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'foo') as foo_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'bar') as bar_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'baz') as baz_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'quux') as quux_value
from unnest_error_demo as t;
Hey! This actually worked:
id | foo_value | bar_value | baz_value | quux_value
----+-----------+-----------+-----------+------------
1 | aaa | bbb | ccc |
2 | | | | ddd
(2 rows)
... but, will it really, in practice?
It worked for this simple contrived demo with only 4 subqueries, but what about with real data, where the array-of-structs has some 30-40 elements? Will it still work?
Let's see:
select id,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'foo') as foo_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'bar') as bar_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'baz') as baz_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'quux') as quux_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_5') as some_5_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_6') as some_6_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_7') as some_7_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_8') as some_8_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_9') as some_9_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_10') as some_10_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_11') as some_11_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_12') as some_12_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_13') as some_13_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_14') as some_14_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_15') as some_15_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_16') as some_16_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_17') as some_17_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_18') as some_18_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_19') as some_19_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_20') as some_20_value,
(select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_21') as some_21_value
from unnest_error_demo as t;
And ...
ERROR: The query is too complex to plan. Please consider rewriting it. Context: MessageContext, size: 15586125132, 1
😲
Excuse me? Look at that query; it's actually quite simple. But imagine working on a real-world query that really is complex, adding scalar subqueries to it, everything working just fine ... until 💣 you add one too many and get this unhelpful, cryptic message.
"Please consider rewriting it."
Like, is this the database politely telling me "your query sucks, and you should feel ashamed for asking me to execute it"?
How about, "Hey, database engine, please consider trying just a little harder, this query isn't actually that complex."
If you're wondering: yes, the query worked fine all the way up to 20 scalar subqueries. Only when I added the 21st did it refuse to execute the query and instead return the above error.
So, yeah, this approach works for the simplest of use cases, but any real world situation where you need more than 20 subqueries? No such luck.
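One possible workaround, which I have not verified against this planner limit, would be to split the scalar subqueries across two or more CTEs of at most 20 each and join them back together on id:

```sql
-- Hypothetical sketch: each CTE stays under the apparent
-- ~20-scalar-subquery planning limit; join the halves on id.
with first_half as (
    select id,
           (select max(x.value::varchar) from t.data as x where x.key::varchar = 'foo') as foo_value
           -- ... up to 20 scalar subqueries ...
    from unnest_error_demo as t
),
second_half as (
    select id,
           (select max(x.value::varchar) from t.data as x where x.key::varchar = 'some_21') as some_21_value
           -- ... remaining scalar subqueries ...
    from unnest_error_demo as t
)
select f.*, s.some_21_value
from first_half as f
join second_half as s on s.id = f.id;
```

Whether the planner applies that limit per subquery block or to the whole statement, I don't know; treat this as an untested idea.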
At this point, I imagine you are probably saying something like, "Come on, you are manually unnesting when you could use PIVOT!"
Look at me. Do I look like the kind of person who didn't fall down this rabbit hole by starting out with PIVOT, only to realize that was a non-starter and decide to try manually unnesting as a last resort?
select t.id,
p.*
from unnest_error_demo as t,
(
select x.key::varchar,
x.value::varchar
from unnest_error_demo as t,
t.data as x
) pivot (
max(value)
for key in (
'foo' as foo_value,
'bar' as bar_value,
'baz' as baz_value,
'quux' as quux_value
)
) as p;
id | foo | bar | baz | quux
----+-----+-----+-----+------
2 | | | | ddd
1 | aaa | bbb | ccc |
(2 rows)
Okay, I'm gonna have to channel the spirit of Shaun Umscheid, the "What? Nooo Waaay" Guy and say it, myself: What? Nooo Waaay.
I swear, I tried it myself while working on the actual query that started me on this adventure, and got some "you can't use PIVOT here" error, but in this simple demo query, it actually worked!
I'm going to have to go back and double-check my real query and my PIVOT attempt to see if there's something special about my real schema, data, and query that prohibits PIVOT where this simple example allows it.
Let this be a lesson to you all: sometimes, creating a simpler version of a problem you're trying to solve and figuring that out lets you apply that solution to the actual complex problem you're really working on.
"Please consider rewriting it."
Okay, fine, Redshift, maybe this was good advice after all. 😳
While I cannot share my actual query, here is the error it results in:
ERROR: Query unsupported due to an internal error.
DETAIL: Unsupported witness case
CONTEXT: nested_decorrelate_calc_witness_unsupported|calc_witness
Searching for this error message turns up only one Stack Overflow post, from January 2022, but the query in that post wasn't even trying to PIVOT; it was trying to unnest a SUPER column value.
A-ha! The PIVOT query I wrote earlier was NOT actually correct, and it only coincidentally spat out the right result. Running that same query again now yields:
select t.id,
p.*
from unnest_error_demo as t,
(
select x.key::varchar,
x.value::varchar
from unnest_error_demo as t,
t.data as x
) pivot (
max(value)
for key in (
'foo' as foo_value,
'bar' as bar_value,
'baz' as baz_value,
'quux' as quux_value
)
) as p;
id | foo_value | bar_value | baz_value | quux_value
----+-----------+-----------+-----------+------------
1 | aaa | bbb | ccc | ddd
2 | aaa | bbb | ccc | ddd
(2 rows)
This is not the result we are looking for, but it's exactly what the (wrong) query asks for: the derived table unnests every row of the table, not just the current outer row, so the pivot aggregates values across all rows and the cross join pairs that single pivoted row with each id. For the query to produce the correct result, it would have to correlate with the outer table, something like this:
select t.id,
p.*
from unnest_error_demo as t,
(
select x.key::varchar,
x.value::varchar
from t.data as x
) pivot (
max(value)
for key in (
'foo' as foo_value,
'bar' as bar_value,
'baz' as baz_value,
'quux' as quux_value
)
) as p;
And, guess what this query results in? Yup:
ERROR: Query unsupported due to an internal error.
DETAIL: Unsupported witness case
CONTEXT: nested_decorrelate_calc_witness_unsupported|calc_witness
Is all hope lost? Maybe not. What if we had a unique ID that we could use to join the parent table with the result of the pivot?
select t.id,
p.*
from unnest_error_demo as t
left join (
select t.id,
x.key::varchar,
x.value::varchar
from unnest_error_demo as t,
t.data as x
) pivot (
max(value)
for key in (
'foo' as foo_value,
'bar' as bar_value,
'baz' as baz_value,
'quux' as quux_value
)
) as p
on p.id = t.id;
id | id | foo_value | bar_value | baz_value | quux_value
----+----+-----------+-----------+-----------+------------
1 | 1 | aaa | bbb | ccc |
2 | 2 | | | | ddd
(2 rows)
A little messy with the duplicate id column, because of our lazy use of p.* to get all of the pivoted columns; in our real code we certainly won't use p.*, so maybe this is an acceptable solution.
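For completeness, here's the same join-the-pivot-on-id query with the pivoted columns listed explicitly instead of p.*, which should behave identically while avoiding the duplicate id:

```sql
-- Same query as above, selecting the pivoted columns explicitly
-- so id appears only once in the output.
select t.id,
       p.foo_value,
       p.bar_value,
       p.baz_value,
       p.quux_value
from unnest_error_demo as t
left join (
    select t.id,
           x.key::varchar,
           x.value::varchar
    from unnest_error_demo as t,
         t.data as x
) pivot (
    max(value)
    for key in (
        'foo' as foo_value,
        'bar' as bar_value,
        'baz' as baz_value,
        'quux' as quux_value
    )
) as p
on p.id = t.id;
```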
I suppose the real question is, could I get away with this:
select *
from (
select t.id,
x.key::varchar,
x.value::varchar
from unnest_error_demo as t,
t.data as x
) pivot (
max(value)
for key in (
'foo' as foo_value,
'bar' as bar_value,
'baz' as baz_value,
'quux' as quux_value
)
) as p;
That would produce the correct results for this simple example, but in my real code, I have multiple separate SUPER columns, each of which I need to pivot or unnest. Can I nest PIVOTs?
create table unnest_error_demo_part_2 (
id bigint,
data super,
more_data super
);
insert into unnest_error_demo_part_2 (id, data, more_data)
values
(
1,
json_parse('[{"key":"foo","value":"aaa"},{"key":"bar","value":"bbb"},{"key":"baz","value":"ccc"}]'),
json_parse('[]')
),
(
2,
json_parse('[{"key":"quux","value":"ddd"}]'),
json_parse('[{"key":"never","value":"gonna"},{"key":"give","value":"you"},{"key":"up","value":null}]')
);
select *
from (
select p.id,
p.foo_value,
p.bar_value,
p.baz_value,
p.quux_value,
x.key::varchar,
x.value::varchar
from (
select t.id,
t.more_data,
x.key::varchar,
x.value::varchar
from unnest_error_demo_part_2 as t,
t.data as x
) pivot (
max(value)
for key in (
'foo' as foo_value,
'bar' as bar_value,
'baz' as baz_value,
'quux' as quux_value
)
) as p,
p.more_data as x
) pivot (
max(value)
for key in (
'never',
'give',
'up'
)
) as p_more;
id | foo_value | bar_value | baz_value | quux_value | never | give | up
----+-----------+-----------+-----------+------------+-------+------+----
2 | | | | ddd | gonna | you |
(1 row)
Oh, so close! It didn't include the row with id 1, probably because there was nothing to unnest in more_data for it.
Let's see if we can use a LEFT JOIN and a null key to get that first row in our results:
select *
from (
select p.id,
p.foo_value,
p.bar_value,
p.baz_value,
p.quux_value,
x.key::varchar,
x.value::varchar
from (
select t.id,
t.more_data,
x.key::varchar,
x.value::varchar
from unnest_error_demo_part_2 as t
left join t.data as x on true
) pivot (
max(value)
for key in (
'foo' as foo_value,
'bar' as bar_value,
'baz' as baz_value,
'quux' as quux_value
)
) as p
left join p.more_data as x on true
) pivot (
max(value)
for key in (
'never',
'give',
'up',
null as everything_else
)
) as p_more;
id | foo_value | bar_value | baz_value | quux_value | never | give | up | everything_else
----+-----------+-----------+-----------+------------+-------+------+----+-----------------
1 | aaa | bbb | ccc | | | | |
2 | | | | ddd | gonna | you | |
(2 rows)
There we go!
Okay, but what if more_data wasn't an empty array in row 1? Does this still work?
update unnest_error_demo_part_2
set more_data = json_parse('[{"key":"jenny","value":"8675309"}]')
where id = 1;
And, executing the pivot query from before, we get:
id | foo_value | bar_value | baz_value | quux_value | never | give | up | everything_else
----+-----------+-----------+-----------+------------+-------+------+----+-----------------
2 | | | | ddd | gonna | you | |
(1 row)
Damn. Row 1 disappears again. If we add the key that's in more_data in row 1, we should get the row back:
select *
from (
select p.id,
p.foo_value,
p.bar_value,
p.baz_value,
p.quux_value,
x.key::varchar,
x.value::varchar
from (
select t.id,
t.more_data,
x.key::varchar,
x.value::varchar
from unnest_error_demo_part_2 as t
left join t.data as x on true
) pivot (
max(value)
for key in (
'foo' as foo_value,
'bar' as bar_value,
'baz' as baz_value,
'quux' as quux_value
)
) as p
left join p.more_data as x on true
) pivot (
max(value)
for key in (
'never',
'give',
'up',
'jenny',
null as everything_else
)
) as p_more;
And, sure enough:
id | foo_value | bar_value | baz_value | quux_value | never | give | up | jenny | everything_else
----+-----------+-----------+-----------+------------+-------+------+----+---------+-----------------
1 | aaa | bbb | ccc | | | | | 8675309 |
2 | | | | ddd | gonna | you | | |
(2 rows)
This means that we need to be very careful in how we construct our PIVOT statement, remembering to LEFT JOIN the SUPER value, and that the FOR ... IN (...) list needs to include at least one match for the row to be included in the resulting output.
In the end, this is all very tricky: it's easy to get it wrong, or to wind up with a query that executes but produces the wrong output. And multiple pivots get messy real quick, but might be more manageable with Common Table Expressions (CTEs).
To show what the query could look like using CTEs, here's an example:
with a as (
select t.id,
t.more_data,
x.key::varchar,
x.value::varchar
from unnest_error_demo_part_2 as t
left join t.data as x on true
),
b as (
select p.id,
p.foo_value,
p.bar_value,
p.baz_value,
p.quux_value,
x.key::varchar,
x.value::varchar
from a pivot (
max(value)
for key in (
'foo' as foo_value,
'bar' as bar_value,
'baz' as baz_value,
'quux' as quux_value
)
) as p
left join p.more_data as x on true
),
c as (
select *
from b pivot (
max(value)
for key in (
'never',
'give',
'up',
'jenny',
null as everything_else
)
)
)
select * from c;
This does make me wonder whether it could be simplified like this:
select *
from (
select t.id,
t.more_data,
x.key::varchar as x_key,
x.value::varchar as x_value,
y.key::varchar as y_key,
y.value::varchar as y_value
from unnest_error_demo_part_2 as t
left join t.data as x on true
left join t.more_data as y on true
) pivot (
max(x_value)
for x_key in (
'foo' as foo_value,
'bar' as bar_value,
'baz' as baz_value,
'quux' as quux_value
)
) pivot (
max(y_value)
for y_key in (
'never',
'give',
'up',
'jenny',
null as everything_else
)
);
Unfortunately, no:
ERROR: PIVOT cannot be applied to a PIVOT.
That would have been clean.
Anyway, if you came here looking for this kind of information because you're struggling with this problem, I hope you found it helpful. And if you found an even better solution, please share it with me; I'd love to learn a better way of solving this.