Short syntax for UNLIST with IN operator #8878

Open
ChudaykinAlex wants to merge 2 commits into FirebirdSQL:master from ChudaykinAlex:work/short_syntax_for_UNLIST_with_IN_operator
@ChudaykinAlex
Contributor

Short syntax for UNLIST with IN operator

Summary

Added a simplified syntax form for using UNLIST with the IN operator.

Instead of:

SELECT * FROM EMPLOYEE WHERE EMP_NO IN (SELECT * FROM UNLIST('2,4,5,7,11') AS U)

You can now write:

SELECT * FROM EMPLOYEE WHERE EMP_NO IN UNLIST('2,4,5,7,11')

Changes

  • src/dsql/parse.y: Added support for the short syntax in the parser. When IN UNLIST(...) is used, the parser automatically creates an internal subquery with a correlation name and a column name.
  • doc/sql.extensions/README.unlist: Added documentation with usage examples for the new syntax.
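
The rewrite described for the parser can be modeled outside Firebird. This is a minimal Python sketch, not the code in parse.y: the function name expand_short_unlist is hypothetical, and the alias U and column N are borrowed from the expanded form quoted later in this conversation, since the PR description does not state which internal names the parser generates.

```python
import re

# Matches "IN UNLIST(" and captures the start of the UNLIST call itself.
_SHORT_FORM = re.compile(r"\bIN\s+(UNLIST\s*\()", re.IGNORECASE)


def expand_short_unlist(sql: str, alias: str = "U", column: str = "N") -> str:
    """Expand the first `IN UNLIST(...)` into an explicit IN subquery.

    Text-level illustration only; alias and column are assumed names.
    """
    match = _SHORT_FORM.search(sql)
    if match is None:
        return sql  # no short form present, leave the statement untouched

    call_start = match.start(1)   # index of the "U" in UNLIST
    open_paren = match.end(1) - 1  # index of the "(" opening the call
    depth = 0
    in_string = False
    for i in range(open_paren, len(sql)):
        ch = sql[i]
        if ch == "'":
            # A doubled quote inside a literal toggles twice, so state stays correct.
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    call = sql[call_start : i + 1]
                    subquery = f"IN (SELECT {column} FROM {call} AS {alias}({column}))"
                    return sql[: match.start()] + subquery + sql[i + 1 :]
    raise ValueError("unbalanced parentheses in UNLIST call")
```

A real parser works on the grammar rather than on text, so this only shows the shape of the transformation, not how the PR implements it.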

Open Questions

1. Automatic RETURNING type inference

Currently, UNLIST returns VARCHAR(1024) by default. This works for comparisons with INT/BIGINT, but what if the query contains VERY_LONG_VARCHAR IN UNLIST(...)? The default type might cause an exception.

Question: Should we implement automatic RETURNING type inference based on the type of the left operand of the comparison? If RETURNING is explicitly specified, use it; otherwise, infer the type from context.

2. Issue with automatic type detection

Found a problematic case:

1 IN UNLIST('9876543210')

It will try to convert the number to INT and fail with an overflow error.

Possible solutions:

  • Choose the maximum capacity type compatible with
  • Simply return VARCHAR(32K)
  • Keep it as is with VARCHAR(1024), since there's no reasonable use case for this feature with long strings, and it can always be worked around with explicit RETURNING specification
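
The overflow case can be reproduced with a small Python model. This is only a sketch under stated assumptions: cast_items and is_in_unlist are hypothetical helpers that imitate "cast every list item to the inferred type, then compare", and the bounds are the usual 32-bit INT and 64-bit BIGINT ranges. It is not Firebird's conversion code.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1


def cast_items(csv: str, lo: int, hi: int) -> list[int]:
    """Convert each comma-separated item, rejecting values outside [lo, hi]."""
    values = []
    for item in csv.split(","):
        n = int(item.strip())
        if not lo <= n <= hi:
            raise OverflowError(f"{n} does not fit the target type")
        values.append(n)
    return values


def is_in_unlist(left: int, csv: str, lo: int, hi: int) -> bool:
    """Model of `left IN UNLIST(csv)` with items cast to the range [lo, hi]."""
    return left in cast_items(csv, lo, hi)


# Inferring INT from the left operand 1 makes the cast itself fail:
#   is_in_unlist(1, "9876543210", INT_MIN, INT_MAX)        raises OverflowError
# Widening to the largest type of the same family avoids the error:
#   is_in_unlist(1, "9876543210", BIGINT_MIN, BIGINT_MAX)  returns False
```

In this model the failure comes from the cast, not from the comparison, which is why widening the inferred type or keeping a string type both avoid it.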

@dyemanov dyemanov linked an issue Feb 3, 2026 that may be closed by this pull request
@dyemanov
Member

dyemanov commented Feb 3, 2026

This PR implements request #8580 which is based on the initial discussion in #8005.

@sim1984
Contributor

sim1984 commented Feb 3, 2026

  2. Issue with automatic type detection

One of the following options would be acceptable to me:

  • Choose the maximum capacity type compatible with
  • Keep it as is with VARCHAR(1024), since there's no reasonable use case for this feature with long strings, and it can always be worked around with explicit RETURNING specification

@sim1984
Contributor

sim1984 commented Feb 3, 2026

Since such a construct will be executed by the hash semi-join algorithm or, at best, via JOIN DISTINCT UNLIST, returning wide data types is not very practical. It's better to implicitly cast to the type of the field, widened to the maximum capacity of that type family.

SELECT * FROM EMPLOYEE WHERE EMP_NO IN UNLIST('2,4,5,7,11')

is transformed to

SELECT * FROM EMPLOYEE 
WHERE EMP_NO IN (SELECT N FROM UNLIST('2,4,5,7,11') AS U(N));
Select Expression
    -> Filter
        -> Hash Join (semi) (keys: 1, total key length: 2)
            -> Table "PUBLIC"."EMPLOYEE" Full Scan
            -> Record Buffer (record length: 153)
                -> Function "UNLIST" as "U" Scan

vs

SELECT * FROM EMPLOYEE 
WHERE EMP_NO IN (SELECT N FROM UNLIST('2,4,5,7,11' RETURNING BIGINT) AS U(N));
Select Expression
    -> Filter
        -> Hash Join (semi) (keys: 1, total key length: 8)
            -> Table "PUBLIC"."EMPLOYEE" Full Scan
            -> Record Buffer (record length: 33)
                -> Function "UNLIST" as "U" Scan

or, as the best choice,

SELECT * FROM EMPLOYEE E
JOIN (SELECT DISTINCT N FROM UNLIST('2,4,5,7,11') AS U(N)) T ON T.N = E.EMP_NO;
Select Expression
    -> Nested Loop Join (inner)
        -> Unique Sort (record length: 156, key length: 132)
            -> Function "UNLIST" as "T" "U" Scan
        -> Filter
            -> Table "PUBLIC"."EMPLOYEE" as "E" Access By ID
                -> Bitmap
                    -> Index "PUBLIC"."RDB$PRIMARY7" Unique Scan

vs

SELECT * FROM EMPLOYEE E
JOIN (SELECT DISTINCT N FROM UNLIST('2,4,5,7,11' RETURNING BIGINT) AS U(N)) T ON T.N = E.EMP_NO;
Select Expression
    -> Nested Loop Join (inner)
        -> Unique Sort (record length: 36, key length: 12)
            -> Function "UNLIST" as "T" "U" Scan
        -> Filter
            -> Table "PUBLIC"."EMPLOYEE" as "E" Access By ID
                -> Bitmap
                    -> Index "PUBLIC"."RDB$PRIMARY7" Unique Scan

In either case, it is best to avoid a wide record buffer or a wide external sort.

@aafemt
Contributor

aafemt commented Feb 3, 2026

  • Simply return VARCHAR(32K)

The type of the function's parameter is known at compile time, or at least at run time, so there is no point in returning an output string longer than the input.

@dyemanov
Member

dyemanov commented Feb 3, 2026

  • Simply return VARCHAR(32K)

The type of the function's parameter is known at compile time, or at least at run time, so there is no point in returning an output string longer than the input.

Input is a blob which may contain strings (delimited or not) of any length.

@sim1984
Contributor

sim1984 commented Feb 3, 2026

smallint, int, bigint - bigint
float, double - double
decfloat(16), decfloat(34) - decfloat(34)
numeric/decimal, int128, date, time, timestamp - as is
char(n)/varchar(n) - varchar(1024)
In all cases, you can override the output type by explicitly specifying RETURNING.

@dyemanov
Member

dyemanov commented Feb 3, 2026

smallint, int, bigint - bigint

Why not int128? They can be stored inside a blob and compared with int, for example.

@sim1984
Contributor

sim1984 commented Feb 3, 2026

smallint, int, bigint - bigint

Why not int128? They can be stored inside a blob and compared with int, for example.

You can cast it to int128. The main thing is not to make varchars too large by default. If it's really necessary, you can always explicitly set the type in returning.
