Type hints overhaul by OutSquareCapital · Pull Request #352 · duckdb/duckdb-python

OutSquareCapital · 2026-02-27T15:27:02Z

This PR provides numerous improvements and one bugfix regarding type hints.

This is my follow-up to our discussion here @evertlammerts:
#341 (comment)

All changes are only in the type stubs, which means that there's no impact whatsoever on any runtime logic.

Changes

New _expression.pyi file to separate out the Expression class and allow circular imports and references. It also slims down the __init__ file a bit, which is nice.
Two new Protocols, for numpy arrays and dtypes. They allow type checking those without emitting errors if the user doesn't have the library installed. Array is useful for Expression conversions, Dtype for DuckDBPyType conversions.
Refactored and expanded the _ExpressionLike type alias. Renamed it to IntoExpr, and added various new type aliases covering as many situations as possible for Expression conversions.
Added a few Literals to cover the ids and str conversions to DuckDBPyType, providing nice autocompletion for arguments and a nice interaction with pattern matching when checking the id value.
Also provided convenient JSON and BIGNUM instantiation as an added bonus (ATM they are absent from the sqltypes constants).
Added various new type aliases to cover all paths for DuckDBPyType conversions: as Python/numpy static type hints, as dict instances, or as Literal | str. This significantly improves the type hints for datatype arguments, which very often only accepted str or DuckDBTypes in the signatures.
Added various new Literals for the argument options of file methods/functions.
Centralized type aliases, Literals, and Protocols in a _typing.pyi file, to avoid bloating the __init__.
Added a new CppEnum class to reduce code duplication for enum-like classes, and centralized them in a new _enums.pyi file.
Fixed the StatementType class, which had incorrect values (the member names have no _STATEMENT suffix).
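
As an illustration of the Protocol idea in the list above, here is a minimal sketch, not the actual stub contents. The names `SupportsArray`, `FakeArray` and `describe` are hypothetical, and the real protocols in `_typing.pyi` may declare different members. The point is only that a structural type lets a checker accept array-like objects without numpy being importable.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsArray(Protocol):
    """Hypothetical structural type: anything exposing `shape` and `__array__`."""

    @property
    def shape(self) -> tuple[int, ...]: ...
    def __array__(self, *args: Any, **kwargs: Any) -> Any: ...


class FakeArray:
    """Stand-in for numpy.ndarray so the sketch runs without numpy installed."""

    def __init__(self, data: list[int]) -> None:
        self._data = data

    @property
    def shape(self) -> tuple[int, ...]:
        return (len(self._data),)

    def __array__(self, *args: Any, **kwargs: Any) -> Any:
        return self._data


def describe(value: SupportsArray) -> str:
    # No numpy import is needed: the check is purely structural.
    return f"array-like with shape {value.shape}"
```

A plain list does not satisfy the protocol because it has no `shape` attribute, which is the kind of distinction the stubs want the type checker to make.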

Notes

I tried to document this as best I could, with docstrings for users and "private" comments.
I left a few observations, but one thing I would add is that the types accepted at runtime are clearly all over the place (sometimes Mapping is OK, sometimes only dict is OK, etc.).
As I said in "Typing stubs are too strict about arguments of type Expression" (#341), prioritizing collections.abc as much as possible would be the best way to go in the future.
Centralizing the type aliases and using them as much as possible makes sense IMO, especially with an API that has repeated signatures (connection methods vs. module-level functions, for example).
The next step would be to move the type definitions into a concrete .py file, allowing users to import them if they want to annotate custom functions or do runtime type introspection. Note that this can already be done by importing them from _duckdb (not intuitive), but only in a TYPE_CHECKING block (it will crash otherwise), and your LSP will most likely say that it can't resolve the import.
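
The TYPE_CHECKING workaround mentioned in the last note could look roughly like the sketch below. The import path `_duckdb._typing` and the helper `column_label` are assumptions made for illustration; only the alias name `IntoExpr` comes from this PR.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Assumed import path, for illustration only; it is never executed at runtime.
    from _duckdb._typing import IntoExpr


def column_label(expr: "IntoExpr") -> str:
    # The quoted annotation is not evaluated, so the missing runtime name is harmless.
    return f"col:{expr}"
```

Because the alias only exists in the stubs, moving the import outside the `if TYPE_CHECKING:` block would fail at import time, which is the crash described above.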

OutSquareCapital · 2026-03-18T10:38:26Z

@evertlammerts kindly bumping this up in case you haven't seen it.

evertlammerts · 2026-03-18T13:51:14Z

ack! apologies for the delay. I did spend some time on it but it's a big one and not my area of expertise, so it's taking a while. it's not forgotten though!

OutSquareCapital · 2026-03-20T10:13:40Z

I see that the tests don't pass. That's odd, since .pyi files have no impact at all at runtime.
Let me know if there's anything that needs to be done.

evertlammerts

Thanks for all the work! This looks awesome, and almost ready to merge. I've finally gone through the changes as well as I could and left some comments. Can you have a look?

evertlammerts · 2026-03-05T16:27:19Z

_duckdb-stubs/_enums.pyi

+        dict[str, ExpectedResultType]
+    ]  # value = {'QUERY_RESULT': <ExpectedResultType.QUERY_RESULT: 0>, 'CHANGED_ROWS': <ExpectedResultType.CHANGED_ROWS: 1>, 'NOTHING': <ExpectedResultType.NOTHING: 2>}  # noqa: E501
+
+class ExplainType:


Should this also subclass CppEnum?

I remember not having subclassed them for a specific reason.
However, now I can't remember what the reason was 😅😅.
Tests I just ran showed me that I was in fact wrong and that yes, they should be subclassed. Fixed.
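
To show why a shared base reduces duplication, here is a small runnable sketch. It swaps in an ordinary base class (`CppEnumBase`, a hypothetical name) instead of the Protocol used in the stubs, and the members are built by hand, since the real enums are generated on the C++ side.

```python
from typing import ClassVar


class CppEnumBase:
    """Hypothetical runtime stand-in: accessors shared by every enum-like class."""

    __members__: ClassVar[dict[str, "CppEnumBase"]]

    def __init__(self, name: str, value: int) -> None:
        self.name = name
        self.value = value

    def __int__(self) -> int:
        return self.value

    def __repr__(self) -> str:
        return f"<{type(self).__name__}.{self.name}: {self.value}>"


class ExplainType(CppEnumBase):
    # Only the members are declared here; everything else is inherited.
    STANDARD: ClassVar["ExplainType"]
    ANALYZE: ClassVar["ExplainType"]


# Built by hand for the sketch; the real objects come from the extension module.
ExplainType.STANDARD = ExplainType("STANDARD", 0)
ExplainType.ANALYZE = ExplainType("ANALYZE", 1)
ExplainType.__members__ = {m.name: m for m in (ExplainType.STANDARD, ExplainType.ANALYZE)}
```

Each further enum-like class then only needs its member declarations, which is the duplication the review comment is about.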

evertlammerts · 2026-03-05T16:27:46Z

_duckdb-stubs/_enums.pyi

+
+ExplainTypeLiteral: TypeAlias = Literal["analyze", "standard"]
+
+class PythonExceptionHandling:


Same here? And the other enums below?

evertlammerts · 2026-03-27T07:38:26Z

_duckdb-stubs/__init__.pyi

-    lineterminator: str | None = None,
-    columns: dict[str, str] | None = None,
-    auto_type_candidates: lst[str] | None = None,
+    lineterminator: CSVLineTerminator | CSVLineTerminatorLiteral | None = None,


For better or worse (probably the latter), lineterminator doesn't support literal strings (CSVLineTerminatorLiteral) at the moment, even though the error message of the param verification suggests otherwise:

if (!py::none().is(lineterminator)) {
    PythonCSVLineTerminator::Type new_line_type;
    if (!py::try_cast<PythonCSVLineTerminator::Type>(lineterminator, new_line_type)) {
        string actual_type = py::str(py::type::of(lineterminator));
        throw BinderException("read_csv only accepts 'lineterminator' as a string or CSVLineTerminator, not '%s'",
                              actual_type);
    }
    bind_parameters["new_line"] = Value(PythonCSVLineTerminator::ToString(new_line_type));
}

Same for all read_csv

Deleted this Literal as a consequence.

evertlammerts · 2026-03-30T13:33:54Z

_duckdb-stubs/__init__.pyi

-        function: Callable[..., typing.Any],
-        parameters: lst[sqltypes.DuckDBPyType] | None = None,
-        return_type: sqltypes.DuckDBPyType | None = None,
+        function: Callable[..., PythonLiteral],


if type == ARROW then the return value is pyarrow.Table | pyarrow.Array | pyarrow.ChunkedArray, so PythonLiteral doesn't cut it here.

added an overload to handle this
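
An overload keyed on the UDF type might be sketched as follows. This is not the signature from the stubs: `register_udf`, the `kind` parameter and the `NativeResult` alias are hypothetical stand-ins, and string literals replace the real enum so the sketch stays self-contained.

```python
from typing import Any, Callable, Literal, Union, overload

# Hypothetical stand-in for the PythonLiteral alias from the stubs.
NativeResult = Union[int, float, str, bool, None]


@overload
def register_udf(
    name: str, function: Callable[..., NativeResult], *, kind: Literal["native"] = "native"
) -> str: ...
@overload
def register_udf(name: str, function: Callable[..., Any], *, kind: Literal["arrow"]) -> str: ...
def register_udf(name: str, function: Callable[..., Any], *, kind: str = "native") -> str:
    # Toy body: a real implementation would register the function with the engine.
    return f"{name}:{kind}"
```

With this shape, the checker only enforces the narrow return type for native functions, while the Arrow variant accepts any callable.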

evertlammerts · 2026-03-30T13:34:33Z

_duckdb-stubs/__init__.pyi

    ) -> DuckDBPyRelation: ...
    def map(
-        self, map_function: Callable[..., typing.Any], *, schema: dict[str, sqltypes.DuckDBPyType] | None = None
+        self, map_function: Callable[..., PythonLiteral], *, schema: dict[str, sqltypes.DuckDBPyType] | None = None


Same problem as with create_function: https://github.com/duckdb/duckdb-python/pull/352/changes#r3009834622

Hmm, since there's no way to make an overload here, I reverted it to Any.
If I used a union, this would cause errors for anyone trying to use it without pyarrow installed.
I think I'll make a pyarrow protocol in a subsequent PR to handle this more precisely.

evertlammerts · 2026-03-30T13:35:12Z

_duckdb-stubs/__init__.pyi

-    function: Callable[..., typing.Any],
-    parameters: lst[sqltypes.DuckDBPyType] | None = None,
-    return_type: sqltypes.DuckDBPyType | None = None,
+    function: Callable[..., PythonLiteral],


Same problem as with Connection.create_function: https://github.com/duckdb/duckdb-python/pull/352/changes#r3009834622

evertlammerts · 2026-03-30T14:24:56Z

_duckdb-stubs/_enums.pyi

+        dict[str, ExplainType]
+    ]  # value = {'STANDARD': <ExplainType.STANDARD: 0>, 'ANALYZE': <ExplainType.ANALYZE: 1>}
+
+ExplainTypeLiteral: TypeAlias = Literal["analyze", "standard"]


Is there a way to make a Literal be case insensitive? (Same for RenderModeLiteral)

Unfortunately no. A PEP is in progress to allow EnumLiteral types, for example, but ATM Literals are a bit limited (though still VERY convenient for a library and for type safety).
I added the uppercase versions to both mentioned literals instead.
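
A sketch of what "adding the uppercase versions" could amount to is shown below. The exact contents of the alias after the change are an assumption, and `normalize_explain_type` is a hypothetical helper that is not part of the PR.

```python
from typing import Literal, get_args

# Assumed shape of the alias once both spellings are listed explicitly.
ExplainTypeLiteral = Literal["analyze", "standard", "ANALYZE", "STANDARD"]


def normalize_explain_type(value: ExplainTypeLiteral) -> str:
    # The Literal only guides the type checker; folding the case is a runtime concern.
    return value.lower()
```

Mixed-case spellings such as "Analyze" would still be rejected by a type checker, since every accepted string has to be enumerated.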

evertlammerts · 2026-03-30T14:37:27Z

_duckdb-stubs/_typing.pyi

+"""Types accepted for the `field_ids` parameter in parquet writing methods."""
+
+CsvEncoding: TypeAlias = Literal["utf-8", "utf-16", "latin-1"] | str
+"""Encdoding options.


OutSquareCapital · 2026-03-30T20:26:38Z

FYI, I think that I addressed all your observations. LMK if there's anything more that needs to be done.

evertlammerts

Looks great! I'll push one more commit to fix some typos and then I'll merge

OutSquareCapital · 2026-03-31T08:08:22Z

Oh yeah, right, those typos. I guess that's the proof of human work nowadays 😅😂

Curious to know if there's a reason why PythonUDF and the Error class specifically don't inherit from CppEnum?

evertlammerts · 2026-03-31T08:40:14Z

@OutSquareCapital I tried that but got mypy errors:

_duckdb-stubs/_func.pyi:6: error: Class cannot subclass "CppEnum" (has type
"Any")  [misc]
    class FunctionNullHandling(CppEnum):
                               ^~~~~~~
_duckdb-stubs/_func.pyi:13: error: Class cannot subclass "CppEnum" (has type
"Any")  [misc]
    class PythonUDFType(CppEnum):
                        ^~~~~~~

Claude told me that mypy doesn't allow me to subclass a protocol imported from another module... It's not great this way, so if you know of a better way then go ahead! (Just make sure to pull the changes first, I have force pushed your branch.)

Do you want me to push as is or do you want to fix this?

OutSquareCapital · 2026-03-31T09:46:34Z

Hmm, I suspected that Claude was wrong here, because it would be very weird for typing to have anything to do with imports, and it is indeed wrong.

Quick test

folder structure

foo
foo\f.py
foo\t.py

f.py file

from typing import Protocol


class FooProtocol(Protocol):
    def foo(self) -> str: ...

t.py file

from f import FooProtocol


class Foo(FooProtocol):
    def foo(self) -> str:
        return "foo"

Running mypy on it

PS C:\Users\tibo\python_codes\pql> uvx mypy foo 
Success: no issues found in 2 source files

However, mypy isn't a great type checker (see the number of false positives!), so weird, hard-to-explain things like that can happen.
But I mean, if it works, it works.
At the end of the day this was just a way to reduce internal code duplication, so to me it's OK.

OutSquareCapital · 2026-03-31T10:38:38Z

I think I found the issue: you don't import CppEnum!

evertlammerts · 2026-03-31T10:41:01Z

You're totally right. The issue was with pre-commit. It runs only on changed files and didn't have the _enums.pyi context.

I've changed FunctionNullHandling and PythonUDFType, then did a manual verify with mypy, it all seems to work now!

evertlammerts · 2026-03-31T12:30:58Z

Landed! Thanks @OutSquareCapital !

OutSquareCapital force-pushed the expr-typing branch from 76a1904 to edbdcf4 Compare March 1, 2026 12:11

OutSquareCapital mentioned this pull request Mar 12, 2026

comparison to ibis? consider pep 0827 support? OutSquareCapital/pql#1

Open

evertlammerts force-pushed the expr-typing branch 2 times, most recently from b4612b2 to 645458e Compare March 26, 2026 12:09

evertlammerts requested changes Mar 30, 2026

evertlammerts reviewed Mar 31, 2026

OutSquareCapital added 20 commits March 31, 2026 09:08

- new _typing and _expression stub file to centralize type aliases and allow circular imports between files.

40c13f3

- added nested dtypes, bytesarray, and memoryview as literal, convertible python types
- PythonLiteral is a recursive type, to allow dict of list, list of list, etc...

refactor of init in consequence of last commit:

bb52afd

- _ExpressionLike -> IntoExpr
- Expression | str -> IntoExprColumn

added Numpy Array protocol to accepted literal types. allow to add numpy ndarray without creating unknown type errors if the library isn't installed in the venv

4248e40

sync lst builtin fix with 3.10 branch

052a970

fix: dict keys can't be nested literals

8ba8348

Relation.update set argument can accept a mapping

005ecbf

values function and Connection method can accept list of Python Literals, not list of Any element

e44c4a2

- added IntoValues type alias

841521d

- Using IntoExprColumn on StarExpression
- fixed lhs type for LambdaExpression, and value type for ConstantExpression

refactor of datatypes typing:

cafc7da

- fixed all places where it was too narrow. Most of the time str are accepted for sqltypes. odd exception seems to be the map method on Relation
- using Self for annotations on arguments when pertinent

using PythonLiteral types in place of typing.Any for Connexion.create_function and Relation.map

6a9e210

numpy protocols improvements

067998d

refactor:

c693362

- reorganized expressions/values conversions types, improved their doc
- added Literals for sqltypes ids and string conversion, and various type aliases, covering all paths.
- using aformentionned literals in _sqltypes signatures

feat:

c4919a3

- added various new literals for files arguments
- moved join "how" literal in _typing for centralization
- renamed IntoNestedDType -> IntoFields

feat:

ab9d709

- added all new literals and type aliases in the main init file

fixs:

61881d5

- Builtins Literal had incorrect values for time/timestamp with time zone
- typos fixes
- renamed `DType` for Literals to `PyType` to keep the naming conventions consistent

lint fixes + fixed missing TypeAlias markers

6d29043

Typing : added ParquetCompression and ProfilerFormat literals

725ef04

raw NPArray can be used for value conversions

97fb2b4

fix: time_ns and variant dtypes were missing from builtins literal

85df304

OutSquareCapital and others added 2 commits March 31, 2026 09:08

fixed various issues after PR review

4513935

last fixes

a8d3546

evertlammerts force-pushed the expr-typing branch from 05b5bcf to a8d3546 Compare March 31, 2026 07:16

evertlammerts approved these changes Mar 31, 2026

use CppEnum

399ede4

use CppEnum and remove the protocol functions from subclasses

8a94680

evertlammerts merged commit 5f56d32 into duckdb:v1.5-variegata Mar 31, 2026
15 checks passed

