Skip to content

Type hints overhaul#352

Merged
evertlammerts merged 24 commits intoduckdb:v1.5-variegatafrom
OutSquareCapital:expr-typing
Mar 31, 2026
Merged

Type hints overhaul#352
evertlammerts merged 24 commits intoduckdb:v1.5-variegatafrom
OutSquareCapital:expr-typing

Conversation

@OutSquareCapital
Copy link
Copy Markdown
Contributor

@OutSquareCapital OutSquareCapital commented Feb 27, 2026

This PR provides numerous improvements and one bugfix regarding type hints.

This is my follow-up to our discussion here @evertlammerts:
#341 (comment)

All changes are only in the type stubs, which means that there's no impact whatsoever on any runtime logic.

Changes

  1. New _expression.pyi file to separate the Expression class, and allow circular imports and references. Leans-up a bit the __init__ file, which is nice.
  2. Two new Protocol for numpy array and types. Allow to type check those without emitting errors if the user doesn't have the library installed. Array is useful for Expression conversions, Dtype for DuckDBPyType conversions.
  3. Refactored and expanded the _ExpressionLike type alias. Renamed it to IntoExpr, and added various new type aliases covering as much situations as possible for Expression conversions.
  4. Added a few Literals to cover the ids and str conversions to DuckDBPyType, providing a nice autocompletion for arguments, and a nice interaction with pattern matching when checking the id value.
    Also, provide JSON and BIGNUM convenient instanciation as an added bonus (ATM they are absent from sqltypes constants).
  5. Added various new type aliases, to cover all paths for DuckDBPyType conversions: as python/numpy static type hints, as dict instances, or as Literal | str. This significantly improve the types hints regarding datatypes arguments, who were very often only accepting str or DuckDBTypes in the signatures.
  6. Added various new Literal for files methods/functions argument options.
  7. Centralized type aliases, Literals, and Protocols in a _typing.pyi file, to avoid bloating the __init__.
  8. Added a new CppEnum class to reduce code duplication for enum-like classes, and centralized them in a new _enum.pyi file.
  9. Fixed the StatementType class who had incorrect values (no _STATEMENT at the end of the member names)

Notes

  • I tried to document this as best as I could with docstrings for users and "private" comments.
    I left a few observations, but what I would add is that one thing is clear, the runtime accepted types are all over the place (sometimes Mapping is ok, sometimes only dict is ok, etc...).
    As I said in Typing stubs are too strict about arguments of type Expression #341 , prioritizing collections.abc as much as possible would be the best way to go in the future.

  • Centralizing the type aliases and using them as much as possible make sense IMO, especially with an API that have repeated signatures (connexion methods vs module level function for example).

  • The next step would be to move the type definition in a concrete .py file, allowing user to import those if they want to annotate custom functions or do runtime type introspection. Note that this can be done if you import them from _duckdb (not intuitive), but only in a TYPE_CHECKING block (will crash otherwise), and your LSP will most likely say that they can't resolve the import.

@OutSquareCapital
Copy link
Copy Markdown
Contributor Author

@evertlammerts kindly bumping this up in case you haven't seen it.

@evertlammerts
Copy link
Copy Markdown
Collaborator

@evertlammerts kindly bumping this up in case you haven't seen it.

ack! apologies for the delay. I did spend some time on it but it's a big one and not my area of expertise, so it's taking a while. it's not forgotten though!

@OutSquareCapital
Copy link
Copy Markdown
Contributor Author

I see that tests don't pass. That's odd, since .pyi file have no impacts at all at runtime.
Let me know if there's anything that need to be done.

@evertlammerts evertlammerts force-pushed the expr-typing branch 2 times, most recently from b4612b2 to 645458e Compare March 26, 2026 12:09
Copy link
Copy Markdown
Collaborator

@evertlammerts evertlammerts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work! This looks awesome, and almost ready to merge. I've finally gone through the changes as well as I could and left some comments. Can you have a look?

dict[str, ExpectedResultType]
] # value = {'QUERY_RESULT': <ExpectedResultType.QUERY_RESULT: 0>, 'CHANGED_ROWS': <ExpectedResultType.CHANGED_ROWS: 1>, 'NOTHING': <ExpectedResultType.NOTHING: 2>} # noqa: E501

class ExplainType:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also subclass CppEnum?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember not having subclassed them for a specific reason.
However now I can't remember what was the reason😅😅.
Tests I just ran showed me that I was in fact wrong and that yes, they should be subclassed. fixed


ExplainTypeLiteral: TypeAlias = Literal["analyze", "standard"]

class PythonExceptionHandling:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here? And the other enums below?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

lineterminator: str | None = None,
columns: dict[str, str] | None = None,
auto_type_candidates: lst[str] | None = None,
lineterminator: CSVLineTerminator | CSVLineTerminatorLiteral | None = None,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better or worse (probably the latter), lineterminator doesn't support literal strings (CSVLineTerminatorLiteral) at the moment, even though the error message of the param verification suggests otherwise:

	if (!py::none().is(lineterminator)) {
		PythonCSVLineTerminator::Type new_line_type;
		if (!py::try_cast<PythonCSVLineTerminator::Type>(lineterminator, new_line_type)) {
			string actual_type = py::str(py::type::of(lineterminator));
			throw BinderException("read_csv only accepts 'lineterminator' as a string or CSVLineTerminator, not '%s'",
			                      actual_type);
		}
		bind_parameters["new_line"] = Value(PythonCSVLineTerminator::ToString(new_line_type));
	}

Same for all read_csv

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted this Literal in consequence

function: Callable[..., typing.Any],
parameters: lst[sqltypes.DuckDBPyType] | None = None,
return_type: sqltypes.DuckDBPyType | None = None,
function: Callable[..., PythonLiteral],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if type == ARROW then the return value is pyarrow.Table | pyarrow.Array | pyarrow.ChunkedArray, so PythonLiteral doesn't cut it here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added an overload to handle this

) -> DuckDBPyRelation: ...
def map(
self, map_function: Callable[..., typing.Any], *, schema: dict[str, sqltypes.DuckDBPyType] | None = None
self, map_function: Callable[..., PythonLiteral], *, schema: dict[str, sqltypes.DuckDBPyType] | None = None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

@OutSquareCapital OutSquareCapital Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, since there's no way to make an overload here, I reverted it to Any.
If I used an union this would cause errors for anyone trying to use this without pyarrow installed.
I'll make a pyarrow protocol in a subsequent PR to handle this more precisely I think

function: Callable[..., typing.Any],
parameters: lst[sqltypes.DuckDBPyType] | None = None,
return_type: sqltypes.DuckDBPyType | None = None,
function: Callable[..., PythonLiteral],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem as with Connection.create_function: https://github.com/duckdb/duckdb-python/pull/352/changes#r3009834622

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

dict[str, ExplainType]
] # value = {'STANDARD': <ExplainType.STANDARD: 0>, 'ANALYZE': <ExplainType.ANALYZE: 1>}

ExplainTypeLiteral: TypeAlias = Literal["analyze", "standard"]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make a Literal be case insensitive? (Same for RenderModeLiteral)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately no. A PEP is in progress to allow EnumLiterals types for example, but ATM Literal are a bit limited (but still VERY convenient for a library and type safety).
I added the uppercase versions to both mentionned literals instead

"""Types accepted for the `field_ids` parameter in parquet writing methods."""

CsvEncoding: TypeAlias = Literal["utf-8", "utf-16", "latin-1"] | str
"""Encdoding options.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@OutSquareCapital
Copy link
Copy Markdown
Contributor Author

FYI, I think that I adressed all your observations. LMK if there's anything more that need to be done

Copy link
Copy Markdown
Collaborator

@evertlammerts evertlammerts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I'll push one more commit to fix some typos and then I'll merge

…d allow circular imports between files.

- added  nested dtypes, bytesarray, and memoryview as literal, convertible python types
- PythonLiteral is a recursive type, to allow dict of list, list of list, etc...
- _ExpressionLike -> IntoExpr
- Expression | str -> IntoExprColumn
…mpy ndarray without creating unknown type errors if the library isn't installed in the venv
- Using IntoExprColumn on StarExpression
- fixed lhs type for LambdaExpression, and value type for ConstantExpression
- fixed all places where it was too narrow. Most of the time str are accepted for sqltypes. odd exception seems to be the map method on Relation
- using Self for annotations on arguments when pertinent
- reorganized expressions/values conversions types, improved their doc
- added Literals for sqltypes ids and string conversion, and various type aliases, covering all paths.
- using aformentionned literals in _sqltypes signatures
- added various new literals for files arguments
- moved join "how" literal in _typing for centralization
- renamed IntoNestedDType -> IntoFields
- added all new literals and type aliases in the main init file
- Builtins Literal had incorrect values for time/timestamp with time zone
- typos fixes
- renamed `DType` for Literals to `PyType` to keep the naming conventions consistent
- Fixed StatementType members, they had incorrect values. the "_STATEMENT" part was only on the C++ side, not on the python side
- Moved all enums in __init__ file in a new _enums.pyi file, to avoid bloating the init file
- Created a new CppEnum Protocol, and used it as a base class for all public enums to reduce duplication.
- Created literals type and using them as argument in conjunction of the corresponding enum whenever pertinent
@OutSquareCapital
Copy link
Copy Markdown
Contributor Author

OutSquareCapital commented Mar 31, 2026

Looks great! I'll push one more commit to fix some typos and then I'll merge

Oh yea right those typos. I Guess that's the proof of human work nowadays😅😂

Curious to know if there's a reason why PythonUDF and the Error class specifically don't herit from CPPEnum?

@evertlammerts
Copy link
Copy Markdown
Collaborator

evertlammerts commented Mar 31, 2026

Curious to know if there's a reason why PythonUDF and the Error class specifically don't herit from CPPEnum?

@OutSquareCapital I tried that but got mypy errors:

_duckdb-stubs/_func.pyi:6: error: Class cannot subclass "CppEnum" (has type
"Any")  [misc]
    class FunctionNullHandling(CppEnum):
                               ^~~~~~~
_duckdb-stubs/_func.pyi:13: error: Class cannot subclass "CppEnum" (has type
"Any")  [misc]
    class PythonUDFType(CppEnum):
                        ^~~~~~~

Claude told me that mypy doesn't allow me to subclass a protocol imported from another module... It's noot great this way so if you know of a better way then go ahead! (Just make sure to pull the changes first, I have force pushed your branch.)

Do you want me to push as is or do you want to fix this?

@OutSquareCapital
Copy link
Copy Markdown
Contributor Author

OutSquareCapital commented Mar 31, 2026

Curious to know if there's a reason why PythonUDF and the Error class specifically don't herit from CPPEnum?

@OutSquareCapital I tried that but got mypy errors:

_duckdb-stubs/_func.pyi:6: error: Class cannot subclass "CppEnum" (has type
"Any")  [misc]
    class FunctionNullHandling(CppEnum):
                               ^~~~~~~
_duckdb-stubs/_func.pyi:13: error: Class cannot subclass "CppEnum" (has type
"Any")  [misc]
    class PythonUDFType(CppEnum):
                        ^~~~~~~

Claude told me that mypy doesn't allow me to subclass a protocol imported from another module... It's noot great this way so if you know of a better way then go ahead! (Just make sure to pull the changes first, I have force pushed your branch.)

Do you want me to push as is or do you want to fix this?

Hmm I suspected that Claude was wrong here because it would be very weird that typing has anything to do with imports, and it is indeed wrong.

Quick test

folder structure

foo
foo\f.py
foo\t.py

f.py file

from typing import Protocol


class FooProtocol(Protocol):
    def foo(self) -> str: ...

t.py file

from f import FooProtocol


class Foo(FooProtocol):
    def foo(self) -> str:
        return "foo"

Running mypy on it

PS C:\Users\tibo\python_codes\pql> uvx mypy foo 
Success: no issues found in 2 source files

However mypy isn't a great type checker(see the number of false positives!), so weird, hardly explainable things like that can happen.
But I mean, if it works, it works.
At the end of the day this was just a way to reduce internal code duplication, so to me it's OK.

@OutSquareCapital
Copy link
Copy Markdown
Contributor Author

I think I found the issue: you don't import the CPPEnum!

@evertlammerts
Copy link
Copy Markdown
Collaborator

evertlammerts commented Mar 31, 2026

You're totally right. The issue was with pre-commit. It runs only on changed files and didn't have the _enums.pyi context.

I've changed FunctionNullHandling and PythonUDFType, then did a manual verify with mypy, it all seems to work now!

@evertlammerts evertlammerts merged commit 5f56d32 into duckdb:v1.5-variegata Mar 31, 2026
15 checks passed
@evertlammerts
Copy link
Copy Markdown
Collaborator

Landed! Thanks @OutSquareCapital !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants