Type hints overhaul#352
Conversation
76a1904 to
edbdcf4
Compare
|
@evertlammerts kindly bumping this up in case you haven't seen it. |
ack! apologies for the delay. I did spend some time on it but it's a big one and not my area of expertise, so it's taking a while. it's not forgotten though! |
|
I see that tests don't pass. That's odd, since .pyi file have no impacts at all at runtime. |
b4612b2 to
645458e
Compare
evertlammerts
left a comment
There was a problem hiding this comment.
Thanks for all the work! This looks awesome, and almost ready to merge. I've finally gone through the changes as well as I could and left some comments. Can you have a look?
_duckdb-stubs/_enums.pyi
Outdated
| dict[str, ExpectedResultType] | ||
| ] # value = {'QUERY_RESULT': <ExpectedResultType.QUERY_RESULT: 0>, 'CHANGED_ROWS': <ExpectedResultType.CHANGED_ROWS: 1>, 'NOTHING': <ExpectedResultType.NOTHING: 2>} # noqa: E501 | ||
|
|
||
| class ExplainType: |
There was a problem hiding this comment.
Should this also subclass CppEnum?
There was a problem hiding this comment.
I remember not having subclassed them for a specific reason.
However now I can't remember what was the reason😅😅.
Tests I just ran showed me that I was in fact wrong and that yes, they should be subclassed. fixed
_duckdb-stubs/_enums.pyi
Outdated
|
|
||
| ExplainTypeLiteral: TypeAlias = Literal["analyze", "standard"] | ||
|
|
||
| class PythonExceptionHandling: |
There was a problem hiding this comment.
Same here? And the other enums below?
_duckdb-stubs/__init__.pyi
Outdated
| lineterminator: str | None = None, | ||
| columns: dict[str, str] | None = None, | ||
| auto_type_candidates: lst[str] | None = None, | ||
| lineterminator: CSVLineTerminator | CSVLineTerminatorLiteral | None = None, |
There was a problem hiding this comment.
For better or worse (probably the latter), lineterminator doesn't support literal strings (CSVLineTerminatorLiteral) at the moment, even though the error message of the param verification suggests otherwise:
if (!py::none().is(lineterminator)) {
PythonCSVLineTerminator::Type new_line_type;
if (!py::try_cast<PythonCSVLineTerminator::Type>(lineterminator, new_line_type)) {
string actual_type = py::str(py::type::of(lineterminator));
throw BinderException("read_csv only accepts 'lineterminator' as a string or CSVLineTerminator, not '%s'",
actual_type);
}
bind_parameters["new_line"] = Value(PythonCSVLineTerminator::ToString(new_line_type));
}Same for all read_csv
There was a problem hiding this comment.
Deleted this Literal in consequence
| function: Callable[..., typing.Any], | ||
| parameters: lst[sqltypes.DuckDBPyType] | None = None, | ||
| return_type: sqltypes.DuckDBPyType | None = None, | ||
| function: Callable[..., PythonLiteral], |
There was a problem hiding this comment.
if type == ARROW then the return value is pyarrow.Table | pyarrow.Array | pyarrow.ChunkedArray, so PythonLiteral doesn't cut it here.
There was a problem hiding this comment.
added an overload to handle this
_duckdb-stubs/__init__.pyi
Outdated
| ) -> DuckDBPyRelation: ... | ||
| def map( | ||
| self, map_function: Callable[..., typing.Any], *, schema: dict[str, sqltypes.DuckDBPyType] | None = None | ||
| self, map_function: Callable[..., PythonLiteral], *, schema: dict[str, sqltypes.DuckDBPyType] | None = None |
There was a problem hiding this comment.
Same problem as with create_function: https://github.com/duckdb/duckdb-python/pull/352/changes#r3009834622
There was a problem hiding this comment.
hmm, since there's no way to make an overload here, I reverted it to Any.
If I used an union this would cause errors for anyone trying to use this without pyarrow installed.
I'll make a pyarrow protocol in a subsequent PR to handle this more precisely I think
| function: Callable[..., typing.Any], | ||
| parameters: lst[sqltypes.DuckDBPyType] | None = None, | ||
| return_type: sqltypes.DuckDBPyType | None = None, | ||
| function: Callable[..., PythonLiteral], |
There was a problem hiding this comment.
Same problem as with Connection.create_function: https://github.com/duckdb/duckdb-python/pull/352/changes#r3009834622
_duckdb-stubs/_enums.pyi
Outdated
| dict[str, ExplainType] | ||
| ] # value = {'STANDARD': <ExplainType.STANDARD: 0>, 'ANALYZE': <ExplainType.ANALYZE: 1>} | ||
|
|
||
| ExplainTypeLiteral: TypeAlias = Literal["analyze", "standard"] |
There was a problem hiding this comment.
Is there a way to make a Literal be case insensitive? (Same for RenderModeLiteral)
There was a problem hiding this comment.
Unfortunately no. A PEP is in progress to allow EnumLiterals types for example, but ATM Literal are a bit limited (but still VERY convenient for a library and type safety).
I added the uppercase versions to both mentionned literals instead
_duckdb-stubs/_typing.pyi
Outdated
| """Types accepted for the `field_ids` parameter in parquet writing methods.""" | ||
|
|
||
| CsvEncoding: TypeAlias = Literal["utf-8", "utf-16", "latin-1"] | str | ||
| """Encdoding options. |
|
FYI, I think that I adressed all your observations. LMK if there's anything more that need to be done |
…d allow circular imports between files. - added nested dtypes, bytesarray, and memoryview as literal, convertible python types - PythonLiteral is a recursive type, to allow dict of list, list of list, etc...
- _ExpressionLike -> IntoExpr - Expression | str -> IntoExprColumn
…mpy ndarray without creating unknown type errors if the library isn't installed in the venv
…als, not list of Any element
- Using IntoExprColumn on StarExpression - fixed lhs type for LambdaExpression, and value type for ConstantExpression
- fixed all places where it was too narrow. Most of the time str are accepted for sqltypes. odd exception seems to be the map method on Relation - using Self for annotations on arguments when pertinent
…_function and Relation.map
- Fixed StatementType members, they had incorrect values. the "_STATEMENT" part was only on the C++ side, not on the python side - Moved all enums in __init__ file in a new _enums.pyi file, to avoid bloating the init file - Created a new CppEnum Protocol, and used it as a base class for all public enums to reduce duplication. - Created literals type and using them as argument in conjunction of the corresponding enum whenever pertinent
05b5bcf to
a8d3546
Compare
Oh yea right those typos. I Guess that's the proof of human work nowadays😅😂 Curious to know if there's a reason why PythonUDF and the Error class specifically don't herit from CPPEnum? |
@OutSquareCapital I tried that but got mypy errors: Claude told me that mypy doesn't allow me to subclass a protocol imported from another module... It's noot great this way so if you know of a better way then go ahead! (Just make sure to pull the changes first, I have force pushed your branch.) Do you want me to push as is or do you want to fix this? |
Hmm I suspected that Claude was wrong here because it would be very weird that typing has anything to do with imports, and it is indeed wrong. Quick testfolder structuref.py filefrom typing import Protocol
class FooProtocol(Protocol):
def foo(self) -> str: ...t.py filefrom f import FooProtocol
class Foo(FooProtocol):
def foo(self) -> str:
return "foo"Running mypy on itPS C:\Users\tibo\python_codes\pql> uvx mypy foo
Success: no issues found in 2 source filesHowever mypy isn't a great type checker(see the number of false positives!), so weird, hardly explainable things like that can happen. |
|
I think I found the issue: you don't import the CPPEnum! |
|
You're totally right. The issue was with pre-commit. It runs only on changed files and didn't have the _enums.pyi context. I've changed |
|
Landed! Thanks @OutSquareCapital ! |
This PR provides numerous improvements and one bugfix regarding type hints.
This is my follow-up to our discussion here @evertlammerts:
#341 (comment)
All changes are only in the type stubs, which means that there's no impact whatsoever on any runtime logic.
Changes
_expression.pyifile to separate theExpressionclass, and allow circular imports and references. Leans-up a bit the__init__file, which is nice.Protocolfor numpy array and types. Allow to type check those without emitting errors if the user doesn't have the library installed. Array is useful forExpressionconversions, Dtype forDuckDBPyTypeconversions._ExpressionLiketype alias. Renamed it toIntoExpr, and added various new type aliases covering as much situations as possible forExpressionconversions.strconversions toDuckDBPyType, providing a nice autocompletion for arguments, and a nice interaction with pattern matching when checking the id value.Also, provide
JSONandBIGNUMconvenient instanciation as an added bonus (ATM they are absent from sqltypes constants).DuckDBPyTypeconversions: as python/numpy static type hints, asdictinstances, or asLiteral | str. This significantly improve the types hints regarding datatypes arguments, who were very often only accepting str orDuckDBTypesin the signatures.Literalfor files methods/functions argument options._typing.pyifile, to avoid bloating the__init__.CppEnumclass to reduce code duplication for enum-like classes, and centralized them in a new_enum.pyifile.StatementTypeclass who had incorrect values (no_STATEMENTat the end of the member names)Notes
I tried to document this as best as I could with docstrings for users and "private" comments.
I left a few observations, but what I would add is that one thing is clear, the runtime accepted types are all over the place (sometimes Mapping is ok, sometimes only dict is ok, etc...).
As I said in Typing stubs are too strict about arguments of type
Expression#341 , prioritizingcollections.abcas much as possible would be the best way to go in the future.Centralizing the type aliases and using them as much as possible make sense IMO, especially with an API that have repeated signatures (connexion methods vs module level function for example).
The next step would be to move the type definition in a concrete .py file, allowing user to import those if they want to annotate custom functions or do runtime type introspection. Note that this can be done if you import them from
_duckdb(not intuitive), but only in aTYPE_CHECKINGblock (will crash otherwise), and your LSP will most likely say that they can't resolve the import.