Skip to content

Conversation

@cmp0xff
Copy link
Contributor

@cmp0xff cmp0xff commented Dec 18, 2025

Towards #1548

  • I don't know how to make setter work so that the typings can be updated when one assigns index or columns.
  • The two overlapping overloads of __getitem__ in _LocIndexerFrame need to be fixed

Comments and suggestions are welcomed.

@cmp0xff cmp0xff force-pushed the feature/make-dataframe-generic branch from 6c3b2d5 to c8d80ef Compare December 18, 2025 09:26
Copy link
Member

@loicdiridollou loicdiridollou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I see the whole idea about this PR, also I would have expected more breakages in the code, like for transpose method it should flip the index and columns

@overload
def __new__(
cls,
data: DataFrame[IndexT0, IndexStrT0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not totally sure if this is gonna break a lot of things, by default DataFrame will have RangeIndex as index and columns, isn't this gonna change that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copying another DataFrame, I suppose, so nothing is changed. Will add tests later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you create a DataFrame from data without an index, RangeIndex will surely take over. That takes further overloads of __new__.

As for now I just want to show a prototype of what can happen.


if TYPE_CHECKING: # noqa: PYI002
IndexT0 = TypeVar("IndexT0", bound=Index, default=Index)
IndexStrT0 = TypeVar("IndexStrT0", bound=Index, default=Index[str])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the default be RangeIndex here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For df.columns we somehow have a preference for it being Index[str], see def columns(self) -> Index[str]: ... in the old code. That's the reason.

Copy link
Contributor Author

@cmp0xff cmp0xff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transpose method it should flip the index and columns

Can certainly do, if the whole idea is viable.

More importantly, we need to figure out if _LocIndexFrame can be fixed at all, before continuing.

Last time when I was trying to add the backend type variable to Series, too many things crashed so that I could not continue. This time it seems more controllable for now.

@overload
def __new__(
cls,
data: DataFrame[IndexT0, IndexStrT0],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copying another DataFrame, I suppose, so nothing is changed. Will add tests later.


if TYPE_CHECKING: # noqa: PYI002
IndexT0 = TypeVar("IndexT0", bound=Index, default=Index)
IndexStrT0 = TypeVar("IndexStrT0", bound=Index, default=Index[str])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For df.columns we somehow have a preference for it being Index[str], see def columns(self) -> Index[str]: ... in the old code. That's the reason.

@overload
def __new__(
cls,
data: DataFrame[IndexT0, IndexStrT0],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you create a DataFrame from data without an index, RangeIndex will surely take over. That takes further overloads of __new__.

As for now I just want to show a prototype of what can happen.

@cmp0xff cmp0xff marked this pull request as draft December 18, 2025 16:13
@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Dec 18, 2025

I am not sure I see the whole idea about this PR

I'm in agreement. If this were to work, what would be the benefit?

Note that DatatFrame.columns returns Index[str] since most people use strings to name their columns. Technically, they could be any Hashable, or even a MultiIndex However, if we made columns be Index[Any] then for the majority of people, they'd have to do cast to get the actual string names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants