[Feature Request] Add support for clipping values during type enforcement #273

@svengiegerich

Is your feature request related to a problem? Please describe.

Currently, when enforcing a schema where data exceeds the range of the target dtype (e.g., a value of 260 for a uint8 column), the operation may fail or lead to silent overflows. In production environments, it is often preferable to bound these values rather than allowing the pipeline to crash or produce corrupted data.

Describe the solution you'd like

I would like to see an option, perhaps a parameter like out_of_bounds="clip", within the schema enforcement logic. When enabled, any value above the maximum or below the minimum of the target numeric type would be clipped to the nearest limit of that type.

Example Scenario

Target Dtype: uint8 (Range: 0 to 255)

Input Value: 260

Expected Result (with clipping): 255
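
To make the requested behavior concrete, here is a minimal sketch for integer targets. The `enforce_dtype` helper and its `"raise"` default are hypothetical and only illustrate the idea; they are not the library's existing API. Only the `out_of_bounds="clip"` name comes from the proposal above.

```python
import numpy as np
import pandas as pd


def enforce_dtype(values: pd.Series, dtype: str, out_of_bounds: str = "raise") -> pd.Series:
    """Cast `values` to an integer `dtype` (hypothetical helper, not an existing API)."""
    info = np.iinfo(np.dtype(dtype))  # integer targets only; floats would need np.finfo
    if out_of_bounds == "clip":
        # Bound every value to the representable range before casting.
        values = values.clip(lower=info.min, upper=info.max)
    elif ((values < info.min) | (values > info.max)).any():
        raise OverflowError(f"values outside the range of {dtype}")
    return values.astype(dtype)


raw = pd.Series([-5, 0, 100, 260])
clipped = enforce_dtype(raw, "uint8", out_of_bounds="clip")
print(clipped.tolist())  # [0, 0, 100, 255]
```

With the assumed default of `"raise"`, the same input would raise an `OverflowError` instead of wrapping around silently.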

Describe alternatives you've considered

The current alternative is to manually call .clip() on the DataFrame before validation, but this duplicates logic that could be handled more efficiently during the schema enforcement/casting phase.
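
For reference, the manual workaround looks roughly like this. The column names and the `target_dtypes` mapping are made up for illustration, and the sketch assumes integer target dtypes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"brightness": [12, 260, -3], "count": [1, 70000, 5]})

# Hypothetical schema: column name -> target integer dtype.
target_dtypes = {"brightness": "uint8", "count": "uint16"}

for column, dtype in target_dtypes.items():
    bounds = np.iinfo(np.dtype(dtype))
    # The dtype limits have to be looked up and applied per column by hand,
    # which repeats information the schema already holds.
    df[column] = df[column].clip(lower=bounds.min, upper=bounds.max).astype(dtype)

print(df["brightness"].tolist())  # [12, 255, 0]
print(df["count"].tolist())       # [1, 65535, 5]
```

Every pipeline that uses the schema needs its own copy of this loop, which is the duplication the proposed option would remove.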
