Description
Is your feature request related to a problem? Please describe.
Currently, when enforcing a schema on data that exceeds the range of the target dtype (e.g., a value of 260 for a uint8 column), the operation may fail or silently overflow. In production environments, it is often preferable to bound these values rather than let the pipeline crash or produce corrupted data.
Describe the solution you'd like
I would like to see an option in the schema enforcement logic, perhaps a parameter like out_of_bounds="clip". When enabled, any value above the maximum or below the minimum of the target numeric type would be clipped to the nearest limit of that type.
Example Scenario
Target Dtype: uint8 (Range: 0 to 255)
Input Value: 260
Expected Result (with clipping): 255
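To make the requested semantics concrete, here is a minimal pure-Python sketch. The function names (integer_bounds, enforce_integer) and the "raise" default are assumptions for illustration only; they are not part of the library's existing API, and only the out_of_bounds="clip" spelling comes from the proposal above.

```python
def integer_bounds(bits, signed):
    """Return (min, max) for an integer type of the given width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def enforce_integer(values, bits, signed, out_of_bounds="raise"):
    """Hypothetical enforcement step: check or clip values to the type's range."""
    lo, hi = integer_bounds(bits, signed)
    if out_of_bounds == "clip":
        # Bound every value to [lo, hi] instead of failing.
        return [min(max(v, lo), hi) for v in values]
    if out_of_bounds == "raise":
        bad = [v for v in values if not lo <= v <= hi]
        if bad:
            raise OverflowError(f"values out of range [{lo}, {hi}]: {bad}")
        return list(values)
    raise ValueError(f"unknown out_of_bounds mode: {out_of_bounds!r}")


# uint8 example from the scenario above: 260 is bounded to 255, -5 to 0.
clipped = enforce_integer([-5, 0, 128, 260], bits=8, signed=False,
                          out_of_bounds="clip")
print(clipped)  # [0, 0, 128, 255]
```

Keeping "raise" as the default in this sketch would leave clipping strictly opt-in, so existing pipelines would not change behavior silently.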
Describe alternatives you've considered
The current alternative is to manually call .clip() on the DataFrame before validation, but this duplicates logic that could be handled more efficiently during the schema enforcement/casting phase.
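For reference, the manual workaround looks roughly like the following sketch, assuming pandas and NumPy with integer target dtypes. The helper name clip_to_dtype and the column name are made up for this example.

```python
import numpy as np
import pandas as pd


def clip_to_dtype(series, dtype):
    """Clip a Series to the range of an integer dtype, then cast to it."""
    info = np.iinfo(dtype)  # integer dtypes only; floats would need np.finfo
    return series.clip(lower=info.min, upper=info.max).astype(dtype)


df = pd.DataFrame({"level": [-5, 0, 128, 260]})
df["level"] = clip_to_dtype(df["level"], "uint8")
print(df["level"].tolist())  # [0, 0, 128, 255]
```

This has to be repeated for every bounded column before validation, which is the duplication this request aims to remove.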